[rrd-users] rrdupdate corruption on Mac Snow Leopard

Thu May 17 20:45:42 CEST 2012

I've been using RRDtool for years on a Linux box (Red Hat) to collect various
house sensor data. However, that system just went belly up, so I moved the
collection applications to a Mac mini. I used a bundled build of rrdtool
1.4.5 that statically links the various other libraries needed to make the
Mac build work. After a few minor build problems, I have rrdtool working.

I have two C apps that receive transmissions from my (temperature) sensors
in the house and log them to GAUGE RRDs. The databases live on a Linux
server that the Mac accesses via NFS. This is the same networked file
system setup I used with the Linux box that died.

Frequently, the two apps (using different radio receivers) receive the
same sensor "report" and (nearly) simultaneously attempt to rrdupdate the
value for that time via the C API. One of them will typically get either a
"could not lock" or an "illegal attempt to update using time" rrd error,
which is fine in my case: I just ignore those errors, call rrd_clear_error()
and carry on. I run two apps, each with its own receiver, because of the
limited range of the sensor transmitters. With only one receiver I don't
always get all the sensor reports, so running a single receiver/app long
term is not an option.
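For reference, here is a minimal sketch of the ignore-and-clear pattern described above. It is written as plain C so it can be read and compiled without librrd installed. The helper names (`format_update_arg`, `is_benign_rrd_error`) are hypothetical. The matched substrings are simply the two error texts quoted above, and the exact wording librrd produces may vary between versions.

```c
#include <stdbool.h>
#include <stdio.h>   /* snprintf() */
#include <string.h>  /* strstr() */

/*
 * Hypothetical helper: format the "<timestamp>:<value>" argument passed to
 * rrd_update() for a single-data-source GAUGE RRD. Returns 0 on success,
 * -1 if the buffer is too small (output would be truncated).
 */
static int format_update_arg(char *buf, size_t len, long ts, double value)
{
    int n = snprintf(buf, len, "%ld:%.2f", ts, value);
    return (n >= 0 && (size_t)n < len) ? 0 : -1;
}

/*
 * Hypothetical helper: true if msg is one of the two "expected" collisions
 * between the two logger apps, which are safe to clear and ignore.
 */
static bool is_benign_rrd_error(const char *msg)
{
    if (msg == NULL)
        return false;
    return strstr(msg, "could not lock") != NULL ||
           strstr(msg, "illegal attempt to update using time") != NULL;
}
```

In the loggers, these would sit around the librrd call, roughly as follows. Build the argument with `format_update_arg()` and call `rrd_update()`. If `rrd_test_error()` reports an error whose `rrd_get_error()` text passes `is_benign_rrd_error()`, call `rrd_clear_error()` and move on. Log anything else.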

However, on my old Linux box the data was stored just fine. On the Mac
mini, I see sporadic cases where the data that is retrieved is not the data
that was stored. For example, temperatures might be in the 65-degree range
and a value in the 15-degree range shows up. When I print out the actual
rrdupdate strings the apps use while running, I only see the normal,
expected temperatures being passed to rrdupdate. But when I do an rrdfetch
or display a graph, I find these out-of-range values: several a day, but
usually hours apart for a given sensor. I log different sensors to different
RRDs and see these errors on different RRDs, maybe a dozen in total across
all the sensors per day.
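One way to count these glitches without eyeballing graphs is to scan the values returned by rrdfetch for isolated spikes. The sketch below is hypothetical. `find_spikes` and the jump threshold are my own choices, not anything rrdtool provides. It assumes unknown samples appear as NaN in the fetched data.

```c
#include <math.h>   /* isnan(), NAN */
#include <stddef.h> /* size_t */

/* Absolute difference, written out so no libm function is needed. */
static double abs_diff(double a, double b)
{
    double d = a - b;
    return d < 0 ? -d : d;
}

/*
 * Hypothetical glitch finder: flag sample i when it differs from BOTH
 * neighbours by more than max_jump (e.g. a 15-degree reading between two
 * 65-degree readings). Windows containing NaN (unknown) samples are skipped.
 * Stores up to out_len flagged indices in out[] and returns the total count.
 */
static size_t find_spikes(const double *v, size_t n, double max_jump,
                          size_t *out, size_t out_len)
{
    size_t found = 0;
    for (size_t i = 1; i + 1 < n; i++) {
        if (isnan(v[i - 1]) || isnan(v[i]) || isnan(v[i + 1]))
            continue;
        if (abs_diff(v[i], v[i - 1]) > max_jump &&
            abs_diff(v[i], v[i + 1]) > max_jump) {
            if (found < out_len)
                out[found] = i;
            found++;
        }
    }
    return found;
}
```

For example, with `max_jump = 20.0`, the series 65.1, 65.3, 15.2, 65.4 flags only the 15.2 reading (index 2). Requiring a jump against both neighbours avoids flagging a genuine step change in temperature.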

If I only let one of the two logging apps run, I don't see these "glitches".

So I suspect some sort of locking problem is happening.

I am using the same app code, the same sensors/receivers and the same RRD
databases. I didn't re-create the RRDs; I just kept using the existing ones.
But the platform is different (Mac OS X Snow Leopard vs. Red Hat), and so
is the rrdtool version (1.4.5 vs. some older version). I don't know which
older version because of the Red Hat system's disk failure. That system had
been running for years with no rrdtool problems, and I don't recall which
rrdtool version I used, but I believe it was a binary install via yum.

I looked at the source and don't see any obvious errors in the fcntl() call
that rrd_lock() makes.
I tried (on a whim) changing the fcntl() to flock() instead, but nothing
changed.
I verified that I am using the rrd_lock() code clause that doesn't use
_locking().
I also verified that I am not using the _rrd_update() code clause(s) that
depend on HAVE_MMAP.
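To make the locking discussion concrete, here is a minimal sketch of a non-blocking, whole-file exclusive lock taken with fcntl(F_SETLK, F_WRLCK). As far as I can tell, this is what the non-_locking() clause of rrd_lock() does. The sketch also includes the flock(LOCK_EX | LOCK_NB) variant I tried. All function names are hypothetical. `child_can_lock()` forks a second process, standing in for the second logger, to show that it is refused while the first process holds the lock. That is the situation that produces the "could not lock" error. The sketch only exercises local locking semantics. It does not show how either OS's NFS client passes these locks on to the server.

```c
#define _DEFAULT_SOURCE   /* expose fork(), flock() under strict -std= on glibc */
#include <errno.h>
#include <fcntl.h>        /* fcntl(), open(), struct flock */
#include <string.h>       /* memset() */
#include <sys/file.h>     /* flock() */
#include <sys/types.h>
#include <sys/wait.h>     /* waitpid() */
#include <unistd.h>       /* fork(), _exit() */

/* Non-blocking, whole-file exclusive lock via fcntl(). Returns 0 on success,
 * -1 with errno EACCES or EAGAIN if another process already holds a lock. */
static int lock_whole_file_fcntl(int fd)
{
    struct flock lk;
    memset(&lk, 0, sizeof lk);
    lk.l_type = F_WRLCK;    /* exclusive (write) lock */
    lk.l_whence = SEEK_SET; /* l_start is relative to the start of the file */
    lk.l_start = 0;
    lk.l_len = 0;           /* 0 means "to end of file": the whole file */
    return fcntl(fd, F_SETLK, &lk); /* F_SETLK fails instead of waiting */
}

/* The flock() variant; LOCK_NB makes it fail instead of waiting. */
static int lock_whole_file_flock(int fd)
{
    return flock(fd, LOCK_EX | LOCK_NB);
}

/* Fork a second process (standing in for the second logger app) and try the
 * fcntl() lock on path from there. Returns 1 if the child got the lock,
 * 0 if it was refused, -1 on any other failure. */
static int child_can_lock(const char *path)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        int fd = open(path, O_RDWR);
        if (fd < 0)
            _exit(2);
        if (lock_whole_file_fcntl(fd) == 0)
            _exit(1);
        _exit((errno == EACCES || errno == EAGAIN) ? 0 : 2);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    int code = WEXITSTATUS(status);
    return code == 2 ? -1 : code;
}
```

fcntl() locks belong to the process, so a second lock attempt from the same process would simply succeed. That is why the demonstration forks instead of calling the lock function twice.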

Any ideas on what to look for, or what to try, to eliminate this problem? It
makes it really hard to get the useful, reliable data I have come to depend on!

Thanks!

--
View this message in context: http://rrd-mailinglists.937164.n2.nabble.com/rrdupdate-corruption-on-Mac-Snow-Leopard-tp7564325.html
Sent from the RRDtool Users Mailinglist mailing list archive at Nabble.com.