[rrd-users] invalid unknowns?
Aragon Gouveia
aragon at phat.za.net
Mon Apr 2 12:50:09 CEST 2012
Hi,
I'm collecting data every 120 seconds and storing it in an RRD. My
heartbeats are set to 240 seconds. After a few hours, my AVERAGE RRA
starts returning unknowns for a time period for which it had previously
returned valid data. My other RRAs (MAX, LAST, etc.) still return valid
data for the same time period.
I first noticed this as gaps in my graphs. Suspecting that my data
collection scripts were returning no data during those periods, I
started monitoring the graphs regularly. The gap I see right now,
covering a 2 hour period about 16 hours ago, did not exist until a few
hours ago! It's as if the RRD file becomes corrupt after a while, and
old data that was previously valid starts coming out as unknown.
Since noticing this I have been logging my rrdupdate commands to a text
file to see what data enters the RRD at what times. The entries are all
valid, and consecutive updates are never more than a second off the
120 second interval.
Here's some rrdfetch output:
$ rrdtool fetch health.rrd AVERAGE -s -52000 -e -50000
1333311120: nan nan nan nan
1333311240: nan nan nan nan
1333311360: nan nan nan nan
1333311480: nan nan nan nan
1333311600: nan nan nan nan
1333311720: nan nan nan nan
1333311840: nan nan nan nan
1333311960: nan nan nan nan
1333312080: nan nan 3.0034230333e+00 6.0000000000e+00
1333312200: 1.3400000000e+01 2.7000000000e+01 3.9933606000e+00 6.0000000000e+00
1333312320: 1.3400000000e+01 2.7000000000e+01 4.0000000000e+00 6.0000000000e+00
1333312440: 1.3400000000e+01 2.7000000000e+01 3.0045041083e+00 6.0000000000e+00
1333312560: 1.3400000000e+01 2.7000000000e+01 3.9936468250e+00 6.0000000000e+00
1333312680: 1.3400000000e+01 2.7000000000e+01 4.9911438667e+00 6.0000000000e+00
1333312800: 1.3400000000e+01 2.7000000000e+01 5.0000000000e+00 6.0000000000e+00
1333312920: 1.3400000000e+01 2.7000000000e+01 3.0124310667e+00 6.0000000000e+00
1333313040: 1.3400000000e+01 2.7000000000e+01 3.0000000000e+00 6.0000000000e+00
1333313160: 1.3400000000e+01 2.7000000000e+01 3.9892649250e+00 6.0000000000e+00
$ rrdtool fetch health.rrd LAST -s -52000 -e -50000
1333311120: 1.3400000000e+01 2.7000000000e+01 3.0101470417e+00 6.0000000000e+00
1333311240: 1.3400000000e+01 2.7000000000e+01 3.9927562500e+00 6.0000000000e+00
1333311360: 1.3400000000e+01 2.7000000000e+01 4.0000000000e+00 6.0000000000e+00
1333311480: 1.3400000000e+01 2.7000000000e+01 4.0000000000e+00 6.0000000000e+00
1333311600: 1.3400000000e+01 2.7000000000e+01 3.0112068833e+00 6.0000000000e+00
1333311720: 1.3400000000e+01 2.7000000000e+01 4.9888617833e+00 6.0000000000e+00
1333311840: 1.3499455991e+01 2.7000000000e+01 4.0054400917e+00 6.0000000000e+00
1333311960: 1.3400839001e+01 2.7000000000e+01 4.0000000000e+00 6.0000000000e+00
1333312080: 1.3400000000e+01 2.7000000000e+01 3.0034230333e+00 6.0000000000e+00
1333312200: 1.3400000000e+01 2.7000000000e+01 3.9933606000e+00 6.0000000000e+00
1333312320: 1.3400000000e+01 2.7000000000e+01 4.0000000000e+00 6.0000000000e+00
1333312440: 1.3400000000e+01 2.7000000000e+01 3.0045041083e+00 6.0000000000e+00
1333312560: 1.3400000000e+01 2.7000000000e+01 3.9936468250e+00 6.0000000000e+00
1333312680: 1.3400000000e+01 2.7000000000e+01 4.9911438667e+00 6.0000000000e+00
1333312800: 1.3400000000e+01 2.7000000000e+01 5.0000000000e+00 6.0000000000e+00
1333312920: 1.3400000000e+01 2.7000000000e+01 3.0124310667e+00 6.0000000000e+00
1333313040: 1.3400000000e+01 2.7000000000e+01 3.0000000000e+00 6.0000000000e+00
1333313160: 1.3400000000e+01 2.7000000000e+01 3.9892649250e+00 6.0000000000e+00
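The mismatch between the two RRAs can be picked out mechanically rather than by eye. What follows is a minimal sketch in Python, assuming each data row has the form `timestamp: v1 v2 ...` as in the output above. The names `parse_fetch` and `average_only_gaps` are hypothetical helpers for illustration, not part of RRDtool.

```python
import math

def parse_fetch(text):
    """Parse `rrdtool fetch` data rows into {timestamp: [values]}."""
    rows = {}
    for line in text.splitlines():
        head, sep, tail = line.partition(":")
        if not sep or not head.strip().isdigit():
            continue  # skip the DS-name header, blank lines, anything else
        rows[int(head)] = [float(v) for v in tail.split()]  # float("nan") is valid
    return rows

def average_only_gaps(average_text, last_text):
    """Timestamps where AVERAGE is unknown in a column that LAST has a value for."""
    avg = parse_fetch(average_text)
    last = parse_fetch(last_text)
    gaps = []
    for ts in sorted(avg.keys() & last.keys()):
        if any(math.isnan(a) and not math.isnan(b)
               for a, b in zip(avg[ts], last[ts])):
            gaps.append(ts)
    return gaps

# Three rows copied from the fetch output in this post.
average_sample = """\
1333311960: nan nan nan nan
1333312080: nan nan 3.0034230333e+00 6.0000000000e+00
1333312200: 1.3400000000e+01 2.7000000000e+01 3.9933606000e+00 6.0000000000e+00
"""
last_sample = """\
1333311960: 1.3400839001e+01 2.7000000000e+01 4.0000000000e+00 6.0000000000e+00
1333312080: 1.3400000000e+01 2.7000000000e+01 3.0034230333e+00 6.0000000000e+00
1333312200: 1.3400000000e+01 2.7000000000e+01 3.9933606000e+00 6.0000000000e+00
"""

print(average_only_gaps(average_sample, last_sample))
# -> [1333311960, 1333312080]
```

Running this periodically over the full output of both fetch commands would show when a previously valid AVERAGE row turns unknown.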
And here's some of my rrdupdate log:
1333311120 rrdtool update health.rrd -t voltage:temperature:cpu:memory N:13.4:27.0:3:6
1333311241 rrdtool update health.rrd -t voltage:temperature:cpu:memory N:13.4:27.0:4:6
1333311360 rrdtool update health.rrd -t voltage:temperature:cpu:memory N:13.4:27.0:4:6
1333311481 rrdtool update health.rrd -t voltage:temperature:cpu:memory N:13.4:27.0:4:6
1333311600 rrdtool update health.rrd -t voltage:temperature:cpu:memory N:13.4:27.0:3:6
1333311720 rrdtool update health.rrd -t voltage:temperature:cpu:memory N:13.4:27.0:5:6
1333311841 rrdtool update health.rrd -t voltage:temperature:cpu:memory N:13.5:27.0:4:6
1333311960 rrdtool update health.rrd -t voltage:temperature:cpu:memory N:13.4:27.0:4:6
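The update spacing can be checked against the 240 second heartbeat in the same way. Again this is only a sketch, assuming every logged command starts with an epoch timestamp followed by `rrdtool`, as in the excerpt above. `update_intervals` is a hypothetical helper name.

```python
def update_intervals(log_text):
    """Return the seconds elapsed between consecutive logged updates."""
    stamps = []
    for line in log_text.splitlines():
        fields = line.split()
        # Only lines of the form "<epoch> rrdtool ..." count; wrapped
        # continuation lines such as "N:13.4:27.0:3:6" are ignored.
        if len(fields) >= 3 and fields[0].isdigit() and fields[1] == "rrdtool":
            stamps.append(int(fields[0]))
    return [b - a for a, b in zip(stamps, stamps[1:])]

# First four entries from the log in this post.
log_sample = """\
1333311120 rrdtool update health.rrd -t voltage:temperature:cpu:memory N:13.4:27.0:3:6
1333311241 rrdtool update health.rrd -t voltage:temperature:cpu:memory N:13.4:27.0:4:6
1333311360 rrdtool update health.rrd -t voltage:temperature:cpu:memory N:13.4:27.0:4:6
1333311481 rrdtool update health.rrd -t voltage:temperature:cpu:memory N:13.4:27.0:4:6
"""

HEARTBEAT = 240  # seconds, as configured for this RRD

intervals = update_intervals(log_sample)
print(intervals)                    # -> [121, 119, 121]
print(max(intervals) <= HEARTBEAT)  # -> True
```

Every interval in the excerpt is well inside the heartbeat, so late updates do not explain the unknowns.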
I'm no stranger to RRDtool, but this has stumped me. Any ideas?
rrdtool 1.2.30
FreeBSD 8.2-RELEASE amd64
Thanks,
Aragon