[rrd-users] invalid unknowns?

Aragon Gouveia aragon at phat.za.net
Mon Apr 2 12:50:09 CEST 2012


Hi,

I'm collecting data every 120 seconds and placing it into an RRD.  My 
heartbeats are set to 240 seconds.  After a few hours, my AVERAGE RRA 
starts returning unknowns for a time period for which it had previously 
returned valid data.  My other RRAs (MAX, LAST, etc.) still return valid 
data for the same time period.

I first noticed this as gaps in my graphs, and started monitoring them 
regularly as a result, suspecting my data collection scripts were 
returning no data during those time periods.  Well, I've been monitoring 
them, and the gap I see right now, covering a two-hour period 16 hours 
ago, did not exist until a few hours ago!  It's as if the RRD file 
becomes corrupt after a while, and old data that was previously valid 
starts coming out as unknown.

Since noticing this, I started logging my rrdupdate commands to a text 
file to see what data is entering the RRD at what times.  The data 
entries are all valid, and consecutive updates never deviate from the 
120-second interval by more than a second.

Here's some rrdfetch output:

$ rrdtool fetch health.rrd AVERAGE -s -52000 -e -50000
1333311120: nan nan nan nan
1333311240: nan nan nan nan
1333311360: nan nan nan nan
1333311480: nan nan nan nan
1333311600: nan nan nan nan
1333311720: nan nan nan nan
1333311840: nan nan nan nan
1333311960: nan nan nan nan
1333312080: nan nan 3.0034230333e+00 6.0000000000e+00
1333312200: 1.3400000000e+01 2.7000000000e+01 3.9933606000e+00 6.0000000000e+00
1333312320: 1.3400000000e+01 2.7000000000e+01 4.0000000000e+00 6.0000000000e+00
1333312440: 1.3400000000e+01 2.7000000000e+01 3.0045041083e+00 6.0000000000e+00
1333312560: 1.3400000000e+01 2.7000000000e+01 3.9936468250e+00 6.0000000000e+00
1333312680: 1.3400000000e+01 2.7000000000e+01 4.9911438667e+00 6.0000000000e+00
1333312800: 1.3400000000e+01 2.7000000000e+01 5.0000000000e+00 6.0000000000e+00
1333312920: 1.3400000000e+01 2.7000000000e+01 3.0124310667e+00 6.0000000000e+00
1333313040: 1.3400000000e+01 2.7000000000e+01 3.0000000000e+00 6.0000000000e+00
1333313160: 1.3400000000e+01 2.7000000000e+01 3.9892649250e+00 6.0000000000e+00

$ rrdtool fetch health.rrd LAST -s -52000 -e -50000
1333311120: 1.3400000000e+01 2.7000000000e+01 3.0101470417e+00 6.0000000000e+00
1333311240: 1.3400000000e+01 2.7000000000e+01 3.9927562500e+00 6.0000000000e+00
1333311360: 1.3400000000e+01 2.7000000000e+01 4.0000000000e+00 6.0000000000e+00
1333311480: 1.3400000000e+01 2.7000000000e+01 4.0000000000e+00 6.0000000000e+00
1333311600: 1.3400000000e+01 2.7000000000e+01 3.0112068833e+00 6.0000000000e+00
1333311720: 1.3400000000e+01 2.7000000000e+01 4.9888617833e+00 6.0000000000e+00
1333311840: 1.3499455991e+01 2.7000000000e+01 4.0054400917e+00 6.0000000000e+00
1333311960: 1.3400839001e+01 2.7000000000e+01 4.0000000000e+00 6.0000000000e+00
1333312080: 1.3400000000e+01 2.7000000000e+01 3.0034230333e+00 6.0000000000e+00
1333312200: 1.3400000000e+01 2.7000000000e+01 3.9933606000e+00 6.0000000000e+00
1333312320: 1.3400000000e+01 2.7000000000e+01 4.0000000000e+00 6.0000000000e+00
1333312440: 1.3400000000e+01 2.7000000000e+01 3.0045041083e+00 6.0000000000e+00
1333312560: 1.3400000000e+01 2.7000000000e+01 3.9936468250e+00 6.0000000000e+00
1333312680: 1.3400000000e+01 2.7000000000e+01 4.9911438667e+00 6.0000000000e+00
1333312800: 1.3400000000e+01 2.7000000000e+01 5.0000000000e+00 6.0000000000e+00
1333312920: 1.3400000000e+01 2.7000000000e+01 3.0124310667e+00 6.0000000000e+00
1333313040: 1.3400000000e+01 2.7000000000e+01 3.0000000000e+00 6.0000000000e+00
1333313160: 1.3400000000e+01 2.7000000000e+01 3.9892649250e+00 6.0000000000e+00
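
The comparison between the two fetches can also be done mechanically 
rather than by eye.  What follows is only a rough sketch, assuming the 
output of each "rrdtool fetch" call has been captured as text; the 
function names parse_fetch and lost_in_average are made up for this 
illustration, and the sample rows are three timestamps copied from the 
output above.  It lists the timestamps where AVERAGE reports an unknown 
in a column for which LAST still holds a value.

```python
import math

# Three rows copied from the AVERAGE and LAST fetches quoted in this message.
AVERAGE_OUT = """\
1333311960: nan nan nan nan
1333312080: nan nan 3.0034230333e+00 6.0000000000e+00
1333312200: 1.3400000000e+01 2.7000000000e+01 3.9933606000e+00 6.0000000000e+00
"""

LAST_OUT = """\
1333311960: 1.3400839001e+01 2.7000000000e+01 4.0000000000e+00 6.0000000000e+00
1333312080: 1.3400000000e+01 2.7000000000e+01 3.0034230333e+00 6.0000000000e+00
1333312200: 1.3400000000e+01 2.7000000000e+01 3.9933606000e+00 6.0000000000e+00
"""


def parse_fetch(text):
    """Map each timestamp to its list of values.

    Works token by token, so rows that a mail client wrapped onto a
    second line are still parsed. Anything before the first
    "<timestamp>:" token (such as a header of DS names) is ignored.
    """
    rows = {}
    current = None
    for token in text.split():
        if token.endswith(":") and token[:-1].isdigit():
            current = int(token[:-1])
            rows[current] = []
        elif current is not None:
            rows[current].append(float(token))  # float("nan") is valid
    return rows


def lost_in_average(avg_rows, last_rows):
    """Timestamps where AVERAGE is unknown in a column that LAST has."""
    lost = []
    for ts in sorted(avg_rows.keys() & last_rows.keys()):
        pairs = zip(avg_rows[ts], last_rows[ts])
        if any(math.isnan(a) and not math.isnan(b) for a, b in pairs):
            lost.append(ts)
    return lost


if __name__ == "__main__":
    avg = parse_fetch(AVERAGE_OUT)
    last = parse_fetch(LAST_OUT)
    for ts in lost_in_average(avg, last):
        print(ts, "AVERAGE:", avg[ts], "LAST:", last[ts])
```

Running the same comparison at intervals, and keeping the results, 
would show when a previously valid AVERAGE row turns unknown.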

And here's some of my rrdupdate log:

1333311120 rrdtool update health.rrd -t voltage:temperature:cpu:memory N:13.4:27.0:3:6
1333311241 rrdtool update health.rrd -t voltage:temperature:cpu:memory N:13.4:27.0:4:6
1333311360 rrdtool update health.rrd -t voltage:temperature:cpu:memory N:13.4:27.0:4:6
1333311481 rrdtool update health.rrd -t voltage:temperature:cpu:memory N:13.4:27.0:4:6
1333311600 rrdtool update health.rrd -t voltage:temperature:cpu:memory N:13.4:27.0:3:6
1333311720 rrdtool update health.rrd -t voltage:temperature:cpu:memory N:13.4:27.0:5:6
1333311841 rrdtool update health.rrd -t voltage:temperature:cpu:memory N:13.5:27.0:4:6
1333311960 rrdtool update health.rrd -t voltage:temperature:cpu:memory N:13.4:27.0:4:6
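
The timing of these updates can be checked against the step and 
heartbeat with a few lines of script.  This is a minimal sketch, 
assuming each log entry starts with the epoch time at which the update 
ran, followed by the command (the format shown above, with wrapped 
lines joined); the helper names update_times, gaps and over_heartbeat 
are made up for this illustration.

```python
import re

STEP = 120       # collection interval stated in this message, in seconds
HEARTBEAT = 240  # heartbeat stated in this message, in seconds

# The eight log entries quoted above, with wrapped lines joined.
LOG = """\
1333311120 rrdtool update health.rrd -t voltage:temperature:cpu:memory N:13.4:27.0:3:6
1333311241 rrdtool update health.rrd -t voltage:temperature:cpu:memory N:13.4:27.0:4:6
1333311360 rrdtool update health.rrd -t voltage:temperature:cpu:memory N:13.4:27.0:4:6
1333311481 rrdtool update health.rrd -t voltage:temperature:cpu:memory N:13.4:27.0:4:6
1333311600 rrdtool update health.rrd -t voltage:temperature:cpu:memory N:13.4:27.0:3:6
1333311720 rrdtool update health.rrd -t voltage:temperature:cpu:memory N:13.4:27.0:5:6
1333311841 rrdtool update health.rrd -t voltage:temperature:cpu:memory N:13.5:27.0:4:6
1333311960 rrdtool update health.rrd -t voltage:temperature:cpu:memory N:13.4:27.0:4:6
"""


def update_times(log_text):
    """Return the leading epoch timestamp of every logged update."""
    pattern = re.compile(r"^(\d+)\s+rrdtool update\b", re.MULTILINE)
    return [int(m.group(1)) for m in pattern.finditer(log_text)]


def gaps(times):
    """Return (previous, current, delta) for each pair of consecutive updates."""
    return [(a, b, b - a) for a, b in zip(times, times[1:])]


def over_heartbeat(times, heartbeat=HEARTBEAT):
    """Return the gaps longer than the heartbeat."""
    return [g for g in gaps(times) if g[2] > heartbeat]


if __name__ == "__main__":
    times = update_times(LOG)
    drift = max(abs(delta - STEP) for _, _, delta in gaps(times))
    print("updates:", len(times))                        # updates: 8
    print("largest deviation from step:", drift)         # largest deviation from step: 1
    print("gaps over heartbeat:", over_heartbeat(times))  # gaps over heartbeat: []
```

For the entries shown here, no gap between updates exceeds the 
240-second heartbeat, which matches the LAST RRA holding values for 
the whole period.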

I'm no stranger to RRDtool, but this has stumped me.  Any ideas?

rrdtool 1.2.30
FreeBSD 8.2-RELEASE amd64


Thanks,
Aragon


