[rrd-users] More observations and questions on COUNTER
linux at thehobsons.co.uk
Sat Oct 23 10:06:44 CEST 2010
Philip Peake wrote:
>The fix I used was one suggested by Alex van den Bogaerdt, which was
>essentially to insert a NaN to indicate that the counter is now in an
>unknown state, followed by a zero, so that the next (real) value will be
>This worked for my tests, so I deployed the fix.
>Now, I use a DB which actually holds one month (4 weeks) of data, with a
>30 second sampling period.
>I use this DB to display three graphs:
>I do this by just setting the start to the appropriate value from <now>.
>Strangely, I have noticed that this fix doesn't always work.
>What I see if I look back over the data is a sequence looking like this
>(simplified, with three data sources):
>T1 1000 1004 997
>T2 1010 1020 1003
>T3 NaN NaN NaN
>T4 NaN NaN NaN
>T5 0 0 0
>T6 0 0 0
>T7 0 0 0
>T8 4E6 4E6 4E6
>T9 15 12 10
>No spike is displayed on the month or day graphs, but one is displayed
>on the hour graph.
>Two odd things (to me) - Why is rrd still recording a counter roll-over
>Why does the same data show a spike on one graph, but not on the other two?
>I suppose the third question might be why isn't the roll-over recorded
>with the first zero rather than the first non-zero?
I suspect all three questions may be related. There is a distinct but
small time period where your updates may get out of sync. If an
update occurs between you writing NaN and zero, then your zero won't
work and the previous count doesn't get properly reset. In fact,
depending on the timing, it's entirely possible an update is missing
because it failed due to "time standing still" (ie two updates with
the same timestamp).
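To make the two failure cases concrete, here is a toy model in Python. It is
only a sketch, not rrdtool's own code: the ToyRrd class, the replay helper and
the timestamps are made up for illustration. The one behaviour it copies is
that an update is refused unless its timestamp is later than the last one.

```python
class ToyRrd:
    """Toy stand-in for an rrd file: it only tracks the last update time."""

    def __init__(self):
        self.last_time = 0
        self.accepted = []

    def update(self, timestamp, value):
        # Refuse any update that does not move time forward by at least 1 second.
        if timestamp <= self.last_time:
            return False
        self.last_time = timestamp
        self.accepted.append((timestamp, value))
        return True


def replay(updates):
    """Apply (timestamp, value) pairs in arrival order; return the rejected ones."""
    rrd = ToyRrd()
    return [u for u in updates if not rrd.update(*u)]


NAN = float("nan")
t = 130  # hypothetical time at which the reset script runs

# Case 1: the regular update lands on the same second as the NaN and is lost.
print(replay([(100, 1010), (t, NAN), (t, 15), (t + 1, 0)]))

# Case 2: the regular update takes second t+1 first, so the zero is rejected.
print(replay([(100, 1010), (t, NAN), (t + 1, 15), (t + 1, 0)]))
```

In the first case a real sample goes missing; in the second the zero never
reaches the file, so the counter is not reset the way the fix intends.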
With updates every 30 seconds, that gives roughly a 1 in 15 chance of
a clash. Your reset script occupies two seconds of time in the rrd
file to do its work (ie update to NaN at time t, update to 0 at time
t+1 second). Thus two seconds (out of each 30 second window) are not
available for your regular update script to update the file.
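The arithmetic behind that estimate, as a small Python sketch. It assumes the
regular update is equally likely to land on any second of the step; the
function name clash_fraction is just an illustrative helper.

```python
from fractions import Fraction


def clash_fraction(step_seconds, busy_seconds):
    """Fraction of each step that is already taken by the reset script."""
    if step_seconds <= 0 or not 0 <= busy_seconds <= step_seconds:
        raise ValueError("need 0 <= busy_seconds <= step_seconds and step_seconds > 0")
    return Fraction(busy_seconds, step_seconds)


# Two seconds used by the reset (NaN at t, zero at t+1) in a 30 second step.
print(clash_fraction(30, 2))  # 1/15
```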
I'd be inclined to add a logging statement to your scripts to log
the actual update statements they are using to a text file - that
way, when you next see the problem occur, you can refer to the text
file and see what actual updates were done - and replay them into a
fresh file a step at a time while monitoring the result.
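One possible shape for that logging, sketched in Python. It assumes your
scripts call the rrdtool command line as "rrdtool update <file> <time:values>";
the function name logged_update, the log file name and the run parameter
(there only so the wrapper can be tried without rrdtool installed) are
hypothetical, so adapt them to whatever your scripts really do.

```python
import subprocess
import time


def logged_update(rrd_file, update_arg, log_file="rrd_updates.log", run=subprocess.run):
    """Log the exact update to a text file, run it, then log the outcome."""
    command = ["rrdtool", "update", str(rrd_file), update_arg]
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")

    # Write the command before running it, so it is on record even if the run fails.
    with open(log_file, "a", encoding="utf-8") as log:
        log.write(f"{stamp} {' '.join(command)}\n")

    result = run(command, capture_output=True, text=True)

    # Record the exit status and any error text, e.g. a rejected timestamp.
    with open(log_file, "a", encoding="utf-8") as log:
        log.write(f"{stamp} exit={result.returncode} {result.stderr.strip()}\n")
    return result
```

Each logged command line can later be pasted, one at a time, against a fresh
copy of the file to see which update produces the spike.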