[rrd-users] More observations and questions on COUNTER

Simon Hobson linux at thehobsons.co.uk
Sat Oct 23 10:06:44 CEST 2010


Philip Peake wrote:

>The fix I used was one suggested by Alex van den Bogaerdt, which was
>essentially to insert a NaN to indicate that the counter is now in an
>unknown state, followed by a zero, so that the next (real) value will be
>represented correctly.
>
>This worked for my tests, so I deployed the fix.
>
>Now, I use a DB which actually holds one month (4 weeks) of data, with a
>30 second sampling period.
>I use this DB to display three graphs:
>
>Last month
>Last day
>Last hour
>
>I do this by just setting the start to the appropriate value from <now>.
>
>Strangely, I have noticed that this fix doesn't always work.
>
>What I see if I look back over the data is a sequence looking like this
>(simplified, with three data sources):
>
>T1    1000    1004    997
>T2    1010    1020    1003
>T3    NaN     NaN     NaN
>T4    NaN     NaN     NaN
>T5    0       0       0
>T6    0       0       0
>T7    0       0       0
>T8    4E6     4E6     4E6
>T9    15      12      10
>
>No spike is displayed on the month or day graphs, but one is displayed
>on the hour graph.
>
>Two odd things (to me) - Why is rrd still recording a counter roll-over
>value?
>Why does the same data show a spike on one graph, but not on the other two?
>
>I suppose the third question might be why isn't the roll-over recorded
>with the first zero rather than the first non-zero?

I suspect all three questions may be related. There is a small but 
real window in which your updates may get out of sync. If a regular 
update occurs between you writing NaN and zero, then your zero won't 
work and the previous count doesn't get properly reset. In fact, 
depending on the timing, it's entirely possible an update is missing 
because it failed due to "time standing still" (i.e. two updates with 
the same timestamp).
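
To make that race concrete, here is a rough Python sketch. It is a toy 
model of a single COUNTER data source, not RRDtool's actual code, and 
the class name, timestamps and counter values are all made up for 
illustration. It assumes only two documented behaviours: an update 
whose timestamp is not later than the previous one is rejected, and a 
counter value lower than the previous known value is treated as a 
wrap at the 32 bit (or 64 bit) border.

```python
import math

WRAP32 = 2 ** 32
WRAP64 = 2 ** 64


class CounterModel:
    """Toy model of one COUNTER data source (hypothetical, simplified)."""

    def __init__(self):
        self.last_time = None
        self.last_value = None  # None stands for "unknown"

    def update(self, timestamp, value):
        """Return the per-second rate for this update (NaN if unknown)."""
        # "Time standing still": the timestamp must move forward.
        if self.last_time is not None and timestamp <= self.last_time:
            raise ValueError("timestamp not after previous update")
        rate = math.nan
        if value is not None and self.last_value is not None:
            delta = value - self.last_value
            if delta < 0:
                delta += WRAP32           # assume a 32 bit wrap
            if delta < 0:
                delta += WRAP64 - WRAP32  # otherwise assume a 64 bit wrap
            rate = delta / (timestamp - self.last_time)
        self.last_time = timestamp
        self.last_value = value
        return rate


def replay(updates):
    """Feed (timestamp, value) pairs to a fresh model, collecting rates."""
    model = CounterModel()
    rates = []
    for timestamp, value in updates:
        try:
            rates.append(model.update(timestamp, value))
        except ValueError:
            rates.append("rejected")
    return rates


# Reset done cleanly: unknown at t=40, zero at t=41, real value at t=60.
clean = replay([(0, 1000), (30, 1010), (40, None), (41, 0), (60, 15)])

# A regular update (value 1020) sneaks in at t=41 before the zero.
clash = replay([(0, 1000), (30, 1010), (40, None),
                (41, 1020), (41, 0), (60, 15)])

print(clean)
print(clash)
```

In the second run the zero is rejected, so the stored previous value 
stays at 1020 and the next real reading (15) is interpreted as a 
counter wrap, which gives a very large rate.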

With updates every 30 seconds, there is a 1 in 15 chance of a clash. 
Your reset script takes up two seconds of time in the rrd file to do 
its work (i.e. update to NaN at time t, update to 0 at time 
t+1 second). Thus two seconds out of every 30 second window are not 
available for your regular update script to update the file.
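
The arithmetic behind that figure, as a small Python sketch. It 
assumes the regular update is equally likely to land on any second of 
the window, which is a simplification, and the function name is just 
for illustration.

```python
from fractions import Fraction


def clash_probability(step_seconds, reserved_seconds):
    """Chance that a regular update lands on a second the reset needs.

    Assumes the regular update falls uniformly at random within the step.
    """
    return Fraction(reserved_seconds, step_seconds)


# 30 second step, two seconds used by the NaN and zero writes.
print(clash_probability(30, 2))
```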

I'd be inclined to add some logging statements to your scripts to log 
the actual update statements they are using to a text file. That 
way, when you next see the problem occur, you can refer to the text 
file, see what updates were actually done, and replay them into a 
fresh file a step at a time while monitoring the result.
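
Something along these lines would do, sketched in Python on the 
assumption that your scripts shell out to "rrdtool update". The 
function name, the log file name and the log format are made up here; 
adjust to taste. Unknown values are written as "U", which is what 
rrdtool update expects for an unknown reading.

```python
import subprocess
import time


def logged_update(rrd_path, timestamp, values,
                  log_path="rrd_updates.log", runner=subprocess.run):
    """Log an rrdtool update command to a text file, then run it.

    values: one entry per data source; None is sent as "U" (unknown).
    runner: injectable for testing; defaults to subprocess.run.
    """
    fields = ["U" if v is None else str(v) for v in values]
    command = ["rrdtool", "update", rrd_path,
               "%d:%s" % (timestamp, ":".join(fields))]

    # Write the command before running it, so a crash still leaves a trace.
    with open(log_path, "a") as log:
        log.write("%s %s\n" % (time.strftime("%Y-%m-%d %H:%M:%S"),
                               " ".join(command)))

    result = runner(command, capture_output=True, text=True)

    # Record whether the update was accepted or rejected.
    with open(log_path, "a") as log:
        log.write("  exit=%d stderr=%s\n"
                  % (result.returncode, result.stderr.strip()))
    return command


# Example calls (file name hypothetical), reset followed by a real value:
#   logged_update("traffic.rrd", t,      [None, None, None])
#   logged_update("traffic.rrd", t + 1,  [0, 0, 0])
#   logged_update("traffic.rrd", t + 30, [15, 12, 10])
```

Because the exit status and any error text are logged next to each 
command, a rejected zero should show up directly in the file, and the 
logged commands can be replayed one at a time against a fresh rrd.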

-- 
Simon Hobson

Visit http://www.magpiesnestpublishing.co.uk/ for books by acclaimed
author Gladys Hobson. Novels - poetry - short stories - ideal as
Christmas stocking fillers. Some available as e-books.


