[rrd-users] More observations and questions on COUNTER

Philip Peake philip at vogon.net
Fri Oct 22 21:27:06 CEST 2010


A while ago, I asked how to avoid the huge spike that appears when
something being monitored as a COUNTER gets restarted (the drop from
the last reading to a smaller value is interpreted as the counter
rolling over through zero, and so is recorded as a huge number of counts).
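
To illustrate why a restart looks like a spike, here is a minimal Python
sketch of the wrap assumption. It is a simplified model and not rrdtool's
actual code: the function name counter_rate and the 32-bit/64-bit
fallback order are assumptions made for illustration.

```python
WRAP_32 = 2 ** 32
WRAP_64 = 2 ** 64


def counter_rate(prev, curr, step_seconds):
    """Per-second rate between two counter readings.

    Simplified model: any decrease is assumed to be a counter wrap,
    never a restart. None stands for an unknown (NaN) reading.
    """
    if prev is None or curr is None:
        return None  # an unknown reading gives an unknown rate
    delta = curr - prev
    if delta < 0:
        delta += WRAP_32  # assume a 32-bit counter wrapped
    if delta < 0:
        delta += WRAP_64 - WRAP_32  # otherwise assume a 64-bit wrap
    return delta / step_seconds


if __name__ == "__main__":
    # Normal case: the counter advanced by 10 in 30 seconds.
    print(counter_rate(1000, 1010, 30))
    # Restart: the reading drops from 1010 to 15 and is treated as a wrap.
    print(counter_rate(1010, 15, 30))
```

In this model the second call returns a rate on the order of 2^32 / 30
instead of the small real rate, which is the spike described above.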

The fix I used was one suggested by Alex van den Bogaerdt, which was
essentially to insert a NaN to indicate that the counter is now in an
unknown state, followed by a zero, so that the next (real) value will be
represented correctly.
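
The fix described above can be sketched as two update strings, one with
unknown values and one with zeros. This is only a sketch: the helper
name restart_updates, the one-second gap between the two updates and
the file name traffic.rrd in the usage note are assumptions, not taken
from the original suggestion. "U" is how rrdtool update spells an
unknown value.

```python
def restart_updates(restart_time, n_sources, gap_seconds=1):
    """Build the two update strings to send when a restart is detected.

    restart_time is an epoch timestamp in seconds. The first string
    marks every data source as unknown ("U"); the second, slightly
    later one (timestamps must increase), sets every source to zero
    so the next real reading is measured from zero.
    """
    unknown = ":".join(["U"] * n_sources)
    zero = ":".join(["0"] * n_sources)
    return [
        f"{restart_time}:{unknown}",
        f"{restart_time + gap_seconds}:{zero}",
    ]


if __name__ == "__main__":
    for update in restart_updates(1287775626, 3):
        print(update)
```

Each string would then be passed to something like
"rrdtool update traffic.rrd <string>", in order.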

This worked for my tests, so I deployed the fix.

Now, I use a DB which holds one month (4 weeks) of data, with a
30 second sampling period.
I use this DB to display three graphs:

Last month
Last day
Last hour

I do this by just setting the start to the appropriate value from <now>.
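
As a rough sketch of what the three windows amount to, the following
Python computes a start time for each graph and the number of 30 second
samples it covers. The names WINDOWS and graph_windows are made up for
illustration; the month is taken as 4 weeks, as stated above.

```python
STEP_SECONDS = 30

# Length of each graph window in seconds.
WINDOWS = {
    "month": 28 * 24 * 3600,  # 4 weeks
    "day": 24 * 3600,
    "hour": 3600,
}


def graph_windows(now):
    """Map each graph name to (start_time, samples_in_window)."""
    return {
        name: (now - span, span // STEP_SECONDS)
        for name, span in WINDOWS.items()
    }


if __name__ == "__main__":
    for name, (start, samples) in graph_windows(1287775626).items():
        print(name, start, samples)
```

Each start value is an epoch time that could be given to rrdtool graph
as its --start argument. Note how different the sample counts are: the
month graph has to represent far more stored rows than the hour graph
does in the same image width.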

Strangely, I have noticed that this fix doesn't always work.

What I see if I look back over the data is a sequence looking like this
(simplified, with three data sources):

T1    1000    1004    997
T2    1010    1020    1003
T3    NaN     NaN     NaN
T4    NaN     NaN     NaN
T5    0       0       0
T6    0       0       0
T7    0       0       0
T8    4E6     4E6     4E6
T9    15      12      10

No spike is displayed on the month or day graphs, but one is displayed
on the hour graph.

Two things seem odd to me. Why is rrd still recording a counter
roll-over value? And why does the same data show a spike on one graph,
but not on the other two?

I suppose the third question might be why isn't the roll-over recorded
with the first zero rather than the first non-zero?



