[rrd-users] trying to understand the relationship between source data, what's in rrd and what gets plotted

Wed Jul 25 15:19:22 CEST 2007

Mark Seger wrote:

>>  That is because hh:mm:06, hh:mm:16, hh:mm:26 and so on are not a whole
>>  multiple of 10 seconds.
>>
>>  You have "n*step+offset", not "n*step".  This is why normalization is
>>  needed.
>>
>>
>>  
>>>  As I said above it sounds like if I conform my data to align to the time
>>>  boundary conditions rrd requires it should work and if I don't conform it
>>>  won't.
>>>    
>>
>>  No.  Your step size is wrong, not your input.  Change your step size
>>  to 1,2,3 or 6 seconds
>>  
>so if I understand what you're suggesting I should pick a start time and
>step size such that my data will align accordingly, right?  Since I have
>samples at 00:01:06, 01:12, etc that would mean I should pick a time
>that lands on a minute boundary and a step of 2 because 00:01:02, 1:04,
>1:06, etc will still hit all my timestamps.  1 sec would work too but
>that would be overkill.  I don't think 3 or 6 would do it because they
>would not all align.  00:01:06 would, but you'd never see 01:16.

Not quite - FORGET THE MINUTE BOUNDARIES

rrdtool uses samples that are a multiple of "step" seconds since the Unix
epoch. You can easily pick step sizes that do not fall on minute
boundaries (a step of 7 would not be very common, but most of its
multiples would not fall on a minute boundary).

But you are correct that steps of 3 or 6 will not get you 10 second intervals.
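
The alignment argument can be checked with a little arithmetic. The
sketch below is plain Python, not anything taken from rrdtool; the
function name and the sample timestamps are made up for illustration.
It lists the step sizes whose multiples (counted from the Unix epoch)
hit every sample time of the form n*10+6.

```python
from functools import reduce
from math import gcd


def aligned_steps(timestamps, max_step=10):
    """Return the step sizes (in seconds) that divide every timestamp.

    A step "aligns" when each timestamp is a whole multiple of it,
    counted in seconds since the Unix epoch.
    """
    common = reduce(gcd, timestamps)
    return [s for s in range(1, max_step + 1) if common % s == 0]


# Hypothetical sample times: every 10 seconds, offset by 6 (n*10 + 6).
samples = [1185369666 + 10 * n for n in range(6)]

print(aligned_steps(samples))  # prints [1, 2]
```

Only steps of 1 and 2 survive, because every timestamp must be divisible
by the step and the greatest common divisor of these timestamps is 2.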

>so let's say I have 3 samples of 100, 1000 and 100 starting at
>00:01:06.  since these are absolute numbers for 10 second intervals,
>they really represent rates of 10/sec, 100/sec and 10/sec.  am I then
>correct in assuming that rrd will then normalize it into 15 slots with
>20/slot for the first 5, 200 for the next 5 and then 20 for the next 5,
>all aligned to 00:01:00.

Actually 10/s is 10/s - not 20/s !  10/s * 2s would get you 20.
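
To make the rate-versus-count distinction concrete, here is a small
sketch in Python. It models the arithmetic only and is not rrdtool's
actual normalization code; the function name is invented for the
example, and it assumes the step divides the sampling interval evenly.

```python
def spread_over_slots(counts, interval, step):
    """Split per-interval counts into per-slot rates and per-slot counts.

    counts   -- absolute counts, one per sampling interval
    interval -- length of each sampling interval in seconds
    step     -- length of each slot in seconds (must divide interval)
    """
    if interval % step != 0:
        raise ValueError("step must divide interval")
    slots_per_interval = interval // step
    rates = []
    for count in counts:
        rate = count / interval  # units per second
        rates.extend([rate] * slots_per_interval)
    # The count that falls into one slot is rate * slot length.
    slot_counts = [rate * step for rate in rates]
    return rates, slot_counts


rates, slot_counts = spread_over_slots([100, 1000, 100], interval=10, step=2)
print(rates[:5])        # prints [10.0, 10.0, 10.0, 10.0, 10.0]
print(slot_counts[:5])  # prints [20.0, 20.0, 20.0, 20.0, 20.0]
```

The rate in each 2 second slot stays at 10/s; the 20 is the count that
accumulates during the slot, not a rate.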

>  so starting at 01:00 the data would look like
>20 20 20 20 20 200 200 200 200 200 200 20 20 20 20 20.  If I then wanted
>to see what the rate is at 01:06, rrd would see a value in that 2 second
>slot of 20 and treat it as a rate of 10/sec.  the same would hold for
>any of the 200s which would be reported as 100/sec for the slots they
>occur in, right?
>
>this is certainly a lot closer to what I was looking for and gets back
>to really clarifying my original question which was the subject of this
>thread.  I guess the negatives here are you have to be real careful to
>pick the right time and stepsize and if your samples don't land on
>integral time boundaries all bets are off (what if my samples were at
>00:01:06.5, 00:01:12.5, etc?).  it would also make my rrd database 5
>times bigger and it's already over 10MB for 1 day's worth of data.

An alternative for handling your historical data might be to simply 
'lie' about the timestamps ! Eg, for your 00:01:06 sample, insert it 
with a timestamp of 00:01:00, 00:01:16 as 00:01:10 and so on. You'll 
have a slight blip as you change to actually collecting the data on 
10s steps (instead of n*10+6 steps) but it would allow you to graph 
your historical data without going to 2s steps.
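
The timestamp shift described above amounts to rounding each sample
time down to the previous multiple of 10. A minimal sketch in Python,
with a made-up helper name and hypothetical sample times:

```python
def floor_to_step(timestamp, step=10):
    """Round a Unix timestamp down to the nearest multiple of step."""
    return timestamp - (timestamp % step)


# Hypothetical samples taken at n*10 + 6 seconds since the epoch.
original = [1185369666, 1185369676, 1185369686]
shifted = [floor_to_step(t) for t in original]

print(shifted)  # prints [1185369660, 1185369670, 1185369680]
```

The shifted values could then be used in the timestamp:value argument
of "rrdtool update" in place of the original sample times.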

>btw - just to toss in an interesting wrinkle did you know if you sample
>network statistics once a second you will periodically get an invalid
>value because of the frequency at which linux updates its network
>counters?  the only way I'm able to get accurate network statistics near
>that rate is to sample them every 0.9765 seconds.  I can go into more
>detail if anyone really cares.  8-)

I'm curious ...