[rrd-users] trying to understand the relationship between source data, what's in rrd and what gets plotted

Wed Jul 25 14:57:32 CEST 2007

Alex van den Bogaerdt wrote:
> On Wed, Jul 25, 2007 at 07:43:50AM -0400, Mark Seger wrote:
>
>   
>>> "Nearest minute boundary":  No, it does not.  You tell it where to
>>> do so, using "--step".
>>>   
>>>       
>> Maybe I misled you by the term 'minute boundary', but I meant that it 
>> appeared to align the data such that at least one sample fell on a 
>> minute boundary which is consistent with an earlier reply made by 'Simon 
>> Hobson' in which he said
>>     
>
> Change your "--step" to something different.
>
>   
>> yes, and that confirms what I think I also said.  rrd picks the times of 
>> the intervals for you using YOUR stepsize but ITS times, so if my 
>> samples are every 10 seconds starting 6 seconds after the minute, none 
>> of my data points will be recorded exactly as I supplied them.  I 
>>     
>
> That is because hh:mm:06, hh:mm:16, hh:mm:26 and so on are not a whole
> multiple of 10 seconds.
>
> You have "n*step+offset", not "n*step".  This is why normalization is
> needed.
>
>
>   
>> As I said above it sounds like if I conform my data to align to the time 
>> boundary conditions rrd requires it should work and if I don't conform it 
>> won't.
>>     
>
> No.  Your step size is wrong, not your input.  Change your step size
> to 1,2,3 or 6 seconds
>   
so if I understand what you're suggesting, I should pick a start time and 
step size such that my data will align accordingly, right?  Since I have 
samples at 00:01:06, 00:01:16, etc., that would mean I should pick a start 
time that lands on a minute boundary and a step of 2, because 00:01:02, 
00:01:04, 00:01:06, etc. will still hit all my timestamps.  1 sec would 
work too but that would be overkill.  I don't think 3 or 6 would do it 
because my samples would not all align: 00:01:06 would, but you'd never 
see 00:01:16.
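
A quick way to check that arithmetic, as a minimal sketch: it assumes timestamps are counted in whole seconds from a minute boundary, and the helper name `aligned_steps` is made up for illustration (it is not part of rrdtool).  Every sample time offset + k*interval is a multiple of the step only if the step divides both the offset and the interval, i.e. their gcd.

```python
from math import gcd

def aligned_steps(offset, interval, candidates):
    """Return the candidate step sizes for which every sample time
    offset + k*interval (k = 0, 1, 2, ...) is a whole multiple of the step."""
    g = gcd(offset, interval)
    return [s for s in candidates if g % s == 0]

# Samples every 10 seconds, starting 6 seconds after the minute.
steps = aligned_steps(6, 10, [1, 2, 3, 6, 10])

# Brute-force cross-check over the first few sample times (6, 16, 26, ...).
samples = [6 + 10 * k for k in range(6)]
brute = [s for s in [1, 2, 3, 6, 10] if all(t % s == 0 for t in samples)]

print(steps, brute)
```

Under these assumptions only steps of 1 and 2 seconds line up with every sample; 3 and 6 hit 00:01:06 but miss 00:01:16.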

so let's say I have 3 samples of 100, 1000 and 100 starting at 
00:01:06.  since these are absolute numbers for 10 second intervals, 
they really represent rates of 10/sec, 100/sec and 10/sec.  am I then 
correct in assuming that rrd will normalize them into 15 slots of 2 
seconds each, with 20/slot for the first 5, 200 for the next 5 and then 
20 for the last 5, all aligned to 00:01:00?  so starting at 01:00 the 
data would look like 
20 20 20 20 20 200 200 200 200 200 20 20 20 20 20.  If I then wanted 
to see what the rate is at 01:06, rrd would see a value in that 2 second 
slot of 20 and treat it as a rate of 10/sec.  the same would hold for 
any of the 200s, which would be reported as 100/sec for the slots they 
occur in, right?
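
The slot arithmetic above can be sketched as follows.  This only reproduces the numbers in this message; it is not rrdtool's actual normalization code, and `spread_counts` is a hypothetical helper that assumes the sample interval is a whole multiple of the step.

```python
def spread_counts(counts, interval, step):
    """Spread each per-interval count evenly over interval/step slots."""
    if interval % step != 0:
        raise ValueError("interval must be a whole multiple of step")
    slots_per_sample = interval // step
    per_slot = []
    for count in counts:
        # Each slot gets its proportional share of the sample's count.
        per_slot.extend([count * step / interval] * slots_per_sample)
    return per_slot

# Three 10-second samples of 100, 1000 and 100, spread over 2-second slots.
per_slot = spread_counts([100, 1000, 100], interval=10, step=2)

# Per-second rate for each slot: slot count divided by slot length.
rates = [value / step for value, step in zip(per_slot, [2] * len(per_slot))]

print(per_slot)
print(rates)
```

With these inputs there are 15 slots: five of 20, five of 200 and five of 20, which correspond to rates of 10/sec, 100/sec and 10/sec.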

this is certainly a lot closer to what I was looking for and gets back 
to really clarifying my original question which was the subject of this 
thread.  I guess the negatives here are you have to be real careful to 
pick the right time and stepsize and if your samples don't land on 
integral time boundaries all bets are off (what if my samples were at 
00:01:06.5, 00:01:12.5, etc?).  it would also make my rrd database 5 
times bigger and it's already over 10MB for 1 day's worth of data.
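
The "5 times bigger" figure is just the ratio of rows per day, sketched below.  It assumes the current database uses a 10-second step (the sample interval mentioned above; the actual --step is not stated in this thread) and that file size grows linearly with the number of rows kept per day.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def rows_per_day(step):
    """Number of step-sized rows needed to cover one day."""
    return SECONDS_PER_DAY // step

# Assumed current step of 10 seconds versus the proposed step of 2 seconds.
growth = rows_per_day(2) / rows_per_day(10)

print(rows_per_day(10), rows_per_day(2), growth)
```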

btw - just to toss in an interesting wrinkle: did you know that if you 
sample network statistics once a second you will periodically get an 
invalid value because of the frequency at which Linux updates its network 
counters?  the only way I'm able to get accurate network statistics near 
that rate is to sample them every 0.9765 seconds.  I can go into more 
detail if anyone really cares.  8-)

-mark