# [rrd-users] trying to understand the relationship between source data, what's in rrd and what gets plotted

Mark Seger Mark.Seger at hp.com
Wed Jul 25 14:57:32 CEST 2007

```
Alex van den Bogaerdt wrote:
> On Wed, Jul 25, 2007 at 07:43:50AM -0400, Mark Seger wrote:
>
>
>>> "Nearest minute boundary":  No, it does not.  You tell it where to
>>> do so, using "--step".
>>>
>>>
>> Maybe I misled you by the term 'minute boundary', but I meant that it
>> appeared to align the data such that at least one sample fell on a
>> minute boundary which is consistent with an earlier reply made by 'Simon
>> Hobson' in which he said
>>
>
> Change your "--step" to something different.
>
>
>> yes, and that confirms what I think I also said.  rrd picks the times of
>> the intervals for you using YOUR stepsize but ITS times, so if my
>> samples are every 10 seconds starting 6 seconds after the minute, none
>> of my data points will be recorded exactly as I supplied them.  I
>>
>
> That is because hh:mm:06, hh:mm:16, hh:mm:26 and so on are not a whole
> multiple of 10 seconds.
>
> You have "n*step+offset", not "n*step".  This is why normalization is
> needed.
>
>
>
>> As I said above it sounds like if I conform my data to align to the time
>> boundary conditions rrd requires it should work and if I don't conform it
>> won't.
>>
>
> to 1,2,3 or 6 seconds
>
so if I understand what you're suggesting I should pick a start time and
step size such that my data will align accordingly, right?  Since I have
samples at 00:01:06, 01:16, etc that would mean I should pick a time
that lands on a minute boundary and a step of 2 because 00:01:02, 1:04,
1:06, etc will still hit all my timestamps.  1 sec would work too but
that would be overkill.  I don't think 3 or 6 would do it because they
would not all align.  00:01:06 would, but you'd never see 01:16.

so let's say I have 3 samples of 100, 1000 and 100 starting at
00:01:06.  since these are absolute numbers for 10 second intervals,
they really represent rates of 10/sec, 100/sec and 10/sec.  am I then
correct in assuming that rrd will then normalize it into 15 slots with
20/slot for the first 5, 200 for the next 5 and then 20 for the next 5,
all aligned to 00:01:00. so starting at 01:00 the data would look like
20 20 20 20 20 200 200 200 200 200 20 20 20 20 20.  If I then wanted
to see what the rate is at 01:06, rrd would see a value in that 2 second
slot of 20 and treat it as a rate of 10/sec.  the same would hold for
any of the 200s which would be reported as 100/sec for the slots they
occur in, right?

this is certainly a lot closer to what I was looking for and gets back
to really clarifying my original question which was the subject of this
thread.  I guess the negatives here are you have to be real careful to
pick the right time and stepsize and if your samples don't land on
integral time boundaries all bets are off (what if my samples were at
00:01:06.5, 00:01:12.5, etc?).  it would also make my rrd database 5
times bigger and it's already over 10MB for 1 day's worth of data.

btw - just to toss in an interesting wrinkle did you know if you sample
network statistics once a second you will periodically get an invalid
value because of the frequency at which Linux updates its network
counters?  the only way I'm able to get accurate network statistics near
that rate is to sample them every 0.9765 seconds.  I can go into more
detail if anyone really cares.  8-)

-mark

```
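The arithmetic in the message above can be checked with a small sketch. This is only a simplified model of the idea being discussed, not RRDtool's actual code. It assumes times are whole seconds since midnight (00:01:06 is 66), that each sample covers the 10 seconds ending at its timestamp, and that counts are spread evenly across the slots. The names `aligned_steps` and `spread_into_slots` are hypothetical helpers made up for this illustration.

```python
from fractions import Fraction

def aligned_steps(sample_times, candidates):
    """Return the candidate step sizes (seconds) whose multiples land on
    every sample time. Times are seconds since midnight (an assumption)."""
    return [s for s in candidates if all(t % s == 0 for t in sample_times)]

def spread_into_slots(samples, step):
    """Spread each sample's count evenly over `step`-second slots.

    `samples` is a list of (end_time, interval_seconds, count) tuples.
    Each count is assumed to cover the interval ending at end_time.
    Returns a dict mapping slot end time -> count attributed to that slot.
    """
    slots = {}
    for end, interval, count in samples:
        rate = Fraction(count, interval)   # per-second rate for this sample
        t = end - interval + step          # end time of the first slot
        while t <= end:
            slots[t] = rate * step         # count that falls in this slot
            t += step
    return slots

# Samples at 00:01:06, 00:01:16, 00:01:26 expressed as seconds since midnight.
times = [66, 76, 86]
print(aligned_steps(times, [1, 2, 3, 6]))          # [1, 2]

samples = [(66, 10, 100), (76, 10, 1000), (86, 10, 100)]
slots = spread_into_slots(samples, step=2)
per_slot = [int(v) for _, v in sorted(slots.items())]
per_second = [int(v / 2) for _, v in sorted(slots.items())]
print(per_slot)     # five 20s, five 200s, five 20s
print(per_second)   # five 10s, five 100s, five 10s
```

Under these assumptions only steps of 1 and 2 seconds hit every timestamp, and the three samples become 15 two-second slots whose per-second rates are 10, 100 and 10, which matches the reasoning in the message. Timestamps with fractional seconds would need a different time representation than the integers used here.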