[rrd-users] How to find first valid dp in rrd - repost - where are the experts?

Alex van den Bogaerdt alex at vandenbogaerdt.nl
Wed Nov 19 00:17:39 CET 2008


----- Original Message ----- 
From: "Karl Fischer" <rrd-users at ficos.de>
To: "Alex van den Bogaerdt" <alex at vandenbogaerdt.nl>; 
<rrd-users at lists.oetiker.ch>
Sent: Tuesday, November 18, 2008 9:02 PM
Subject: Re: [rrd-users] How to find first valid dp in rrd - repost - where are the experts?


> Alex van den Bogaerdt wrote:
>>> yes, I've tried that, but no matter which way I'm doing it, it eventually
>>> ends up having to read the entire database ...
>>
>> [snip]
>>
>> What is the exact problem you are trying to solve, and do you really want to
>> solve it using rrdtool?
>
> Well, I want to know the first and last 'useful' datapoint in the database for
> several reasons. Useful for me means: at least one DS needs to be not unknown,
> since - for most of the usage - unknown doesn't graph anything and I want to
> look at 'live' data rather than white space, since the white space doesn't tell
> me much except that the system might have been down for a while - but even if,
> I'm interested in downtimes in 'the middle of the graph', not at the beginning or
> end - very similar to stripping of whitespace at the beginning/end of a string.


First of all: I have not tried the latest version, and I probably won't 
anytime soon. But please do verify that your problem isn't already fixed.

The rest of this mail is written under the assumption that the current 
release still works like my 1.2.something does.


I think the best solution would be to store the 'beginning of time' separately 
from the RRD file. Perhaps in its filename?  Alternatively, fix (or have 
someone fix) what I consider to be a possible bug.

I agree it's not a nice solution, but unless someone comes up with a better 
idea (I may have provided one below), you will have to choose between this 
idea and a sequential search.  Storing the start time at least lets you 
skip a large amount of data at the beginning: just take the higher of this 
stored time and the output of 'rrdtool first'.
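
A minimal sketch of that last step, assuming the start time has been saved 
somewhere outside the RRD. The function name effective_start and the idea of 
a sidecar file are illustrative only; neither is part of rrdtool.

```shell
# Hypothetical helper: return the later of two UNIX timestamps.
# $1 = start time stored outside the RRD (e.g. in a sidecar file)
# $2 = whatever 'rrdtool first' printed for the same file
effective_start() {
    if [ "$1" -gt "$2" ]; then
        echo "$1"
    else
        echo "$2"
    fi
}

# Numbers taken from the test further down in this mail:
# stored start 1227006000, 'rrdtool first' reported 1227000060.
effective_start 1227006000 1227000060
```

In real use the second argument would be "$(rrdtool first test.rrd)".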

> 1. In my GUI I'd like to (initially) point the user/viewer to the entire existing
>   range of data, from where he can zoom in if he wants ...

Clear.  Start looking from the time in my previous paragraph, or just take 
that time as the start time assuming unknown data, if any, will not span 
much time.

> 2. I'm collecting data from many independently running systems (with their own
>   instance of logging/graphing with rrdtools) into a central system allowing
>   fast access to the (previously exported/imported) databases from the independent
>   systems. On that central location I'd like to be able to tell if that dbexport
>   I'm getting is just a grown version of the existing database (same startpoint,
>   new endpoint) or if it is a complete new instance of the database (perhaps due
>   to reinstallation of that system)

It seems that your logic will fail?

After some time, e.g. five years, you will have databases with a different 
startpoint. This is because RRDtool databases won't grow in size (unless you 
ask them to) and therefore reuse old locations. 'rrdtool first' will change 
after every update.
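
To illustrate with the RRA used in the test further down in this mail: 
assuming updates arrive exactly on step boundaries and 'rrdtool first' 
behaves as in 1.2.x, 'first' trails 'last' by a fixed (rows-1)*steps*step 
seconds, so it moves with every update. The helper name predicted_first is 
made up for this sketch.

```shell
step=60; steps=1; rows=100

# Hypothetical helper: predict what 'rrdtool first' reports for a given
# 'rrdtool last', assuming aligned updates and 1.2.x behaviour.
predicted_first() {
    echo $(( $1 - (rows - 1) * steps * step ))
}

for last in 1227006000 1227006060 1227006120; do
    echo "last: $last  first: $(predicted_first "$last")"
done
```

The three predicted values match the 'first' lines in the script output 
below, which is why a startpoint comparison cannot tell a grown database 
from a new one.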

> ... there are more reasons but those are the most important ones ...
> Basically I want to know when I have inserted the first value into that rrd rather
> than seeing how far it potentially reaches back ...

One idea which will probably work but does also take some effort:  modify 
rrd_create.c so that it stores a timestamp somewhere in the static portion 
of an RRD. This timestamp should be what rrdtool last would return at that 
moment, but it would be a static number. Then either build a new command 
'rrdtool created' (returning that timestamp) or modify 'rrdtool first' to 
return max(first,created).

IMHO modifying 'rrdtool first' could be considered a bug fix, as it would 
make the program do what the manual page seems to be promising.

Currently it does _not_ "return the UNIX timestamp of the first data sample 
entered".  It returns the timestamp of the first sample in the database and, 
unlike VDEF first, uses the end of that interval.

An RRA with 100 rows of 1 step per row and 60 seconds per step contains 
6000 seconds.  If the start time is set to 1227006000, the first interval in 
the database runs from 1227000000 to 1227000060.

I expect rrdtool first to return 1227000000 (beginning of database) or 
1227006000 (beginning of data). It returns 1227000060.
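
The arithmetic behind those three numbers, as a small sketch (the variable 
names are chosen for illustration):

```shell
start=1227006000
step=60; steps=1; rows=100

span=$(( step * steps * rows ))           # seconds covered by the RRA
db_begin=$(( start - span ))              # beginning of database
first_end=$(( db_begin + step * steps ))  # end of the first interval

echo "span:              $span"
echo "beginning of db:   $db_begin"
echo "beginning of data: $start"
echo "first interval:    $db_begin..$first_end"
```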

After updating the database using timestamp 1227006060, the manual page for 
rrdtool first promises either 1227006000 or 1227006060 (depending on how you 
interpret it), but the command actually returns 1227000120.


I've included a small bash script and its output.  Do not run it in a 
directory containing 'test.rrd' if you want to keep that file. You could use 
it to compare against a recent version of rrdtool and see whether anything 
relevant changed.

---
start=1227006000
step=60
steps=1
rows=100
rrdtool create test.rrd --step $step --start $start DS:x:GAUGE:60:U:U \
    RRA:AVERAGE:0:$steps:$rows

echo Start time parameter: $start
echo "Duration $((step*steps*rows)) seconds: $((start-step*steps*rows))..$start"
echo -n first:
rrdtool first test.rrd
echo -n 'last: '
rrdtool last test.rrd


tm=$((start+60))
echo -e Updating @ $tm
rrdtool update test.rrd $tm:0

echo -n first:
rrdtool first test.rrd
echo -n 'last: '
rrdtool last test.rrd

tm=$((tm+60))
echo -e Updating @ $tm
rrdtool update test.rrd $tm:U

echo -n first:
rrdtool first test.rrd
echo -n 'last: '
rrdtool last test.rrd
---


Start time parameter: 1227006000
Duration 6000 seconds: 1227000000..1227006000
first:1227000060
last: 1227006000
Updating @ 1227006060
first:1227000120
last: 1227006060
Updating @ 1227006120
first:1227000180
last: 1227006120


Be aware: from your description, you want that last 'last' to be 1227006060.

You also want 'first' to return 1227006000 and not 1227006060.

Why would 'first' be 60 seconds less than you would expect at first glance? 
Consider step==60, steps==1 and rows==1. The RRA would be just one row, 
containing one step of 60 seconds. 'last' would be the update time (e.g. 
1227006060). The amount of data available, the time range, is 60 seconds 
long and thus starts 60 seconds before 'last'.  'rrdtool graph --start 
1227006000 --end 1227006060 ...' is what you want in this example.
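
The same reasoning for any RRA size, as a sketch: the window ends at 'last' 
and starts rows*steps*step seconds earlier. graph_window is a made-up helper 
that only prints the two options; it does not call rrdtool.

```shell
# Hypothetical helper: print the --start/--end options covering the
# whole RRA. Arguments: last, step, steps, rows.
graph_window() {
    echo "--start $(( $1 - $2 * $3 * $4 )) --end $1"
}

# One row of one 60-second step, last update at 1227006060:
graph_window 1227006060 60 1 1
```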


