[rrd-users] api for iterating over single rows in a rrd file

Fri Jul 4 11:53:06 CEST 2014

On 04/07/14 11:09, Tobias Oetiker wrote:
> Hi Daniel,
>
> Today Daniel Pocock wrote:
>
>> On 04/07/14 09:57, Tobias Oetiker wrote:
>>> Hi Plamen,
>>>
>>> Yesterday Plamen Dimitrov wrote:
>>>
>>>> Hi rrdtool users!
>>>>
>>>> As part of my Google Summer of Code project with Ganglia I'm developing an
>>>> R package that imports the values from an RRD file into vectors in R
>>>> (without exporting to CSV, XML or another intermediate format first). I'm
>>>> using the rrdfetch API to do this. Here is a working prototype:
>>>>
>>>> https://github.com/pldimitrov/Rrd
>>>>
>>>>
>>>> While this seems to work fairly well, it struck me that in a scenario where
>>>> I'm only interested in reading one row at a time (e.g. to compare values
>>>> from many RRDs simultaneously), the rrdfetch code would need to go through
>>>> all error/sanity checks, find the RRA we want and seek to the desired
>>>> location in the file at each iteration.
>>>>
>>>> I know the use of the internal rrd_read, rrd_seek, rrd_open, etc...
>>>> functions is not encouraged so I'm wondering what might be a good solution.
>>>> Ideally, it would be useful to have something that iterates one row at a
>>>> time, reads and caches the data.
>>>>
>>>> Does anything like this already exist?  Would you agree it makes sense to
>>>> have this in addition to rrdfetch?
>>> No, there is no iterator ... if you do see a memory problem with
>>> reading the whole file, you may want to split your reading into
>>> chunks of, say, 10000 rows.
>>
>> Hi Tobi,
>>
>> What do you think of the other half of the question, simultaneously
>> reading a row from all the RRDs?
>>
>> Has anything like this been discussed before, has anybody else expressed
>> interest in that?
> Neither ... my question is, what is the purpose of such an
> endeavour on the API level?


From a technical perspective, if rrdtool provided such an API:
- it would make programming easier for people accessing data in this way
- it could be optimized for efficiency (e.g. the IO subsystem would be
reading 4KiB of data from each file at a time, the iterator would then
return values from that buffer and the IO subsystem would only do more
reads when the iterator advances over a page boundary)
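
To make the buffering idea concrete, here is a minimal sketch in Python. It is
not rrdtool code and it does not parse the real RRD file layout: it assumes a
flat file of fixed-width rows of little-endian doubles, and the class name
BufferedRowIterator and its parameters are hypothetical. It only shows an
iterator that returns rows from a buffer and touches the file again when the
buffer runs dry.

```python
import struct

PAGE_SIZE = 4096  # bytes requested per read, matching the 4KiB figure above


class BufferedRowIterator:
    """Yield fixed-width rows of doubles, refilling the buffer one page at a time.

    Hypothetical illustration only: assumes a flat file of little-endian
    doubles, NOT the actual RRD on-disk format.
    """

    def __init__(self, path, columns, page_size=PAGE_SIZE):
        self.row_fmt = "<%dd" % columns
        self.row_size = struct.calcsize(self.row_fmt)
        self.page_size = page_size
        self.reads = 0          # number of non-empty reads issued so far
        self._fh = open(path, "rb")
        self._buf = b""
        self._eof = False

    def __iter__(self):
        return self

    def __next__(self):
        # Only go back to the file when the buffer cannot supply a full row.
        while len(self._buf) < self.row_size:
            if self._eof:
                raise StopIteration
            chunk = self._fh.read(self.page_size)
            if not chunk:
                # End of file: close the handle; a trailing partial row is dropped.
                self._eof = True
                self._fh.close()
                raise StopIteration
            self.reads += 1
            self._buf += chunk
        row = struct.unpack(self.row_fmt, self._buf[:self.row_size])
        self._buf = self._buf[self.row_size:]
        return row
```

With two columns a row is 16 bytes, so one 4096-byte read serves 256 calls to
the iterator before the next read is needed.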

From a user perspective, it would enable people to look across all their
RRDs, e.g. to answer questions like "which CPUs were over 90% utilized
at any time between 09:00 and 10:00"
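
As a rough sketch of that kind of query, the Python below steps through
several sources one row at a time, in lockstep. Everything here is an
assumption for illustration: the function name sources_over_threshold is
hypothetical, each source is taken to be any iterable of (timestamp, value)
rows with the same step and length, and plain lists stand in for whatever
row iterator rrdtool might offer.

```python
def sources_over_threshold(sources, start, end, threshold):
    """Return sorted names of sources with any value > threshold in [start, end].

    `sources` maps a name to an iterable of (timestamp, value) rows.
    Hypothetical helper: rows are assumed to be aligned across sources.
    """
    hits = set()
    names = list(sources)
    iterators = [iter(sources[name]) for name in names]
    # zip() pulls exactly one row from every source per step and stops
    # as soon as the shortest source is exhausted.
    for rows in zip(*iterators):
        for name, (ts, value) in zip(names, rows):
            # Unknown values stored as NaN never compare greater, so they
            # are skipped without special handling.
            if start <= ts <= end and value > threshold:
                hits.add(name)
    return sorted(hits)


# Example: three CPUs, two rows each, timestamps in seconds.
cpus = {
    "cpu0": [(0, 10.0), (300, 95.0)],
    "cpu1": [(0, 50.0), (300, 60.0)],
    "cpu2": [(0, 99.0), (300, float("nan"))],
}
busy = sources_over_threshold(cpus, start=300, end=600, threshold=90.0)
```

Only one row per source is held at a time, which is the point of iterating
rather than fetching every file in full first.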