[rrd-developers] Suggestion for API extension

Thu Jul 30 22:53:35 CEST 2009

On 30.07.2009 17:46, Dan Cech wrote:
> Benny Baumann wrote:
>   
>> Though rrdtool_info will need some cleanup too. Simply returning an
>> array with all the needed values should be sufficient and much more
>> practical.
>>     
>
> This is an interesting BC question, all the updates I've made so far are
> completely backwards compatible, not requiring any changes to existing
> code using the extension.  Changing the rrdtool_info function to return
> an array rather than printing the information is a pretty major change.
>   
I was thinking more of an optional parameter $as_array = false: if it is
omitted or false, the string is given as before; if it is true, an array
with all the information is returned instead.

But since I doubt the extension in its old form ever reached any
significant distribution, even returning the array by default
shouldn't matter that much.
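
A minimal sketch of that $as_array idea, written in userland PHP purely
for illustration. The function name info_sketch and the "key = value"
string layout are assumptions, not the extension's actual behaviour:

```php
<?php
// Hypothetical stand-in for the extension function; only the optional
// $as_array flag is the point here.
function info_sketch(array $raw, bool $as_array = false)
{
    if ($as_array) {
        // New behaviour: hand back the structured data unchanged.
        return $raw;
    }

    // Old behaviour (assumed format): one "key = value" line per entry.
    $lines = [];
    foreach ($raw as $key => $value) {
        $lines[] = $key . ' = ' . $value;
    }
    return implode("\n", $lines);
}
```

Existing callers that never pass the second argument keep getting the
string, so nothing breaks for them.
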
> The rrd_fetch output presents a similar problem, all the values are
> returned in a single long array at present.  It would be nice to return
> the data as a multi-dimensional array with each 'row' of data grouped
> together (and very simple to do in the extension) but it would break BC
> with existing scripts using the current version of this function.
>   
As the existing extension is quite incomplete, I guess no one should
really complain if it is more or less rewritten and thereby made much
more useful. Besides, I'm writing a wrapper library in PHP that will
wrap the functional interface of the extension into an OOP interface
for PHP5. More on this wrapper library once the basic usage works
properly.
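
Until the extension groups the values itself, the row grouping Dan
describes for rrd_fetch could be done in userland. This is a sketch
under the assumption that the flat array holds one value per data source
per time step, in row order; group_fetch_rows is a made-up helper name:

```php
<?php
// Hypothetical helper: turn a flat value list into one associative row
// per time step. Assumes count($dsNames) values per row, in row order.
function group_fetch_rows(array $flat, array $dsNames): array
{
    $width = count($dsNames);
    if ($width < 1) {
        throw new InvalidArgumentException('At least one data source name is required');
    }

    $rows = [];
    foreach (array_chunk($flat, $width) as $chunk) {
        // Skip a trailing incomplete row instead of guessing its values.
        if (count($chunk) === $width) {
            $rows[] = array_combine($dsNames, $chunk);
        }
    }
    return $rows;
}
```
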
>>> I was planning to tackle rrd_info next, but haven't looked at rrd_dump.  
>>>       
>> I already completed rrd_info, but didn't parse the returned key names.
>> I'm writing a PHP wrapper for this PHP extension and as such I can do
>> parsing of the keynames there.
>>     
> Yeah, returning the raw data as it comes from rrd_info_r is easy, but I
> was planning to parse it out into a multi-dimensional php array.  A
>   
I planned to do this in my wrapper library, as I have to postprocess the
data anyway.
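
For illustration, this is roughly how flat key names such as
"ds[trafficIn].type" or "rra[0].cf" (the form rrdtool info prints) could
be folded into a multi-dimensional PHP array. The helper name
nest_info_keys is hypothetical, and the sketch assumes the input is an
already-fetched key => value array:

```php
<?php
// Hypothetical helper: "ds[trafficIn].type" => $tree['ds']['trafficIn']['type'].
function nest_info_keys(array $flat): array
{
    $tree = [];
    foreach ($flat as $key => $value) {
        // Split on '.', '[' and ']' to get the path segments.
        preg_match_all('/[^.\[\]]+/', (string) $key, $m);
        if (count($m[0]) === 0) {
            continue; // nothing usable in this key
        }

        $node = &$tree;
        foreach ($m[0] as $segment) {
            if (!isset($node[$segment]) || !is_array($node[$segment])) {
                $node[$segment] = [];
            }
            $node = &$node[$segment];
        }
        $node = $value;   // the last segment holds the value itself
        unset($node);     // drop the reference before the next key
    }
    return $tree;
}
```
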
> separate php-code library to parse the output of the rrd_info function
> in the php extension seems like it would make it more difficult to use
> the system because you would need to have the userland php code as well
> as the extension itself.
>   
The userland code is a wrapper around the functions in such a way that
you won't need to care about the command line interface for RRDTool. For
example, even though the creation of a new RRD file is much more
verbose, the following source saves you from doing all the calculations
yourself:

---
include 'rrdtool.php';

$RRD = RRDTool::Archive();
if(!$RRD->Load('net.rrd')) {
    $RRD->Start = 0;

    $RRD->addDS(
        RRDTool_DataSource::createDS(
            'trafficIn', RRD_DS_COUNTER, 600,
            array(
                RRD_PARAM_CF_MIN => 0,
                RRD_PARAM_CF_MAX => 65535
                )
            )
        );
    $RRD->addDS(
        RRDTool_DataSource::createDS(
            'trafficOut', RRD_DS_COUNTER, 600,
            array(
                RRD_PARAM_CF_MIN => 0,
                RRD_PARAM_CF_MAX => 65535
                )
            )
        );

    $RRD->addRRA(RRDTool_Archive::createRRACF(
        RRD_RRA_CF_AVG, 0.5,
        $RRD->getPDPIC(RRD_DEFAULT_STEP),
        $RRD->getPDPIC(RRD_TIME_DAY)
        ));
    $RRD->addRRA(RRDTool_Archive::createRRACF(
        RRD_RRA_CF_AVG, 0.5,
        $RRD->getPDPIC(15 * RRD_TIME_MINUTE),
        $RRD->getPDPIC(RRD_TIME_WEEK, 15 * RRD_TIME_MINUTE)
        ));
    $RRD->addRRA(RRDTool_Archive::createRRACF(
        RRD_RRA_CF_AVG, 0.5,
        $RRD->getPDPIC(RRD_TIME_HOUR),
        $RRD->getPDPIC(RRD_TIME_MONTH, RRD_TIME_HOUR)
        ));
    $RRD->addRRA(RRDTool_Archive::createRRACF(
        RRD_RRA_CF_AVG, 0.5,
        $RRD->getPDPIC(4 * RRD_TIME_HOUR),
        $RRD->getPDPIC(RRD_TIME_YEAR, 4 * RRD_TIME_HOUR)
        ));

    $RRD->addRRA(RRDTool_Archive::createRRACF(
        RRD_RRA_CF_MAX, 0.5,
        $RRD->getPDPIC(15 * RRD_TIME_MINUTE),
        $RRD->getPDPIC(RRD_TIME_DAY, 15 * RRD_TIME_MINUTE)
        ));
    $RRD->addRRA(RRDTool_Archive::createRRACF(
        RRD_RRA_CF_MAX, 0.5,
        $RRD->getPDPIC(RRD_TIME_HOUR),
        $RRD->getPDPIC(RRD_TIME_WEEK, RRD_TIME_HOUR)
        ));
    $RRD->addRRA(RRDTool_Archive::createRRACF(
        RRD_RRA_CF_MAX, 0.5,
        $RRD->getPDPIC(4 * RRD_TIME_HOUR),
        $RRD->getPDPIC(RRD_TIME_MONTH, 4 * RRD_TIME_HOUR)
        ));
    $RRD->addRRA(RRDTool_Archive::createRRACF(
        RRD_RRA_CF_MAX, 0.5,
        $RRD->getPDPIC(12 * RRD_TIME_HOUR),
        $RRD->getPDPIC(RRD_TIME_YEAR, 12 * RRD_TIME_HOUR)
        ));

    $RRD->Save('net.rrd');
}
---

The rrdtool.php is still in development and thus without any real
functionality yet (I needed the rrd_info function first for reading back
structural information).

What the above code will create should be obvious. As I said: it's a bit
more verbose, but much more logical to the casual observer. The function
getPDPIC stands for Primary Data Point Interval Calculation ... What it
does should be obvious :P
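
In case it is not that obvious: judging from the calls above, the
calculation boils down to "how many base intervals fit into a time
span". The following is only a sketch of that reading, not the code from
rrdtool.php; the function name and the default base of 300 seconds
(RRDTool's default step) are assumptions:

```php
<?php
// Hypothetical re-implementation of the idea behind getPDPIC:
// number of $base-second intervals that fit into $interval seconds.
function pdp_interval_count(int $interval, int $base = 300): int
{
    if ($interval <= 0 || $base <= 0) {
        throw new InvalidArgumentException('Interval and base must be positive');
    }
    // Never return less than one interval.
    return max(1, intdiv($interval, $base));
}

// Time spans in seconds (values assumed for this sketch).
$minute = 60;
$day    = 86400;
$week   = 7 * $day;

// 15-minute averages kept for a week, on a 300-second step:
$steps = pdp_interval_count(15 * $minute);         // PDPs per consolidated point
$rows  = pdp_interval_count($week, 15 * $minute);  // rows in the archive
```

With these numbers the second AVG archive above would correspond to
RRA:AVERAGE:0.5:3:672 on the command line.
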
>> Some work on Open Basedir Restriction can be found at
>> http://blog.benny-baumann.de/?p=352 (German). The patch there isn't
>> fully up-to-date, but basically works (except for one minor typo I
>> already fixed in my local dev version).
>>     
> I hadn't considered open_basedir, but it seems like a good idea to add
> this safety feature.  Enforcing open_basedir on the actual RRDs
> referenced in graph, xport, etc will be more difficult though!
>   
Yes, correct, though this might require some parsing of the parameters
given. I skipped that point for now, as xport and graph don't write
to files referenced in the parameter strings, but only to the filename
given as target. And as RRD rejects files not conforming to its
file format, no real harm should be possible (you might only compromise
existing RRDs if you know where they are).

But completing the Basedir Restriction should be included in a final
version of the extension. That's why I emphasized this in my blog
posting, as the current implementation leaves some holes people should
be aware of when using this extension.
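
To give an idea of the parsing involved, here is a userland sketch that
pulls the RRD filenames out of DEF:vname=rrdfile:ds-name:CF arguments
and checks them against a list of allowed directories. Both function
names are made up for this example, and the real check would live in the
extension's C code rather than in PHP:

```php
<?php
// Hypothetical helper: collect the rrdfile part of every DEF argument.
// Colons inside the filename are expected to be backslash-escaped.
function extract_def_files(array $args): array
{
    $files = [];
    foreach ($args as $arg) {
        if (preg_match('/^DEF:[^=]+=((?:\\\\.|[^:\\\\])+):/', $arg, $m)) {
            // Remove the escaping backslashes again.
            $files[] = preg_replace('/\\\\(.)/', '$1', $m[1]);
        }
    }
    return $files;
}

// Hypothetical helper: is $path located below one of $basedirs?
function path_within_basedirs(string $path, array $basedirs): bool
{
    $real = realpath($path);
    if ($real === false) {
        // The file may not exist yet, so resolve its directory instead.
        $dir = realpath(dirname($path));
        if ($dir === false) {
            return false;
        }
        $real = rtrim($dir, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR . basename($path);
    }

    foreach ($basedirs as $base) {
        $realBase = realpath($base);
        if ($realBase === false) {
            continue;
        }
        // Compare with a trailing separator so /tmpfoo does not match /tmp.
        $prefix = rtrim($realBase, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR;
        if (strncmp($real . DIRECTORY_SEPARATOR, $prefix, strlen($prefix)) === 0) {
            return true;
        }
    }
    return false;
}
```

A graph or xport call would then be refused as soon as one extracted
file fails the check.
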
>> Would be nice if we could exchange our versions to merge the changes,
>> thus avoiding duplicate work.
>>     
> Agreed, any suggestions on the best way to do that?
>   
I have my version in an SVN repo. You could either:
1. Send me your file (or attach it) and I'll merge it into my SVN,
sending back the result
OR
2. Send me a username+APR1 password hash (htpasswd) and I'll give you RW
access on that repository.
OR
3. We use a completely different approach :P
>> What do you think of offering a callback based variant in addition to a
>> file based variant thus programs that need the returned data in a buffer
>> can grab it without needing to write to the filesystem?
>>
>> Other functions (e.g. for rrd_graph) that write to disk might use such a
>> callback too, or offer returning the resources they created to the
>> caller (i.e. giving the generated image as an internal resource to the
>> caller).
>>     
> I can see how this might be a useful thing to support, in the case of
> rrd_graph I was considering implementing an interface to graphv which
> would allow returning the image data as a php string directly, but the
> volume of data returned by rrd_dump can obviously be much larger than
> the typical graph image.
>   
IDK if it would be that hard to enable the PHP extension to return e.g.
a GD2 image resource as its result. Though having it return the image
data as a string wouldn't be wrong either (IIRC GD supports loading
images from string input, even if that means (ab)using some fopen
wrapper magic).
> I typically output the graph images to temporary files named according
> to a hash of the graph arguments, which allows me to re-use graph images
> for multiple clients without needing to regenerate them from the source
> rrds (very useful when multiple clients are requesting identical graphs).
>   
I see. The downside (and I have exactly this issue on my server) shows
up if your HDD is connected via a slow medium (e.g. iSCSI), so that
reading from or writing to disk takes considerable time compared to
allocating about 100k.
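
For reference, the hash-named cache Dan describes might look roughly
like this. The function name, the choice of sha1 and the .png suffix are
assumptions made for this sketch:

```php
<?php
// Hypothetical helper: derive a stable cache filename from the graph
// arguments, so identical requests map to the same image file.
function graph_cache_path(array $graphArgs, string $cacheDir): string
{
    // Join with NUL so ['a', 'bc'] and ['ab', 'c'] hash differently.
    $hash = sha1(implode("\0", $graphArgs));
    return rtrim($cacheDir, '/') . '/' . $hash . '.png';
}
```

A caller would test file_exists() on that path first and only run
rrd_graph when the image is missing or too old.
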
> I'm no expert but it seems like the data will still all need to be
> collected into a buffer before it can be returned to the calling php
> script as a string or php array, so in the case of rrd_dump it may be
> more efficient to write out the xml to disk and perform any operations
> on it using a stream-based xml parser.
>   
Well, given the callback you can essentially do exactly this: every
time you get a new line from rrd_dump you can feed it as the next input to
the XML parser. Anyway, for my implementation of rrd_dump I want the
extension function to return the output as a bare string, as if you
had piped the output into a memory buffer. Using such a callback method
seems most reasonable to me, as you should do as much as possible in
primary memory.
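
A sketch of what feeding the parser line by line could look like with
PHP's XML parser functions. The callback-based dump does not exist yet,
so fake_rrd_dump below is a hypothetical stand-in that just emits a few
hard-coded lines:

```php
<?php
// Hypothetical stand-in for a callback-based rrd_dump: calls $onLine
// once per line of XML output instead of writing a file.
function fake_rrd_dump(callable $onLine): void
{
    $lines = [
        '<rrd>',
        '<step>300</step>',
        '<ds><name>trafficIn</name></ds>',
        '<ds><name>trafficOut</name></ds>',
        '</rrd>',
    ];
    foreach ($lines as $line) {
        $onLine($line . "\n");
    }
}

// Count the <ds> elements without ever holding the whole dump in memory.
function count_data_sources(): int
{
    $count = 0;
    $parser = xml_parser_create();
    // Keep element names as written instead of upper-casing them.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use (&$count) {
            if ($name === 'ds') {
                $count++;
            }
        },
        function ($p, $name) {
            // Closing tags are not needed for counting.
        }
    );

    fake_rrd_dump(function (string $chunk) use ($parser): void {
        // Feed each line as the next piece of input; more will follow.
        if (xml_parse($parser, $chunk, false) !== 1) {
            throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
        }
    });

    // Tell the parser that the input is complete.
    xml_parse($parser, '', true);
    return $count;
}
```
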
> Dan
>   
Regards,
BenBE.
