[rrd-users] Can I stuff an RRD with data after-the-fact?

Simon Hobson linux at thehobsons.co.uk
Tue Sep 16 17:00:07 CEST 2014


Alan McKay <alan.mckay at gmail.com> wrote:

> For various reasons I want to collect data in my own logfile formatted thus :
> 
> <EPOCHTIME>:data1:data2
> 
> And then when I want to see a graph, copy that over to another host and feed it
> into an RRD file, then graph it from there.

That's OK.

> But of course in doing so I get the
> error :
> 
> "illegal attempt to update using time" "when last update time is"

There's no "of course" about it.

> Is there a way to do this?   Basically do the following but tell it to
> force timestamps
> 
> #!/bin/bash
> 
> while read line
> do
> rrdtool update foobar-1yr-5sec.rrd -t diskiops:diskutil $line
> done < diskperf-1yr-5sec.log

That will work just fine *IF* every timestamp in your file is newer than the last update already in the RRD file AND all the timestamps in the file are in strictly increasing order. With each update, the RRD will end up just the same as if you'd fed the values in in real time as the data was collected.
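If the ordering of the log is in doubt, it can be sorted before it is replayed. The following is a minimal sketch, assuming the <EPOCHTIME>:data1:data2 format and the file names from the quoted script; the function names sorted_log and replay_log are hypothetical, introduced only for illustration.

```shell
# Print log file $1 sorted numerically on its first ':'-separated field
# (the epoch time), keeping only one line per timestamp.
sorted_log() {
    sort -t: -k1,1n -u "$1"
}

# Replay log file $1 into RRD file $2, one rrdtool update per line,
# using the data source names from the quoted script.
replay_log() {
    sorted_log "$1" | while IFS= read -r line
    do
        rrdtool update "$2" -t diskiops:diskutil "$line"
    done
}

# Usage (not run here):
#   replay_log diskperf-1yr-5sec.log foobar-1yr-5sec.rrd
```

Dropping duplicate timestamps matters because a second update carrying the same timestamp would be rejected with the same error.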

Now, if you have (say) a big data file that you keep adding to, and you are trying to update an existing RRD file that has already had some of that data inserted, then you'll get the error. In that case, you'd need to modify the script a bit. I see two ways of dealing with it :

1) Just throw away STDERR so you don't see the errors. RRD will barf on each update it's already seen, but then add the new stuff.

2) Have your script use rrdtool last to get the timestamp of the most recent update to the RRD, and "discard" all the entries in the log that are not newer than this - then insert the remaining values. Note that rrdtool last reports the time of the last update itself, not the end of the last complete bucket (aka Primary Data Point), so this should filter out exactly the entries that would otherwise be rejected.
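Both options can be sketched in shell. This is a minimal sketch, assuming the file and data source names from the quoted script and that rrdtool last prints a single epoch timestamp; newer_than, update_new_entries and the RRDTOOL override variable are hypothetical names introduced for illustration.

```shell
# RRDTOOL is a hypothetical override hook so the loop can be dry-run
# (e.g. RRDTOOL=echo); by default the real rrdtool binary is used.
RRDTOOL=${RRDTOOL:-rrdtool}

# Print only those lines of log file $2 whose leading epoch timestamp
# is strictly greater than the cutoff $1.
newer_than() {
    awk -F: -v last="$1" '$1 + 0 > last + 0' "$2"
}

# Option 2: ask the RRD for its last timestamp, then insert only newer entries.
update_new_entries() {
    rrd=$1
    log=$2
    last=$("$RRDTOOL" last "$rrd") || return 1
    newer_than "$last" "$log" | while IFS= read -r line
    do
        # Option 1 on top: 2>/dev/null hides any remaining rejection
        # messages (and, as a side effect, every other error as well).
        "$RRDTOOL" update "$rrd" -t diskiops:diskutil "$line" 2>/dev/null
    done
}

# Usage (not run here):
#   update_new_entries foobar-1yr-5sec.rrd diskperf-1yr-5sec.log
```

Filtering first keeps the number of rejected updates small, so discarding STDERR hides less than it would with option 1 alone.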

Lastly, have you considered using rrdcached? Collect the data on one machine, and do rrdtool updates from there specifying the address of an rrdcached daemon running on the other machine - the data is then transferred to that machine and the RRD updated in real time*. It works really well for distributed data collection like this - I use it on many of my systems.

* Subject to flushing the cache.
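
On the collecting machine, an update sent through rrdcached might look like the sketch below. It assumes a daemon already listening on a network address on the graphing host; the host name graphhost.example.com, the variable names and the send_update function are hypothetical, and 42217 is used because it is rrdcached's default port. The --daemon option of rrdtool update is what points the update at the daemon.

```shell
# RRDTOOL is a hypothetical override hook (e.g. RRDTOOL=echo for a dry run).
RRDTOOL=${RRDTOOL:-rrdtool}
# Hypothetical address of the rrdcached daemon on the graphing host.
RRDCACHED_ADDR=${RRDCACHED_ADDR:-graphhost.example.com:42217}

# Send one sample to the daemon.
#   $1 = RRD file name as the daemon knows it
#   $2 = one sample in the form <EPOCHTIME>:data1:data2
send_update() {
    "$RRDTOOL" update "$1" --daemon "$RRDCACHED_ADDR" -t diskiops:diskutil "$2"
}

# Usage (not run here):
#   send_update foobar-1yr-5sec.rrd "$(date +%s):120:35"
```

Before graphing on the other host, rrdtool flushcached can be pointed at the same daemon and file to make sure pending values have been written out.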


