[rrd-users] Reading thousands of rrds

Mike Lindsey mike-rrd at bettyscout.org
Sun Jul 11 09:57:23 CEST 2010


On 7/10/10 9:09 AM, Shem Valentine wrote:
> Hello list,
>
> I have a few thousand RRDs that I need to run a report against.  I'll need
> to sum the upload and download given x amount of time.  I'll be using
> Python to write this report.
>
> My biggest concern is the performance hit it may take to run this report.
> I was wondering if anyone has any suggestions as to how they would go about
> it?
>
> Right now I'm considering running the report as a cron job during off peak
> hours and storing the results in a format that would be less intensive to
> retrieve.
>
> Any ideas/suggestions are appreciated,

I've got a Python script that runs a weekly report identifying 
overloaded servers by pulling in all the load, CPU, memory, swap, disk 
I/O, and network usage data for ~1200 servers, probably around 25k data 
sources in total.  I run the script from a different machine, reading 
the RRDs over NFS, so the only potential impact on the production host 
is disk I/O, plus a little CPU for serving NFS, but I haven't noticed 
any performance hit.
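For the summing itself, here's a minimal sketch of the kind of loop I 
mean, using the official rrdtool Python bindings.  The /var/lib/rrd 
path and the 'upload'/'download' DS names are placeholders; substitute 
whatever your files actually use.

    import glob
    import time

    import rrdtool  # official Python bindings: 'pip install rrdtool'

    END = int(time.time())
    START = END - 7 * 24 * 3600           # e.g. the last week

    def sum_ds(path, ds_name, start=START, end=END):
        """Fetch AVERAGE rows and integrate one DS over [start, end]."""
        (_, _, step), names, rows = rrdtool.fetch(
            path, 'AVERAGE', '--start', str(start), '--end', str(end))
        col = names.index(ds_name)
        # fetch returns per-second rates, so multiply each sample by
        # the step length to recover totals; skip unknown (None) rows.
        return sum(row[col] * step for row in rows if row[col] is not None)

    totals = {}
    for path in glob.glob('/var/lib/rrd/*.rrd'):
        # 'upload' and 'download' are placeholder DS names.
        totals[path] = (sum_ds(path, 'upload'), sum_ds(path, 'download'))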

So...  My advice is just do it.  If it's a production-style 
environment, maybe run it first under nice, or with a few 
time.sleep(0.01) calls in the code (see the sketch below).
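For example, the throttling could look something like this, where 
report_on() is a stand-in for whatever per-file work your report does:

    import glob
    import os
    import time

    def report_on(path):
        pass              # stand-in for the actual per-file work

    os.nice(19)           # drop this process to the lowest CPU priority

    for path in glob.glob('/var/lib/rrd/*.rrd'):  # placeholder path
        report_on(path)
        time.sleep(0.01)  # brief pause to soften the disk I/O load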

You might not see any impact worth worrying about, and if you do, just 
work on figuring out how to make the impact acceptable.  If the impact 
is CPU, export the RRDs over NFS and run the script elsewhere.  If the 
impact is disk I/O, consider faster disks or reading the files more 
slowly.

-- 
Mike Lindsey


