[rrd-users] find top 10 in > 4000 rrd files

Thu Oct 2 10:52:14 CEST 2014

Rob Hassing <rob.hassing at deltics.nl> wrote:

> I am measuring the bandwidth usage of over 4000 ports in a network using
> sFlow. 
> 
> The sFlow daemon I use generates a rrd file for each port. 
> 
> So I have over 4000 rrd files named: x.x.x.x-Y.rrd 
> Where x.x.x.x is the IP address of the host and Y is the port index number.
> 
> Now I would like to find the top 10 of bandwidth usage in these 4000 files. 

RRDtool has no built-in way to do this. I think you'll need to use rrdtool fetch or rrdtool graph* to pull the data you want from each file, load it into a "normal" database, and then run your query against that database.

* rrdtool fetch only returns the values actually stored in the RRD file, and it returns several of them if the period you ask for covers more than a single CDP (consolidated data point). If you use rrdtool graph, you can use all the RPN functions to process the data and then use "PRINT" (not "GPRINT") to output the (I assume) single value you are after.
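
As a rough illustration of the rrdtool graph + PRINT route, here is a minimal Python sketch that asks each RRD for one averaged value and keeps the ten largest. Several things in it are assumptions rather than facts about your setup: the data source name `bytes_in`, the one-day window, the `*.rrd` glob in the current directory, and the helper names (`build_command`, `parse_print_output`, `top_n`) are all made up for the example. It also assumes the PRINT value is the last non-empty line that rrdtool graph writes to stdout, so check that against your rrdtool version and locale.

```python
import glob
import heapq
import math
import subprocess

DS_NAME = "bytes_in"  # assumption: replace with the DS name your sFlow daemon uses
WINDOW = "-1d"        # assumption: rank on the last 24 hours


def build_command(rrd_path, ds_name=DS_NAME, start=WINDOW):
    """Build an rrdtool graph call that prints one averaged value, no image."""
    return [
        "rrdtool", "graph", "/dev/null",
        "--start", start,
        f"DEF:v={rrd_path}:{ds_name}:AVERAGE",  # read the stored averages
        "VDEF:avg=v,AVERAGE",                   # reduce them to a single number
        "PRINT:avg:%lf",                        # PRINT writes to stdout
    ]


def parse_print_output(text):
    """Return the printed value as a float, or None if missing or NaN."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return None
    try:
        value = float(lines[-1])
    except ValueError:
        return None
    return None if math.isnan(value) else value


def average_rate(rrd_path):
    """Run rrdtool for one file and return its average (or None)."""
    result = subprocess.run(build_command(rrd_path),
                            capture_output=True, text=True, check=True)
    return parse_print_output(result.stdout)


def top_n(values, n=10):
    """Return the n (name, value) pairs with the largest values."""
    ranked = ((v, name) for name, v in values.items() if v is not None)
    return [(name, v) for v, name in heapq.nlargest(n, ranked)]


if __name__ == "__main__":
    averages = {path: average_rate(path) for path in glob.glob("*.rrd")}
    for path, value in top_n(averages):
        print(f"{value:14.2f}  {path}")
```

This starts one rrdtool process per file, so with 4000 files every run does all of the work at query time.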

Alternatively, you might run a periodic task to fetch certain data from your RRDs and insert/update a "normal" database. You could then run queries against that database.
For example, suppose you have data consolidated to 24-hour resolution in your RRDs, and shortly after midnight you pull the latest value and update your (e.g.) SQL database. If you want "top 10 for the previous 7 days", you run a query restricted to those 7 days that groups by x.x.x.x-Y, has sum(d) in the select clause (d being the stored daily value), and sorts by sum(d) descending. The first 10 rows returned are your top 10.
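
The query described above might look like the following sketch, using SQLite from Python purely for illustration. The table name `daily_usage`, its columns (`port`, `day`, `d`), the helper `top_ports`, and the sample rows are all invented for the example; any SQL database would do, and the nightly job that fills the table from the RRDs is not shown.

```python
import sqlite3


def top_ports(conn, since_day, limit=10):
    """Return (port, total) rows for the busiest ports since since_day."""
    return conn.execute(
        """
        SELECT port, SUM(d) AS total
        FROM daily_usage
        WHERE day >= ?
        GROUP BY port
        ORDER BY total DESC
        LIMIT ?
        """,
        (since_day, limit),
    ).fetchall()


# Hypothetical schema: one row per port per day, d = that day's value.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE daily_usage ("
    " port TEXT, day TEXT, d REAL,"
    " PRIMARY KEY (port, day))"
)

# Made-up sample data standing in for what the nightly job would insert.
rows = [
    ("10.0.0.1-1", "2014-09-30", 100.0),
    ("10.0.0.1-1", "2014-10-01", 150.0),
    ("10.0.0.1-2", "2014-09-30", 400.0),
    ("10.0.0.2-1", "2014-10-01", 50.0),
    ("10.0.0.2-1", "2014-09-01", 9999.0),  # older than the 7-day window
]
conn.executemany("INSERT OR REPLACE INTO daily_usage VALUES (?, ?, ?)", rows)

for port, total in top_ports(conn, "2014-09-25"):
    print(f"{total:12.1f}  {port}")
```

Storing the day as an ISO date string (YYYY-MM-DD) keeps the `day >= ?` comparison correct with plain text ordering.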

Neither way is right or wrong; they just have different tradeoffs. The first method involves a lot of work (and modest storage) when you want to run the query, but nothing at other times. The second method involves periodic work and more storage even if you never make any queries, but when you do run queries the results will come out quicker. So a lot depends on how often you want to run this.