[rrd-developers] reducing disk IO load (wish list)

Wed May 30 01:24:15 CEST 2007

I am using Cricket to collect data for 900 or so devices
and have run into some IO problems that seem to stem from the format of
the RRDs which, correct me if I am wrong, is a header block followed by
a bunch of concatenated arrays.

Would it not be better to have the arrays interleaved for the same data
granularity?
Example:
if I have one RRD with 18 data sources (for example a Cisco interface:
in octets, out octets, in errors, collisions, ...),
each with 5 min avg, 30 min avg, 2 hour max, and 2 hour avg, then
every 5 min you need to do 19 non-sequential reads and writes (header + 18
RRAs),
every 30 min you need 37 non-sequential reads and writes, and
every 120 min you need 73 non-sequential reads and writes.
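
The counts above can be reproduced with a small sketch. It assumes the layout described in this post (one header access plus one separate access per data source for every RRA that is due); the names `RRA_STEPS` and `accesses_per_update` are made up for illustration and are not part of rrdtool.

```python
# Consolidation steps in minutes for the four RRAs each data source has,
# as listed in the post: 5 min avg, 30 min avg, 2 hour max, 2 hour avg.
RRA_STEPS = [5, 30, 120, 120]

def accesses_per_update(num_ds, minute):
    """One header access plus one access per data source for each RRA due."""
    due = sum(1 for step in RRA_STEPS if minute % step == 0)
    return 1 + num_ds * due

for minute in (5, 30, 120):
    print(minute, "min:", accesses_per_update(18, minute), "non-sequential accesses")
```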

If the file format were a header, then the 5 min interleaved RRAs, then the
2 hour interleaved RRAs, ...
you could update all the 5 min values with one header read and one
sequential read and write for the nth values in the RRAs.
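
To make the difference concrete, here is a minimal sketch of the two offset calculations. It is an illustration only, not the actual RRD on-disk format: the 8-byte value size, the 4096-byte header, and the 600 rows per RRA are all assumed numbers, and the function names are hypothetical.

```python
VALUE_SIZE = 8   # assumed: one 8-byte double per stored value
BLOCK = 4096     # filesystem block size used elsewhere in this post

def concatenated_offset(header, rows, ds, row):
    """Each data source has its own array of `rows` values, one after another."""
    return header + (ds * rows + row) * VALUE_SIZE

def interleaved_offset(header, num_ds, ds, row):
    """All data sources for the same row sit next to each other."""
    return header + (row * num_ds + ds) * VALUE_SIZE

def touched_blocks(offsets):
    """Distinct 4K blocks hit by a set of value offsets."""
    return sorted({off // BLOCK for off in offsets})

# Hypothetical numbers: 4096-byte header, 600 rows per RRA, 18 data sources.
header, rows, num_ds, row = 4096, 600, 18, 100
old = [concatenated_offset(header, rows, ds, row) for ds in range(num_ds)]
new = [interleaved_offset(header, num_ds, ds, row) for ds in range(num_ds)]
print("concatenated layout touches", len(touched_blocks(old)), "blocks")
print("interleaved layout touches", len(touched_blocks(new)), "blocks")
```

With these assumed sizes the concatenated layout lands every data source in a different block, while the interleaved row fits in one.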

Doing some maths on my setup:
I have 28000 RRD files tracking 336000 data values, each with 5m avg, 30m
avg, 2h max, 2h avg.
To stop disk thrashing I need to cache parts of the RRD files:
5min  (28000 + 336000  ) * 4096 = 1.4G  header update + 5min RRAs
30min (28000 + 336000*2) * 4096 = 2.7G  header update + 5min & 30min RRAs
2hour (28000 + 336000*4) * 4096 = 5.3G  header update + 5min, 30min, 2hour avg & 2hour max RRAs
I only have 3.5 Gig free. My 5min updates take 4-4.5 min, except every 2
hours, when they take 15min at 60-70% IO wait.

Interleaved, assuming that every interleaved row straddles a 4K block
boundary and so touches two blocks (worst case):
5min  (28000 + 28000*2  ) * 4096 = 0.32G  header update + 5min RRAs
30min (28000 + 28000*2*2) * 4096 = 0.55G  header update + 5min & 30min RRAs
2hour (28000 + 28000*2*4) * 4096 = 0.98G  header update + 5min, 30min, 2hour avg & 2hour max RRAs
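
The two sets of figures can be recomputed with the short sketch below. It uses only the numbers given in this post (28000 files, 336000 data values, 4096-byte blocks); the function names are made up for illustration. Results are printed in MiB; the G figures quoted above appear to correspond to these MiB values divided by 1000.

```python
BLOCK = 4096     # bytes per cached block
FILES = 28000    # number of RRD files
VALUES = 336000  # number of data values across all files

# How many RRAs per data source are touched at each update interval.
RRAS_DUE = {"5min": 1, "30min": 2, "2hour": 4}

def current_blocks(rras_due):
    """One header block per file plus one block per data value per RRA due."""
    return FILES + VALUES * rras_due

def interleaved_blocks(rras_due):
    """One header block per file plus two blocks per file per RRA due (worst case)."""
    return FILES + FILES * 2 * rras_due

def mib(blocks):
    return blocks * BLOCK / 2**20

for name, due in RRAS_DUE.items():
    cur = mib(current_blocks(due))
    new = mib(interleaved_blocks(due))
    print(f"{name:>5}: current {cur:7.1f} MiB, interleaved {new:6.1f} MiB, ratio {cur / new:.1f}x")
```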

So at worst this would cause a 5-10 times reduction in disk IO and caching
requirements. Would this sort of major structural change be possible?

My current solution is a Perl wrapper around the RRDs lib that writes to CSV
files, plus a cron job that updates the RRDs hourly or when the grapher is run.
This reduces the caching requirement for the real-time updates to 110Meg.
I see this is similar to RRD Accelerator but uses files directly rather than
sockets. When is 1.3 due?
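
For anyone wanting to try the same idea, here is a rough sketch of the spool-and-flush approach. It is not the Perl wrapper described above: the CSV layout (file name, timestamp, values) and the function names are assumptions made for illustration. The only rrdtool feature it relies on is that `rrdtool update` accepts several `timestamp:value[:value...]` arguments in one call, in increasing time order.

```python
import csv
from collections import defaultdict

def spool_update(spool_path, rrd_file, timestamp, values):
    """Append one sample to the spool file instead of touching the RRD."""
    with open(spool_path, "a", newline="") as fh:
        csv.writer(fh).writerow([rrd_file, timestamp, *values])

def build_update_commands(spool_path):
    """Group spooled samples per RRD and build one `rrdtool update` call each."""
    pending = defaultdict(list)
    with open(spool_path, newline="") as fh:
        for rrd_file, timestamp, *values in csv.reader(fh):
            pending[rrd_file].append((int(timestamp), values))
    commands = []
    for rrd_file, samples in pending.items():
        samples.sort(key=lambda s: s[0])  # rrdtool wants increasing timestamps
        args = [f"{ts}:{':'.join(vals)}" for ts, vals in samples]
        commands.append(["rrdtool", "update", rrd_file, *args])
    return commands
```

A cron job could then pass each returned command to `subprocess.run` and truncate the spool file afterwards, so each RRD is opened once per flush rather than once per sample.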

Kevin