[rrd-users] Disk I/O

Thu Mar 13 20:41:51 CET 2008

On Thu, Mar 13, 2008 at 11:55 AM, Dave Plonka <plonka at doit.wisc.edu> wrote:
> BTW, you said 2.6.x kernel... the "x" is important.  My recollection
> is that the kernel implementation of doing posix_fadvise for RANDOM
> I/O improved at 2.6.7 and 2.6.9.

My server is running Red Hat's bastardized 2.6.9 kernel, "2.6.9-42.0.10.ELsmp".

> If you upgrade to rrdtool >= 1.2.25, please let us know what happens.
> (It'd be great to compare sar data before and after - you can set up
> the sar data collector (sadc) to store data across that.)

I have just upgraded to 1.2.27, but unfortunately I did so before I read this
about sar/sadc, which I hadn't heard of before :-(

So far I'm not really noticing a major difference. The system does feel a bit
more responsive, but the disk I/O is still pretty high. It's still not using
100% of the RAM and not swapping at all. Would adding more RAM (i.e. going up
to 6 or 8GB) help?

Before the upgrade:
Mem:   4041528k total,  3742980k used,   298548k free,   137608k buffers
Swap:  2040244k total,      160k used,  2040084k free,  3055604k cached
(iostat stats:)
Device:  rrqm/s  wrqm/s    r/s     w/s  rsec/s   wsec/s   rkB/s    wkB/s  avgrq-sz  avgqu-sz   await  svctm  %util
sda        0.00    1.98  14.85  369.31  728.71  2954.46  364.36  1477.23      9.59    108.89  234.11   2.58  99.11

After the upgrade:
Mem:   4041528k total,  2516932k used,  1524596k free,    92840k buffers
Swap:  2040244k total,      160k used,  2040084k free,  1855360k cached
Device:  rrqm/s  wrqm/s    r/s     w/s  rsec/s   wsec/s   rkB/s    wkB/s  avgrq-sz  avgqu-sz   await  svctm  %util
sda        0.00    5.94  10.89  379.21   87.13  3049.50   43.56  1524.75      8.04    108.36  213.76   2.54  99.11

The disk I/O does drop below 100% a bit more often than before, but it's
still pegged at nearly 100%. Then again, it's still catching up from when I
had graph updates disabled during the upgrade, though it's "catching up"
about as fast as the old version would.
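
As a rough cross-check of the two iostat samples above, the average size of a
read and of a write request can be derived from the sectors-per-second and
requests-per-second columns. This is only a sketch: it assumes the usual
512-byte iostat sector (which matches rkB/s being half of rsec/s in both
samples), and the function names are illustrative.

```python
# Figures copied from the two iostat samples in this post (device sda).
SECTOR_BYTES = 512  # assumption: iostat counts 512-byte sectors

samples = {
    "before": {"r/s": 14.85, "w/s": 369.31, "rsec/s": 728.71, "wsec/s": 2954.46},
    "after":  {"r/s": 10.89, "w/s": 379.21, "rsec/s": 87.13,  "wsec/s": 3049.50},
}


def avg_kb_per_request(sectors_per_s, requests_per_s):
    """Average request size in KiB: sectors/s divided by requests/s."""
    return sectors_per_s / requests_per_s * SECTOR_BYTES / 1024


def summarize(sample):
    """Derive per-request sizes and the share of requests that are writes."""
    total_requests = sample["r/s"] + sample["w/s"]
    return {
        "read_kb": round(avg_kb_per_request(sample["rsec/s"], sample["r/s"]), 1),
        "write_kb": round(avg_kb_per_request(sample["wsec/s"], sample["w/s"]), 1),
        "write_share": round(sample["w/s"] / total_requests, 3),
    }


for label, sample in samples.items():
    print(label, summarize(sample))
```

On these two samples the average read shrinks from roughly 24.5 KiB to roughly
4 KiB, while writes stay at about 4 KiB and make up over 96% of all requests
both times. That would be consistent with less read-ahead after the upgrade,
but two single samples are not enough to say so with confidence.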

What I'm using rrdtool for on this server is PNP (http://www.ederdrom.de/pnp/)
along with Nagios, to make graphs based on the performance data the Nagios
checks generate. With my setup Nagios spits out a file every 15 seconds with
enough data to update about 1500 RRD files. PNP has a daemon running (NPCD),
and when it sees a new file available it launches a Perl script that loops
over that data and does a bunch of rrdtool updates. A fresh script is run for
each file. It does use the RRDs Perl module; you can disable that with a PNP
config setting, but I'm not noticing a big difference either way so I left it
enabled.
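
For a sense of scale, the update load described above can be set against the
write rates iostat reported. This is a back-of-the-envelope sketch only: it
assumes every write request on sda comes from RRD updates and that the 1500
files per 15 seconds figure is a steady rate, neither of which is strictly
true while the system is still catching up. The names are illustrative.

```python
# Figures from this post: about 1500 RRD files per batch, one batch every
# 15 seconds, and the w/s column from the two iostat samples.
RRD_FILES_PER_BATCH = 1500
BATCH_INTERVAL_S = 15
WRITES_PER_S = {"before": 369.31, "after": 379.21}


def updates_per_second(files_per_batch, interval_s):
    """Steady-state RRD updates per second implied by the batch size."""
    return files_per_batch / interval_s


def writes_per_update(write_requests_per_s, updates_per_s):
    """Disk write requests per RRD update, assuming all writes are updates."""
    return write_requests_per_s / updates_per_s


rate = updates_per_second(RRD_FILES_PER_BATCH, BATCH_INTERVAL_S)
print(f"{rate:.0f} RRD updates per second")
for label, w in WRITES_PER_S.items():
    print(f"{label}: {writes_per_update(w, rate):.2f} write requests per update")
```

Under those assumptions the disk is handling somewhere between three and four
write requests for every RRD update, in both samples.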

I'm not sure how well we'll be able to take advantage of the caching, since
when the same RRD file is updated the next time around, it will be a new
script doing the update. What process would be using more memory to hold
this cache data? Or would the OS itself be doing the caching?
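
For context on the hint Dave mentions: posix_fadvise is an advisory call a
process makes on a file descriptor it has open. The hint goes away when the
file is closed, while any cached file pages sit in the kernel's page cache
rather than in the process. The sketch below shows the call from Python on a
throwaway temporary file; it is not how rrdtool or PNP is implemented, and
the function names are made up for illustration.

```python
import os
import tempfile


def open_with_random_hint(path):
    """Open path read/write and advise the kernel that access will be random."""
    fd = os.open(path, os.O_RDWR)
    # os.posix_fadvise only exists on some Unix platforms, so guard the call.
    if hasattr(os, "posix_fadvise"):
        # Offset 0 with length 0 applies the advice to the whole file.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_RANDOM)
    return fd


def demo():
    """Write 8 bytes in the middle of a stand-in file and read them back."""
    with tempfile.NamedTemporaryFile() as tmp:  # stand-in file, not a real RRD
        tmp.write(b"\0" * 8192)
        tmp.flush()
        fd = open_with_random_hint(tmp.name)
        try:
            os.pwrite(fd, b"rrd-data", 4096)
            return os.pread(fd, 8, 4096)
        finally:
            os.close(fd)


print(demo())
```

Because the hint is per open file, a freshly launched script would have to
issue it again each time it opens an RRD file.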

We were thinking of getting 3 x Gigabyte i-ram drives (4GB each, set up in a
RAID 0 so 12 GB total) and a separate dedicated server to do the rrdtool
updates. Still not sure if that will be necessary or not. We're currently
using only 10k SCSI disks (little SAS disks), and only in RAID 1, so maybe
some 15k disks in RAID 0 or RAID 5 would help more than extra RAM?