[rrd-users] Disk I/O

Dave Plonka plonka at doit.wisc.edu
Thu Mar 13 21:40:46 CET 2008


On Thu, Mar 13, 2008 at 02:41:51PM -0500, Jeremy wrote:
> On Thu, Mar 13, 2008 at 11:55 AM, Dave Plonka <plonka at doit.wisc.edu> wrote:
> > BTW, you said 2.6.x kernel... the "x" is important.  My recollection
> > is that the kernel implementation of doing posix_fadvise for RANDOM
> > I/O improved at 2.6.7 and 2.6.9.
> 
> My server's running Redhat's bastardized 2.6.9 kernel "2.6.9-42.0.10.ELsmp"
>
> > If you upgrade to rrdtool >= 1.2.25, please let us know what happens.
> > (It'd be great to compare sar data before and after - you can set up
> > the sar data collector (sadc) to store data across that.)
> 
> I have just upgraded to 1.2.27, unfortunately before I read this about
> sar/sadc, hadn't heard of that before :-(

Just a reminder: be sure to restart everything after upgrading
rrdtool, so that all processes using RRD are using the new shared
object libraries.  (You can use fuser or lsof on the old .so files to
see if anything still has them open.)
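
For example (just a sketch -- the librrd path below is an assumption;
it depends on your distribution and install prefix):

   $ /usr/sbin/lsof /usr/lib/librrd.so.2    # processes still mapping the old library
   $ fuser -v /usr/lib/librrd.so.2          # same idea with fuser

If either shows your collector scripts or web server, restart them.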

> So far not really noticing a major difference, the system does feel a bit
> more responsive but the disk I/O is still pretty high. It's still not using
> 100% of the RAM, not swapping at all. Would adding more RAM (i.e. going up
> to 6 or 8GB) help?

If you're not using 100% of your RAM, why would you add more RAM? (rhetorical)
That's a shotgun approach.  Find out what's going on first.

<snip>
> After the upgrade:
> Mem:   4041528k total,  2516932k used,  1524596k free,    92840k buffers
> Swap:  2040244k total,      160k used,  2040084k free,  1855360k cached
> Device:   rrqm/s  wrqm/s    r/s     w/s  rsec/s   wsec/s   rkB/s    wkB/s  avgrq-sz  avgqu-sz   await  svctm  %util
> sda         0.00    5.94  10.89  379.21   87.13  3049.50   43.56  1524.75      8.04    108.36  213.76   2.54  99.11

I'm wondering why your system shows 1.5GB of memory free.  In my
experience with Linux, this indicates that either (a) not enough file
content has been accessed since booting to fully populate the file
buffer cache (which seems unlikely, unless the machine was recently
rebooted or not much goes on on it), or (b) the kernel is somehow
being prevented from using that memory to cache files.  (Perhaps there
are other reasons; those are just the ones I know.)

On your system, if it were buffer cache limited (which was a somewhat
common RRD problem before 1.2.25), one would expect top to show you
have (almost) no free memory.  (Or can it be locked up in a tmpfs/ram
file system or something?)  Normally, the file buffer cache should
use up all the "available" RAM... it uses lazy deallocation - i.e. the
cached file content stays there until needed for something else.

As an experiment, I'd cat all your RRD files to /dev/null, then run
fincore to see what percentage of them are in buffer cache, and also
see how much free mem you have (with top).  If you have >4GB of RRD
files, I'd expect there to be near zero free memory afterward, on a
properly running Linux system.
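
Concretely, the experiment could look something like this (a sketch;
/data/rrd is just a placeholder for wherever your RRD files live, and
fincore is the little perl script mentioned below):

   $ cd /data/rrd
   $ find . -name '*.rrd' -print0 | xargs -0 cat > /dev/null   # pull all RRD content through the cache
   $ ls | grep '\.rrd$' | ~/perl/fincore -IS                   # what fraction is now resident?
   $ free -m                                                   # "free" should be near zero, "cached" large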

For instance (without the cat), our system with 16GB of RAM shows
that >6.3GB of it is occupied by cached RRD file content:

   $ ls |grep '\.rrd$' |~/perl/fincore -IS  
   page size: 4096 bytes
   1646468 pages, 6.3 Gbytes in core for 310798 files; 5.30 pages, 21.2 kbytes per file.

While I didn't think it would be necessary for a 4GB system, you can
also try the "hugemem" kernel setup, i.e. even though you don't have
more than 4GB of RAM.
e.g.

  # Linux kernel version: 2.6.9-55.ELhugemem                                    
  # Fri Apr 20 17:18:09 2007                                                    
  CONFIG_X86_4G=y
  CONFIG_X86_SWITCH_PAGETABLES=y
  CONFIG_X86_4G_VM_LAYOUT=y
  CONFIG_X86_UACCESS_INDIRECT=y
  CONFIG_X86_HIGH_ENTRY=y
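
As a quick check of what your running kernel was built with (a
sketch, assuming the usual Red Hat location for the config file):

   $ grep CONFIG_X86_4G /boot/config-$(uname -r)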
 
> The disk I/O does drop below 100% a bit more often than before, but it's
> still pegged at nearly 100%. Then again, it's still getting caught up from
> when I had graph updates disabled while doing the upgrade, but it's still
> "catching up" about as fast as the old version would.

I don't understand the measure of disk I/O you are using...
now you say 100%, and before you said 500 writes per second or some
similarly low number.  To compare apples to apples, I suggest
"sar -d [1 10]" and grepping for the specific device on which the
file system that holds just the RRD files resides, so that you see
the RRD I/O in isolation from unrelated disk I/O.  (sar shows reads
and writes per second.)  Perhaps your high I/O has nothing to do with RRD.
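
For instance, something like this (a sketch; on this sysstat vintage
"sar -d" names devices dev<major>-<minor>, so dev8-0 below is sda --
check /proc/partitions to map yours):

   $ sar -d 1 10 | egrep 'DEV|dev8-0'    # ten one-second samples for that device only
   $ cat /proc/partitions                # map major/minor numbers to sd* names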

<snip>
> I'm not sure how well we'll be able to take advantage of the caching, since
> when the same RRD file is updated the next time around, it will be a new
> script doing the update. What process would be using more memory to save
> this cache data? Or the OS itself would be doing this caching?

Linux (and Un*xes in general) has a unified file buffer cache.
Caching happens automatically, and the cached content is available to
unrelated processes that access the same files.
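
You can see the sharing for yourself with something like this (a
sketch; some.rrd is any one of your RRD files):

   $ time cat some.rrd > /dev/null   # first read may have to hit the disk
   $ time cat some.rrd > /dev/null   # repeat, even from another shell: served from cache, much faster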

> We were thinking of getting 3 x Gigabyte i-ram drives (4GB each, set up in a
> RAID 0 so 12 GB total) and a separate dedicated server to do the rrdtool
> updates. Still not sure if that will be necessary or not. They are only 10k
> SCSI disks we're using currently (little SAS disks), and only RAID 1, so
> maybe some 15k disks in RAID 0 or RAID 5 would help more than extra RAM?

You seem anxious to spend money.  If you measure things to find out
the source of the performance problem, you can point your money in
the right direction - or perhaps find out that it's not necessary. :-)

Good luck,
Dave

-- 
plonka at doit.wisc.edu  http://net.doit.wisc.edu/~plonka/  Madison, WI


