[rrd-users] differing disk i/o behavior for rrd update

Christian Smyth clsmyth at fuzzy-elves.org
Tue Feb 28 03:21:15 MET 2006


Folks,

Hi, I'd like your take on some behavior I've seen with "rrdtool update"
on two different Linux systems I have -- let's call them "oldbox" and
"newbox".  The behavior I'm seeing is as follows:  On oldbox when I do
an "rrdtool update" and observe the behavior with "strace", I notice a
128 KB read, another read which averages out to be about 72 KB, another
128 KB read, and a small write - 6 KB or so.  There are some seeks, but
those don't cause disk activity, so I am not worried about them.  On
newbox, though, when I strace the same update command on the same rrd, I
notice six 4 KB reads (well, the fourth one is a little less), and three
writes - 1-ish KB, 4 KB, and about 0.5 KB.

I am happy that rrdtool on newbox exhibits what I'd call a "more
friendly" i/o profile than rrdtool on oldbox :-)  But I don't know why
the behavior differs, and I worry that some unrelated update in the
future will bring the unfriendly behavior back.  So I need to know what
causes the difference.

With that in mind, allow me to give you some background, and describe
the two boxes' setups.  Oh, and I apologize in advance if this is a
ludicrous question...the two boxes are quite different, as you will see.
There may be too much difference to tell what is going on.  But I'm
going to ask anyway, because I am not sure where to look.

First off, I am looking at rrdtool at this low level because I have an
application that uses rrdtool, and I need to purchase a new, better
server (let's call this one betterbox) for the app, which currently
lives on oldbox.  I have the hardware for betterbox picked out in my
mind, but before I buy it I am trying to prove to myself that it will be
good enough from a disk-performance perspective.  The important detail
on betterbox is that it will have hardware RAID-10 capability and 6
disks.  The closest match I currently have in my possession is newbox,
which has hardware RAID-10 but only 4 disks.  oldbox, by contrast, has
software RAID-1 and only 2 disks.

I am concerned about disk performance in the first place because the
application has nearly 4000 RRDs -- almost 30 GB worth -- and they are
all updated every 5 minutes.  Actually, about 20% of them are updated
every minute.  And oldbox is struggling under that read load.
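
As a back-of-the-envelope figure, here is a small Python sketch of the
aggregate update rate.  It assumes the 20% subset is updated every 60
seconds and the remaining 80% every 300 seconds; the function name and
its parameters are made up for illustration.

```python
def updates_per_second(total_rrds, fast_percent, fast_period_s, slow_period_s):
    """Aggregate update rate for a mix of fast and slow RRDs.

    Assumption: fast_percent of the files update every fast_period_s
    seconds, and all the others update every slow_period_s seconds.
    """
    fast = total_rrds * fast_percent // 100
    slow = total_rrds - fast
    return fast / fast_period_s + slow / slow_period_s


if __name__ == "__main__":
    # 4000 RRDs, 20% every minute, the rest every 5 minutes
    rate = updates_per_second(4000, 20, 60, 300)
    print(f"{rate:.1f} updates/s")
```

With the numbers above this comes to roughly 24 updates per second,
each of which issues the reads shown in the traces further down.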

So anyway, on to the system configs.  oldbox is as follows:

oldbox
	Gentoo Linux (a few months since last update)
	kernel 2.6.11.12 (built by hand)
	gcc 3.3.5
	rrdtool 1.2.12 (built by hand, outside of portage)
	libart_lgpl_2 2.3.17

newbox is as follows:

newbox
	CentOS Linux 4.2 (unpatched)
	kernel 2.6.9-22.EL (default kernel)
	gcc 3.4.4-2
	rrdtool 1.2.12 (built by hand, just like on oldbox)
	libart_lgpl_2 2.3.16
	libart_lgpl 2.2.0 (not sure why this one is there)

The boxes have nearly identical hardware, except that oldbox has two
large-ish (120 GB, I think) Ultra320 Seagate SCSI drives, while newbox
has four smaller (36 GB, I think) drives of the same U320 Seagate
product line as the drives in oldbox.  As I previously mentioned, oldbox
has its 2 drives in a software RAID-1 array, while newbox has its 4
drives in a hardware RAID-10 array.  I don't know if any of this has
anything to do with the behavior I'm seeing, but I thought I'd let you
know, in the interest of completeness.
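
One thing that might be worth comparing between the two boxes is the
preferred I/O block size (st_blksize) that stat() reports for the rrd
file, since C stdio implementations commonly consult that value when
sizing their buffers.  This is only a guess at where to look, not a
diagnosis, and the helper name below is made up.  A minimal sketch in
Python:

```python
import os
import sys


def preferred_block_size(path):
    """Return the st_blksize (in bytes) that the OS reports for path."""
    return os.stat(path).st_blksize


if __name__ == "__main__":
    # Usage: python blksize.py iostat.Linux.hda.rrd [more files...]
    for path in sys.argv[1:]:
        print(f"{path}: st_blksize = {preferred_block_size(path)} bytes")
```

If the two boxes report different values for the same file, that would
at least be a lead; if they report the same value, the cause is
presumably elsewhere.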

Oh, the exact update command I am using is as follows:

strace -e trace=open,read,write -f \
  /usr/local/rrdtool-1.2.12/bin/rrdtool update iostat.Linux.hda.rrd \
  1143000300:0.00:33.00:0.00:8.28:0.00:330.26:0.00:165.13:39.87:0.05:5.81:0.52:0.43

On oldbox, the salient part of the trace looks like this:

open("iostat.Linux.hda.rrd", O_RDWR)    = 3
read(3, "RRD\0000003\0\0\0\0/%\300\307C+\37[\r\0\0\0\5\0\0\0,\1"..., 131072) = 131072
read(3, "\0\0\0\0\0\0\370\377\0\0\0\0\0\0\370\377\0\0\0\0\0\0\370"..., 60848) = 60848
read(3, "RRD\0000003\0\0\0\0/%\300\307C+\37[\r\0\0\0\5\0\0\0,\1"..., 131072) = 131072
write(3, "\354\314 D\0\0\0\0UNKN\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 6684) = 6684

On newbox, it looks like this:

open("iostat.Linux.hda.rrd", O_RDWR)    = 3
read(3, "RRD\0000003\0\0\0\0/%\300\307C+\37[\r\0\0\0\5\0\0\0,\1"..., 4096) = 4096
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
read(3, "\0\0\0\0\0\0\370\377\0\0\0\0\0\0\370\377\0\0\0\0\0\0\370"..., 4096) = 3504
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
read(3, "RRD\0000003\0\0\0\0/%\300\307C+\37[\r\0\0\0\5\0\0\0,\1"..., 4096) = 4096
write(3, "\354\314 D\0\0\0\0UNKN\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1884) = 1884
write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 704) = 704
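
To put numbers on the comparison, a small Python script along these
lines could total the bytes moved by read() and write() in a saved
strace log.  It is a rough sketch: the regular expression assumes
strace's usual "call(fd, "data"..., count) = result" layout, tolerates
lines wrapped by a mail client, and the function name is made up.

```python
import re
import sys
from collections import defaultdict

# Matches read()/write() calls, even when the call is split across lines.
SYSCALL_RE = re.compile(
    r"\b(read|write)\(\d+,.*?,\s*(\d+)\)\s*=\s*(-?\d+)", re.S
)


def io_totals(trace_text):
    """Return {syscall: (number_of_calls, total_bytes_returned)}."""
    totals = defaultdict(lambda: [0, 0])
    for name, _requested, returned in SYSCALL_RE.findall(trace_text):
        nbytes = int(returned)
        if nbytes < 0:
            continue  # failed call, no data transferred
        totals[name][0] += 1
        totals[name][1] += nbytes
    return {name: tuple(counts) for name, counts in totals.items()}


if __name__ == "__main__":
    # Usage: python io_totals.py < strace.log
    for name, (calls, nbytes) in sorted(io_totals(sys.stdin.read()).items()):
        print(f"{name}: {calls} calls, {nbytes} bytes")
```

Adding up the two traces above by hand gives 3 reads totalling 322992
bytes on oldbox versus 6 reads totalling 23984 bytes on newbox.  The
writes total 6684 bytes on both boxes; newbox just splits them into
three calls.  Note that these are syscall-level sizes, which are not
necessarily the same as what actually hits the disks.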

So there you have it.  If you have a clue why this behavior is different
from one box to the next, please let me know!  I will be glad to provide
more detail, as necessary.

aTdHvAaNnKcSe,
Christian Smyth
