[rrd-users] differing disk i/o behavior for rrd update
Christian Smyth
clsmyth at fuzzy-elves.org
Tue Feb 28 03:21:15 MET 2006
Folks,
Hi, I'd like your take on some behavior I've seen with "rrdtool update"
on two different Linux systems I have -- let's call them "oldbox" and
"newbox". The behavior I'm seeing is as follows: On oldbox when I do
an "rrdtool update" and observe the behavior with "strace", I notice a
128 KB read, another read which averages out to be about 72 KB, another
128 KB read, and a small write - 6 KB or so. There are some seeks, but
those don't cause disk activity, so I am not worried about them. On
newbox, though, when I strace the same update command on the same rrd, I
notice six 4 KB reads (well, the fourth one is a little less), and three
writes - 1-ish KB, 4 KB, and about 0.5 KB.
I am happy that rrdtool on newbox exhibits what I'd call a "more
friendly" i/o profile than rrdtool on oldbox :-) But I don't know why
this difference in behavior is so, and I worry that I'll update some
random thing in the future and cause the unfriendly behavior to
reappear. So I need to know what causes the differing behavior.
With that in mind, allow me to give you some background, and describe
the two boxes' setups. Oh, and I apologize in advance if this is a
ludicrous question...the two boxes are quite different, as you will see.
There may be too much difference to tell what is going on. But I'm
going to ask anyway, because I am not sure where to look.
First off, I am looking at rrdtool at this low level because I have an
application that uses rrdtool, and I need to purchase a new, better
server (let's call this one betterbox) for the app, which currently
lives on oldbox. I have the hardware for betterbox picked out in my
mind, but before I buy it I am trying to prove to myself that it will be
good enough from a disk-performance perspective. The important detail
on betterbox is that it will have hardware RAID-10 capability and 6
disks. The closest match I currently have in my possession is newbox,
which has hardware RAID-10 but only 4 disks. oldbox, by contrast, has
software RAID-1 and only 2 disks.
I am concerned about disk performance in the first place because the
application has nearly 4000 rrd's -- almost 30 GB worth -- and they are
all updated every 5 minutes. Actually, about 20% of them are updated
every minute. And oldbox is struggling under that read load.
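To put rough numbers on that load, here is a back-of-the-envelope sketch in Python. It treats the round figures above as exact (4000 files, 30 GB) and assumes that "20% every minute" means the remaining 80% are updated every 5 minutes; the function and constant names are only for illustration.

```python
# Back-of-the-envelope update load, using the round figures quoted above.
# Assumption: 20% of the RRDs update every 60 s, the other 80% every 300 s.
NUM_RRDS = 4000
TOTAL_BYTES = 30 * 1024**3  # "almost 30 GB", treated here as exactly 30 GiB


def updates_per_second(num_rrds=NUM_RRDS, fast_fraction=0.20,
                       fast_interval=60, slow_interval=300):
    """Average number of 'rrdtool update' calls per second."""
    fast = num_rrds * fast_fraction / fast_interval
    slow = num_rrds * (1 - fast_fraction) / slow_interval
    return fast + slow


def average_rrd_size_mib(total_bytes=TOTAL_BYTES, num_rrds=NUM_RRDS):
    """Mean size of one RRD file in MiB."""
    return total_bytes / num_rrds / 1024**2


if __name__ == "__main__":
    print(f"{updates_per_second():.1f} updates/s")
    print(f"{average_rrd_size_mib():.2f} MiB per RRD on average")
```

Under those assumptions this works out to roughly 24 updates per second, against files averaging a bit under 8 MiB each.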
So anyway, onto the system configs. oldbox is as follows:
oldbox
  Gentoo Linux (a few months since last update)
  kernel 2.6.11.12 (built by hand)
  gcc 3.3.5
  rrdtool 1.2.12 (built by hand, outside of portage)
  libart_lgpl_2 2.3.17
newbox is as follows:
newbox
  CentOS Linux 4.2 (unpatched)
  kernel 2.6.9-22.EL (default kernel)
  gcc 3.4.4-2
  rrdtool 1.2.12 (built by hand, just like on oldbox)
  libart_lgpl_2 2.3.16
  libart_lgpl 2.2.0 (not sure why this one is there)
The boxes are nearly identical hardware, except that oldbox has two
large-ish (120 GB, I think) Ultra320 Seagate SCSI drives, while newbox
has four smaller (36 GB, I think) drives of the same U320 Seagate
product line as the drives in oldbox. As I previously mentioned, oldbox
has its 2 drives in a software RAID-1 array, while newbox has its 4
drives in a hardware RAID-10 array. I don't know if any of this has
anything to do with the behavior I'm seeing, but I thought I'd let you
know, in the interest of completeness.
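One more detail that might be worth comparing across the two boxes is the "preferred" I/O block size the kernel reports for the rrd file (st_blksize from stat(2)), since glibc's buffered stdio generally sizes its buffer from that value. Whether rrdtool 1.2.12 reads the file through stdio in this way is an assumption here, not something established above; the sketch below only prints the value so the two machines can be compared.

```python
import os
import sys


def preferred_blksize(path):
    """Return st_blksize, the preferred I/O size reported by stat(2)."""
    return os.stat(path).st_blksize


if __name__ == "__main__":
    # Usage (hypothetical script name): python blksize.py iostat.Linux.hda.rrd
    for path in sys.argv[1:]:
        print(f"{path}: st_blksize = {preferred_blksize(path)} bytes")
```

If the two boxes report different values for the same file (for example 131072 on one and 4096 on the other), that would be consistent with the read sizes described earlier, though it would not by itself prove the cause.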
Oh, the exact update command I am using is as follows:
strace -e trace=open,read,write -f \
    /usr/local/rrdtool-1.2.12/bin/rrdtool update iostat.Linux.hda.rrd \
    1143000300:0.00:33.00:0.00:8.28:0.00:330.26:0.00:165.13:39.87:0.05:5.81:0.52:0.43
On oldbox, the salient part of the trace looks like this:
open("iostat.Linux.hda.rrd", O_RDWR) = 3
read(3, "RRD\0000003\0\0\0\0/%\300\307C+\37[\r\0\0\0\5\0\0\0,\1"..., 131072) = 131072
read(3, "\0\0\0\0\0\0\370\377\0\0\0\0\0\0\370\377\0\0\0\0\0\0\370"..., 60848) = 60848
read(3, "RRD\0000003\0\0\0\0/%\300\307C+\37[\r\0\0\0\5\0\0\0,\1"..., 131072) = 131072
write(3, "\354\314 D\0\0\0\0UNKN\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 6684) = 6684
On newbox, it looks like this:
open("iostat.Linux.hda.rrd", O_RDWR) = 3
read(3, "RRD\0000003\0\0\0\0/%\300\307C+\37[\r\0\0\0\5\0\0\0,\1"..., 4096) = 4096
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
read(3, "\0\0\0\0\0\0\370\377\0\0\0\0\0\0\370\377\0\0\0\0\0\0\370"..., 4096) = 3504
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
read(3, "RRD\0000003\0\0\0\0/%\300\307C+\37[\r\0\0\0\5\0\0\0,\1"..., 4096) = 4096
write(3, "\354\314 D\0\0\0\0UNKN\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1884) = 1884
write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 704) = 704
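For comparing traces like the two above without adding up sizes by hand, a small Python sketch along these lines could tally the read and write calls from saved strace output. The regular expression and the function name tally are only illustrative; it counts the byte totals that each call returned, and it tolerates calls that a mail client has wrapped onto two lines.

```python
import re
import sys

# Matches e.g.  read(3, "..."..., 4096) = 3504  even when the call has been
# wrapped across two lines. Failed calls ("= -1 E...") are not counted.
SYSCALL_RE = re.compile(
    r'\b(read|write)\(\d+,.*?,\s*(\d+)\)\s*=\s*(\d+)', re.DOTALL)


def tally(trace_text):
    """Return {'read': (calls, bytes), 'write': (calls, bytes)}."""
    totals = {"read": [0, 0], "write": [0, 0]}
    for name, _requested, returned in SYSCALL_RE.findall(trace_text):
        totals[name][0] += 1
        totals[name][1] += int(returned)
    return {name: tuple(counts) for name, counts in totals.items()}


if __name__ == "__main__":
    # Each argument is a file holding saved strace output.
    for path in sys.argv[1:]:
        with open(path) as handle:
            print(path, tally(handle.read()))
```

Applied to the traces above, the oldbox run reads 322992 bytes in 3 calls and the newbox run reads 23984 bytes in 6 calls, while both write the same 6684 bytes in total (one call on oldbox, three on newbox).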
So there you have it. If you have a clue why this behavior is different
from one box to the next, please let me know! I will be glad to provide
more detail, as necessary.
aTdHvAaNnKcSe,
Christian Smyth