> As an experiment, I'd cat all your RRD files to /dev/null, then run
> fincore to see what percentage of them are in buffer cache, and also
> see how much free mem you have (with top). If you have >4GB of RRD
> files, I'd expect there to be near zero free memory afterward, on a
> properly running Linux system.

I will try out fincore tomorrow before and after cat'ing everything to /dev/null and report the results, thanks for the tip!
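For the record, the plan is roughly the following (the /path/to/rrd bit is just a placeholder for wherever the PNP RRDs actually live, and the exact fincore output format depends on which version of the tool is installed):

  fincore /path/to/rrd/*.rrd         # baseline: how much is already in the page cache
  cat /path/to/rrd/*.rrd > /dev/null # pull everything through the page cache
  fincore /path/to/rrd/*.rrd         # should now show the files (mostly) resident
  free -m                            # or top: free memory should be near zero if the RRDs are >4GB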

> I don't understand the measure of disk I/O you are using...
> Now you said 100% and before you said 500 writes per second
> or some low number. To compare apples-to-apples, I suggest
> "sar -d [1 10]" and grep for the specific device file on
> which the file system that holds just the RRD files resides,
> so that you see the RRD I/O in isolation from unrelated disk
> I/O. (sar shows reads and writes per second.)
> Perhaps your high I/O has nothing to do with RRD.
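(In concrete terms I take that to mean something like the line below, where dev8-0 is simply the device the filesystem holding the RRDs happens to sit on here:

  sar -d 1 10 | grep -e DEV -e dev8-0
)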

I was using "iostat -d -x 1" to watch the I/O. The ~400 writes/second I quoted was write requests per second; the actual number of sectors written per second (which seems to be what sar shows) was much higher. Running "sar -d 1 10", the average it came up with was this:

Average:          DEV       tps  rd_sec/s  wr_sec/s
Average:       dev8-0    501.25      7.99   9685.86

That matches up with what "iostat -d -x 1" shows as the wsec/s stat (sectors written per second):

Device:  rrqm/s  wrqm/s   r/s     w/s   rsec/s   wsec/s  rkB/s    wkB/s  avgrq-sz  avgqu-sz   await  svctm  %util
sda        0.00  347.06  1.96  507.84     7.84  6917.65   3.92  3458.82     13.58    107.24  155.38   1.93  98.14
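(Sanity check on the units: sectors in both sar and iostat are 512 bytes, so the numbers are self-consistent -- 6917.65 wsec/s / 2 = 3458.8 wkB/s, which matches the wkB/s column, and sar's 9685.86 wr_sec/s works out to roughly 4843 kB/s, or about 4.8 MB/s of writes.)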

The RRD files on this server live on the same partition as the OS and the Nagios software we're running, so it's a little hard to isolate them. However, if I disable PNP so no RRD files get updated anymore, the %util reported by iostat drops to near 0 in a hurry and only occasionally spikes back to 100%, and then just for a split second. With PNP disabled, here's the "sar -d 1 10" average:

Average:          DEV       tps  rd_sec/s  wr_sec/s
Average:       dev8-0     77.50      0.00   3587.20

...and with "iostat -d -x 1" the %util occasionally peaks at 50-60% but stays near 0 most of the time.
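So comparing the two sar averages, the PNP/RRD updates appear to account for roughly 501 - 78 = ~424 transfers/sec and 9686 - 3587 = ~6099 wr_sec/s, i.e. about 3 MB/s of write traffic on top of whatever Nagios and the OS themselves are doing.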

I've heard about a new "iotop" tool that shows I/O stats on a per-process basis, but unfortunately the kernel on this box is not new enough to support it.

> You seem anxious to spend money. If you measure things to find
> out the source of the performance problem, you can point your
> money in the right direction - or perhaps find out that it's
> not necessary. :-)

Well, I'm only anxious because it's not my money exactly ;-) I work for a fairly large hosting company, so setting up another server just for graphing would be no problem if it really comes to that, but you have given me renewed hope, hehe.

Thanks again for all the info,
Jeremy