[rrd-users] Slow collection runtimes occurring regularly

Mon May 2 00:21:36 CEST 2011

Going by the RRD structure, it looks like your collections are being done via MRTG.

You can't specify when consolidations are done, but on the whole it
shouldn't make that much of a difference.  We don't experience anything
like this pattern on our MRTG/RRD servers.
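
To illustrate why the timing can't be moved: RRDtool places row boundaries at
multiples of step * pdp_per_row counted from the Unix epoch, so every RRD with
the same layout finishes its consolidated rows on the same update. Here is a
rough Python sketch of that arithmetic, assuming the step and pdp_per_row
values from the RRD info dump quoted below; the function name is made up for
illustration and is not part of RRDtool or MRTG.

```python
# Values taken from the quoted RRD info dump: step = 300 s and
# pdp_per_row of 1, 6, 24 and 288 (each used by one AVERAGE and one MAX RRA).
STEP = 300
PDP_PER_ROW = [1, 6, 24, 288]


def rras_completing_row(timestamp, step=STEP, pdp_per_row=PDP_PER_ROW):
    """Return the pdp_per_row values whose RRAs finish a row at this update.

    Row boundaries sit at multiples of step * pdp_per_row since the Unix
    epoch, so they are the same for every file with this layout.
    """
    return [n for n in pdp_per_row if timestamp % (step * n) == 0]


if __name__ == "__main__":
    # Walk two hours of 5-minute updates, using offsets from the epoch.
    for ts in range(STEP, 7200 + STEP, STEP):
        hit = rras_completing_row(ts)
        if len(hit) > 1:
            print(f"t={ts:>5}s  consolidated rows written for pdp_per_row {hit}")
```

With these values the 30-minute and 2-hour RRAs both complete a row on every
two-hour boundary, and the 24-hour RRAs join them once a day.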

In order to spread things out over time, there are a number of things you
can do.  Using RRDtool 1.4.x (possibly the trunk version) allows you to use
rrdcached, which gives a noticeable (~20%?) performance saving.  You should
also tune your use of the Forks: directive in MRTG to make sure you're
running an appropriate number of parallel collection processes.  Adding more
memory to the server might also help if you need to increase the number of
forks (our machines tend to be memory-bound rather than CPU-bound, but we
use many data-collection plugins).
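
For a collector that calls rrdtool itself rather than going through MRTG, the
rrdcached route might look roughly like the Python sketch below. It only
builds the `rrdtool update` argument list with the `--daemon` option from
RRDtool 1.4; the helper name, the file path and the socket path are all
hypothetical and would need to match your own setup.

```python
def build_update_command(rrd_file, values, daemon_address=None):
    """Build an `rrdtool update` argument list.

    If daemon_address is given it is passed with --daemon, so the update
    is handed to rrdcached instead of being written to the file directly.
    """
    cmd = ["rrdtool", "update", rrd_file]
    if daemon_address:
        cmd += ["--daemon", daemon_address]
    # "N" means "now"; values are joined in data-source order.
    cmd.append("N:" + ":".join(str(v) for v in values))
    return cmd


if __name__ == "__main__":
    # Hypothetical file and socket paths, for illustration only.
    cmd = build_update_command(
        "/tmp/example.rrd", [1234, 5678], "unix:/var/run/rrdcached.sock"
    )
    print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run() on a host where rrdtool
and a running rrdcached are available.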

MRTG is less efficient if you don't run it in daemon mode.  RRDtool 1.3
and 1.4 can use memory-mapping and other nice things to improve performance,
and MRTG can cache its config files, but both work better in daemon mode.

Steve

Steve Shipway
ITS Unix Services Design Lead
University of Auckland, New Zealand
Floor 1, 58 Symonds Street, Auckland
Phone: +64 (0)9 3737599 ext 86487
DDI: +64 (0)9 924 6487
Mobile: +64 (0)21 753 189
Email: s.shipway at auckland.ac.nz

From: rrd-users-bounces+s.shipway=auckland.ac.nz at lists.oetiker.ch
[mailto:rrd-users-bounces+s.shipway=auckland.ac.nz at lists.oetiker.ch] On
Behalf Of Joshua Keroes
Sent: Sunday, 1 May 2011 5:53 p.m.
To: rrd-users at lists.oetiker.ch
Subject: [rrd-users] Slow collection runtimes occurring regularly

Our collectors run long at regular intervals: most noticeably every two
hours, and to a lesser extent every hour and half hour. Here's a graph
showing how long each collection cycle lasts on one of the collection
machines: http://i.imgur.com/xaZJ5.png - note the regular spikes.

Most RRDs consolidate every 30 minutes, 2 hours, and 24 hours; see the
bottom for a sample `rrdtool info`. Our current theory is that the RRD
consolidations are causing these long runtimes. If that's the case, is there
a way to evenly stagger the consolidations over time so we can better
distribute RRD update load?

Thanks,

Joshua

filename = "/rrd/router/cr01.ptleorte.integra.net/tengigabitethernet134.rrd"
rrd_version = "0003"
step = 300
last_update = 1304228713
ds[ds0].type = "COUNTER"
ds[ds0].minimal_heartbeat = 600
ds[ds0].min = 0.0000000000e+00
ds[ds0].max = 1.2500000000e+09
ds[ds0].last_ds = "1596044569532963"
ds[ds0].value = 4.0248335433e+08
ds[ds0].unknown_sec = 0
ds[ds1].type = "COUNTER"
ds[ds1].minimal_heartbeat = 600
ds[ds1].min = 0.0000000000e+00
ds[ds1].max = 1.2500000000e+09
ds[ds1].last_ds = "3460406816844600"
ds[ds1].value = 8.9596753966e+08
ds[ds1].unknown_sec = 0
rra[0].cf = "AVERAGE"
rra[0].rows = 600
rra[0].pdp_per_row = 1
rra[0].xff = 5.0000000000e-01
rra[0].cdp_prep[0].value = NaN
rra[0].cdp_prep[0].unknown_datapoints = 0
rra[0].cdp_prep[1].value = NaN
rra[0].cdp_prep[1].unknown_datapoints = 0
rra[1].cf = "AVERAGE"
rra[1].rows = 600
rra[1].pdp_per_row = 6
rra[1].xff = 5.0000000000e-01
rra[1].cdp_prep[0].value = 9.4104250250e+07
rra[1].cdp_prep[0].unknown_datapoints = 0
rra[1].cdp_prep[1].value = 2.0174889583e+08
rra[1].cdp_prep[1].unknown_datapoints = 0
rra[2].cf = "AVERAGE"
rra[2].rows = 600
rra[2].pdp_per_row = 24
rra[2].xff = 5.0000000000e-01
rra[2].cdp_prep[0].value = 6.5449761744e+08
rra[2].cdp_prep[0].unknown_datapoints = 0
rra[2].cdp_prep[1].value = 1.4734297081e+09
rra[2].cdp_prep[1].unknown_datapoints = 0
rra[3].cf = "AVERAGE"
rra[3].rows = 732
rra[3].pdp_per_row = 288
rra[3].xff = 5.0000000000e-01
rra[3].cdp_prep[0].value = 2.2692529674e+09
rra[3].cdp_prep[0].unknown_datapoints = 3
rra[3].cdp_prep[1].value = 4.7002069004e+09
rra[3].cdp_prep[1].unknown_datapoints = 3
rra[4].cf = "MAX"
rra[4].rows = 600
rra[4].pdp_per_row = 1
rra[4].xff = 5.0000000000e-01
rra[4].cdp_prep[0].value = NaN
rra[4].cdp_prep[0].unknown_datapoints = 0
rra[4].cdp_prep[1].value = NaN
rra[4].cdp_prep[1].unknown_datapoints = 0
rra[5].cf = "MAX"
rra[5].rows = 600
rra[5].pdp_per_row = 6
rra[5].xff = 5.0000000000e-01
rra[5].cdp_prep[0].value = 3.2405792329e+07
rra[5].cdp_prep[0].unknown_datapoints = 0
rra[5].cdp_prep[1].value = 6.9813629778e+07
rra[5].cdp_prep[1].unknown_datapoints = 0
rra[6].cf = "MAX"
rra[6].rows = 600
rra[6].pdp_per_row = 24
rra[6].xff = 5.0000000000e-01
rra[6].cdp_prep[0].value = 3.4089842030e+07
rra[6].cdp_prep[0].unknown_datapoints = 0
rra[6].cdp_prep[1].value = 7.6745619740e+07
rra[6].cdp_prep[1].unknown_datapoints = 0
rra[7].cf = "MAX"
rra[7].rows = 732
rra[7].pdp_per_row = 288
rra[7].xff = 5.0000000000e-01
rra[7].cdp_prep[0].value = 4.4271024386e+07
rra[7].cdp_prep[0].unknown_datapoints = 3
rra[7].cdp_prep[1].value = 8.8648080465e+07
rra[7].cdp_prep[1].unknown_datapoints = 3
