[rrd-users] Slow collection runtimes occurring regularly

Steve Shipway s.shipway at auckland.ac.nz
Mon May 2 00:21:36 CEST 2011


Looks like your collections are being done via MRTG, going by the RRA
structure.

You can't specify when consolidations are done, but on the whole it
shouldn't make such a difference.  We don't experience anything like this
pattern on our MRTG/RRD servers.

To cope with the load, there are a number of things you can do.  Using
RRDtool 1.4.x (possibly the trunk version) allows you to use rrdcached,
which gives a noticeable (~20%?) performance saving; you should also tune
the Forks: directive in MRTG to make sure you're running enough collector
processes in parallel.  Adding more memory to the server might also help
if you need to increase the number of forks (our machines tend to be
memory-bound rather than CPU-bound, but we use many data-collection
plugins).

Running MRTG outside daemon mode is less efficient: RRDtool 1.3 and 1.4 can
use memory-mapping and other optimisations to improve performance, and MRTG
can cache its config files, but these work better when MRTG runs in daemon
mode.

 

Steve

_____

Steve Shipway
ITS Unix Services Design Lead
University of Auckland, New Zealand
Floor 1, 58 Symonds Street, Auckland
Phone: +64 (0)9 3737599 ext 86487
DDI: +64 (0)9 924 6487
Mobile: +64 (0)21 753 189
Email: s.shipway at auckland.ac.nz

From: rrd-users-bounces+s.shipway=auckland.ac.nz at lists.oetiker.ch
[mailto:rrd-users-bounces+s.shipway=auckland.ac.nz at lists.oetiker.ch] On
Behalf Of Joshua Keroes
Sent: Sunday, 1 May 2011 5:53 p.m.
To: rrd-users at lists.oetiker.ch
Subject: [rrd-users] Slow collection runtimes occurring regularly

Our collectors run long at regular intervals: in particular every two hours,
and to a lesser extent every hour and every half hour. Here's a graph showing
how long each collection cycle lasts on one of the collection machines:
http://i.imgur.com/xaZJ5.png - note the regular spikes.

Most RRDs consolidate every 30 minutes, 2 hours, and 24 hours; see the
bottom for a sample `rrdtool info`. Our current theory is that the RRD
consolidations are causing these long runtimes. If that's the case, is there
a way to evenly stagger the consolidations over time so we can better
distribute RRD update load?
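
A rough sketch of why the spikes would line up, assuming (as I understand RRDtool's behaviour, not verified against this install) that each RRA's consolidation boundaries fall on multiples of step x pdp_per_row seconds since the Unix epoch, in UTC. If so, every file with this layout completes its 30-minute and 2-hour rows on the same update, so the extra work is synchronised across all files rather than spread out, which would fit the point that you can't choose when consolidations happen. STEP and PDP_PER_ROW come from the sample info dump; the update schedule (13 s past each 5-minute mark from 2011-05-01 00:00 UTC) and the helper names boundaries_crossed and busy_updates are hypothetical.

```python
import datetime

STEP = 300                     # from the sample `rrdtool info`: step = 300
PDP_PER_ROW = (1, 6, 24, 288)  # pdp_per_row of rra[0..3]; the MAX RRAs repeat it


def boundaries_crossed(prev, now, step=STEP, pdp_per_row=PDP_PER_ROW):
    """Return the pdp_per_row values whose consolidation boundary lies in (prev, now].

    Assumption: boundaries sit at multiples of step * pdp_per_row seconds
    since the Unix epoch, so they are identical for every file with this layout.
    """
    crossed = []
    for n in pdp_per_row:
        period = step * n
        # A multiple of `period` lies in (prev, now] exactly when the
        # integer quotient changes between the two timestamps.
        if now // period > prev // period:
            crossed.append(n)
    return crossed


def busy_updates(start, count, step=STEP, pdp_per_row=PDP_PER_ROW):
    """Simulate `count` updates spaced `step` seconds apart from `start`.

    Yields (timestamp, crossed) only for updates that also complete a row
    in an RRA with pdp_per_row > 1, i.e. the extra consolidation work.
    """
    prev = start - step
    for i in range(count):
        now = start + i * step
        crossed = boundaries_crossed(prev, now, step, pdp_per_row)
        if any(n > 1 for n in crossed):
            yield now, crossed
        prev = now


if __name__ == "__main__":
    # Hypothetical schedule: one update 13 s after every 5-minute mark for
    # one day, starting 2011-05-01 00:00:13 UTC.
    start = 1304208000 + 13
    for ts, crossed in busy_updates(start, 288):
        when = datetime.datetime.fromtimestamp(ts, tz=datetime.timezone.utc)
        print(when.strftime("%H:%M:%S UTC"), "completes rows in RRAs with pdp_per_row", crossed)
```

For this layout the sketch flags every half hour, with the 2-hour RRAs joining on even UTC hours and the daily RRAs at 00:00 UTC; it does not by itself produce an hourly pattern.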

 

Thanks,

Joshua

filename = "/rrd/router/cr01.ptleorte.integra.net/tengigabitethernet134.rrd"
rrd_version = "0003"
step = 300
last_update = 1304228713
ds[ds0].type = "COUNTER"
ds[ds0].minimal_heartbeat = 600
ds[ds0].min = 0.0000000000e+00
ds[ds0].max = 1.2500000000e+09
ds[ds0].last_ds = "1596044569532963"
ds[ds0].value = 4.0248335433e+08
ds[ds0].unknown_sec = 0
ds[ds1].type = "COUNTER"
ds[ds1].minimal_heartbeat = 600
ds[ds1].min = 0.0000000000e+00
ds[ds1].max = 1.2500000000e+09
ds[ds1].last_ds = "3460406816844600"
ds[ds1].value = 8.9596753966e+08
ds[ds1].unknown_sec = 0
rra[0].cf = "AVERAGE"
rra[0].rows = 600
rra[0].pdp_per_row = 1
rra[0].xff = 5.0000000000e-01
rra[0].cdp_prep[0].value = NaN
rra[0].cdp_prep[0].unknown_datapoints = 0
rra[0].cdp_prep[1].value = NaN
rra[0].cdp_prep[1].unknown_datapoints = 0
rra[1].cf = "AVERAGE"
rra[1].rows = 600
rra[1].pdp_per_row = 6
rra[1].xff = 5.0000000000e-01
rra[1].cdp_prep[0].value = 9.4104250250e+07
rra[1].cdp_prep[0].unknown_datapoints = 0
rra[1].cdp_prep[1].value = 2.0174889583e+08
rra[1].cdp_prep[1].unknown_datapoints = 0
rra[2].cf = "AVERAGE"
rra[2].rows = 600
rra[2].pdp_per_row = 24
rra[2].xff = 5.0000000000e-01
rra[2].cdp_prep[0].value = 6.5449761744e+08
rra[2].cdp_prep[0].unknown_datapoints = 0
rra[2].cdp_prep[1].value = 1.4734297081e+09
rra[2].cdp_prep[1].unknown_datapoints = 0
rra[3].cf = "AVERAGE"
rra[3].rows = 732
rra[3].pdp_per_row = 288
rra[3].xff = 5.0000000000e-01
rra[3].cdp_prep[0].value = 2.2692529674e+09
rra[3].cdp_prep[0].unknown_datapoints = 3
rra[3].cdp_prep[1].value = 4.7002069004e+09
rra[3].cdp_prep[1].unknown_datapoints = 3
rra[4].cf = "MAX"
rra[4].rows = 600
rra[4].pdp_per_row = 1
rra[4].xff = 5.0000000000e-01
rra[4].cdp_prep[0].value = NaN
rra[4].cdp_prep[0].unknown_datapoints = 0
rra[4].cdp_prep[1].value = NaN
rra[4].cdp_prep[1].unknown_datapoints = 0
rra[5].cf = "MAX"
rra[5].rows = 600
rra[5].pdp_per_row = 6
rra[5].xff = 5.0000000000e-01
rra[5].cdp_prep[0].value = 3.2405792329e+07
rra[5].cdp_prep[0].unknown_datapoints = 0
rra[5].cdp_prep[1].value = 6.9813629778e+07
rra[5].cdp_prep[1].unknown_datapoints = 0
rra[6].cf = "MAX"
rra[6].rows = 600
rra[6].pdp_per_row = 24
rra[6].xff = 5.0000000000e-01
rra[6].cdp_prep[0].value = 3.4089842030e+07
rra[6].cdp_prep[0].unknown_datapoints = 0
rra[6].cdp_prep[1].value = 7.6745619740e+07
rra[6].cdp_prep[1].unknown_datapoints = 0
rra[7].cf = "MAX"
rra[7].rows = 732
rra[7].pdp_per_row = 288
rra[7].xff = 5.0000000000e-01
rra[7].cdp_prep[0].value = 4.4271024386e+07
rra[7].cdp_prep[0].unknown_datapoints = 3
rra[7].cdp_prep[1].value = 8.8648080465e+07
rra[7].cdp_prep[1].unknown_datapoints = 3
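
To check which RRAs drive those intervals, a small parser over `rrdtool info` output like the dump above can turn pdp_per_row and rows into seconds. This is a hypothetical helper (rra_layout is not part of RRDtool); it only reads the `step`, `rra[N].cf`, `rra[N].rows` and `rra[N].pdp_per_row` lines and ignores everything else, including the cdp_prep state.

```python
import re

# Matches e.g. `rra[3].cf = "AVERAGE"`, `rra[3].rows = 732`,
# `rra[3].pdp_per_row = 288`; cdp_prep lines deliberately do not match.
RRA_LINE = re.compile(r'^rra\[(\d+)\]\.(cf|rows|pdp_per_row) = "?([^"]*)"?$')


def rra_layout(info_text):
    """Return (index, cf, period_s, coverage_s) for each RRA in `rrdtool info` text.

    period_s   = step * pdp_per_row  (seconds between consolidated rows)
    coverage_s = period_s * rows     (time span the RRA retains)
    """
    step = None
    rras = {}
    for raw in info_text.splitlines():
        line = raw.strip()
        if line.startswith("step = "):
            step = int(line.split("=", 1)[1])
            continue
        m = RRA_LINE.match(line)
        if m:
            idx, key, value = int(m.group(1)), m.group(2), m.group(3)
            rras.setdefault(idx, {})[key] = value if key == "cf" else int(value)
    if step is None:
        raise ValueError("no 'step = ...' line found")
    layout = []
    for idx in sorted(rras):
        r = rras[idx]
        period = step * r["pdp_per_row"]
        layout.append((idx, r["cf"], period, period * r["rows"]))
    return layout


if __name__ == "__main__":
    sample = """step = 300
rra[0].cf = "AVERAGE"
rra[0].rows = 600
rra[0].pdp_per_row = 1
rra[1].cf = "AVERAGE"
rra[1].rows = 600
rra[1].pdp_per_row = 6
"""
    for idx, cf, period, coverage in rra_layout(sample):
        print(f"rra[{idx}] {cf}: one row every {period} s, {coverage / 86400:.1f} days retained")
```

Fed the full dump, it should report periods of 300 s, 1800 s, 7200 s and 86400 s for both the AVERAGE and MAX sets, i.e. the 30-minute, 2-hour and 24-hour consolidations described in the question.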


 
