[rrd-users] Slow collection runtimes occurring regularly
linux at thehobsons.co.uk
Tue May 3 00:18:23 CEST 2011
Rick Jones wrote:
>At the risk of typing into Joshua's keyboard I suspect he wasn't asking
>to shift the times to which the RRAs are aligned, but spread the work
>done for those RRAs - the hypothesis being that for an RRA which
>aggregates N samples it should not do all the work after N samples
>but try to do 1/Nth of the work upon the presentation of each sample.
I don't think rrdtool works like that. It doesn't store a number of
samples and then process them when a step is complete; it keeps just
one accumulator per DS, which is updated every time an update is
done. If the update goes past a step boundary then the previous
step(s) are completed - and if any consolidated periods are now
complete, these are updated as well.
Take an example of a step size of 300 (5 minutes), and a
consolidation into half-hour periods (6 steps). You can update as
often as you like, and each update will add to the accumulator for
each DS. On the first update after a five minute boundary (ie on the
hour, 5 past, 10 past, etc) each DS gets a new primary value for the
preceding step. On the first update after the hour or half hour, the
consolidated data will be updated as well.
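
To make the example concrete, here is a minimal Python sketch of the
boundary arithmetic only. It is not rrdtool's code: the function names
are made up for illustration, and I am assuming that a boundary counts
as crossed when an update lands on or after it.

```python
STEP = 300           # primary step in seconds, as in the example above
STEPS_PER_CDP = 6    # 6 x 300 s = 1800 s, i.e. half-hour consolidation


def boundaries_crossed(last_update, now, period):
    """Count the period boundaries that fall in (last_update, now]."""
    return now // period - last_update // period


def work_triggered(last_update, now):
    """Return (primary steps completed, consolidated periods completed)."""
    primary = boundaries_crossed(last_update, now, STEP)
    consolidated = boundaries_crossed(last_update, now, STEP * STEPS_PER_CDP)
    return primary, consolidated
```

An update that stays inside a step completes nothing, one that passes
5 past the hour completes one primary step, and one that passes the
half hour completes both a primary step and a consolidated period.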
As already mentioned, you cannot (for example) shift these half hour
slots about - they always start on the hour and half hour, as those
are the only times that fall on an integer multiple of the
consolidation period, counted from the Unix epoch. In any case, I
agree that this is probably not what the OP is after anyway.
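
The alignment can be checked with a few lines of Python. This is only
an illustration of the modular arithmetic; period_start is a
hypothetical helper, and the "hour and half hour" result is shown in
UTC (a timezone whose offset is not a multiple of 30 minutes would see
different local times).

```python
from datetime import datetime, timezone


def period_start(ts, period):
    """Start of the period containing ts, aligned to the Unix epoch."""
    return ts - ts % period


def as_utc(ts):
    """Convert epoch seconds to an aware UTC datetime for display."""
    return datetime.fromtimestamp(ts, tz=timezone.utc)
```

For any timestamp, as_utc(period_start(ts, 1800)) has a minute of 0
or 30 and a second of 0, whatever time the update itself arrived.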
Without knowing the internals, I'm guessing that each time a primary
step is completed, an accumulator for each consolidation is also
updated - so in a sense you are doing some of the work on every
update. Once a consolidated period is complete then it must also be
processed to get a new value for the preceding period - you don't get
any choice in this.
Thus the first update after the hour and half hour will always result
in further processing.
Further, if you had a consolidation for every 2 hours, then on the
even hours you'd get more processing still - so midnight, 2am, 4am
etc would have more activity (load) than the odd hours and half hours.
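
A rough way to see where the load lands is to count how many
consolidations complete at each step boundary. The sketch below
assumes a hypothetical rrd with a 300 second step and RRAs covering
1, 6 and 24 steps (5 minutes, half an hour, 2 hours); it says nothing
about how much work each completion actually costs.

```python
STEP = 300                # primary step in seconds
RRA_STEPS = [1, 6, 24]    # hypothetical RRAs: 5 min, 30 min, 2 h


def rras_completing(ts):
    """Number of RRAs that finish a period at step boundary ts."""
    return sum(1 for n in RRA_STEPS if ts % (STEP * n) == 0)
```

With these assumptions a boundary at 5 past completes one RRA, the
half hours and odd hours complete two, and the even hours all three.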
Because all the processing is done when you supply the updates, what
I was suggesting is that you capture the data along with a timestamp,
and delay submitting some of it. Thus you could have data with a
timestamp of midnight, but only submit it at 4 minutes past - thus
delaying the processing of the primary step and consolidated steps
that end at midnight for 4 minutes.
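
As a sketch of that idea in Python: record the timestamp when the
value is read, and build the update later using that timestamp rather
than "now". The helper names and the reader callback are hypothetical;
the only rrdtool feature relied on is that update accepts an explicit
timestamp in the form timestamp:value.

```python
import time


def capture(read_value, now=None):
    """Read a value and pair it with the capture time (epoch seconds)."""
    ts = int(time.time() if now is None else now)
    return ts, read_value()


def update_args(rrd_path, sample):
    """Build an 'rrdtool update' argument list for a captured sample."""
    ts, value = sample
    return ["rrdtool", "update", rrd_path, f"{ts}:{value}"]
```

The list from update_args can be handed to subprocess.run whenever
convenient, for example 4 minutes later. The data keeps its original
timestamp, only the processing moves.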
In a simple system capturing lots of data, one way to do this would
be to capture all the data into variables within a script, and then
sequentially submit updates for different rrds - if processing is
heavy, then this would naturally delay submitting some of them, at
the risk of overrunning into the next step period. In most systems
however, it would be "more tricky" to do this - and it's one of the
things I believe rrdcached was designed to help with.
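
If the delay is to be deliberate rather than a side effect of heavy
processing, one option is to give each rrd a fixed offset inside a
window shorter than the step. A minimal sketch, with stagger_offsets
as a made-up helper and the 300 second step assumed from the example
earlier:

```python
def stagger_offsets(rrd_paths, window, step=300):
    """Spread submission delays (in seconds) evenly across a window.

    The window must be shorter than the step so that no submission
    overruns into the next step period.
    """
    if not 0 <= window < step:
        raise ValueError("window must be >= 0 and shorter than the step")
    count = len(rrd_paths)
    return {path: (i * window) // count for i, path in enumerate(rrd_paths)}
```

Each rrd would then be submitted at capture time plus its offset,
still using the captured timestamp in the update itself.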