[rrd-users] Diurnal average

Simon Hobson linux at thehobsons.co.uk
Sat Jul 15 19:50:35 CEST 2017


Helios Solaris <helios.solaris at gmx.ch> wrote:

>> Over how many days ?
> 
> Over the entire database (2 years).

Ouch - as already mentioned by Alex, that could be a LOT of processing and also prevents you using consolidation.

> Yes, using time shift may work, but it's not really flexible when I want
> to change parameters.

Depends on what you want to change. I would script it so that the parameters are easy to change - that's how most of my graphs work, passing (eg) start, end, and step values that are generated by a case statement in a shell script that takes "day", "week", "month", "year" as a parameter.
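A minimal sketch of that kind of wrapper, for illustration only. The function name graph_args and the step values are assumptions of mine, not taken from any real script; --start, --end and --step are the usual rrdtool graph options.

```shell
#!/bin/bash
# Hypothetical helper: turn a named period into rrdtool graph arguments.
# The step values below are illustrative assumptions; pick ones that
# match the RRAs in your own database.
graph_args() {
    case "$1" in
        day)   echo "--start end-1day --end now --step 300" ;;
        week)  echo "--start end-1week --end now --step 1800" ;;
        month) echo "--start end-1month --end now --step 7200" ;;
        year)  echo "--start end-1year --end now --step 86400" ;;
        *)     echo "usage: graph_args day|week|month|year" >&2
               return 1 ;;
    esac
}

# Print the arguments for the period given on the command line
# (defaults to "day" when none is given).
graph_args "${1:-day}"
```

The output is meant to be spliced into the rrdtool graph command line, so changing a parameter means editing one line of the case statement rather than every graph definition.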

I do have another thought ...
How many time periods do you want during the day ? If it's not too many (where "too many" is somewhat subjective), then you might want to look at turning it around - store (say) midnight-1am in one rrd file, 1am to 2am in another, and so on (in this example, using 24 rrd files). Each rrd file can then use consolidation.
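As a rough sketch of the "one file per hour" idea, the update script only needs to work out which file a sample belongs to. The hour-NN.rrd naming scheme and the function name are hypothetical, and the hour is computed in UTC; local time would need an offset added first.

```shell
#!/bin/bash
# Hypothetical naming scheme: hour-00.rrd ... hour-23.rrd, one file per
# hour of the day (UTC). Each file would be created with its own RRAs.
rrd_for_timestamp() {
    local ts=$1
    # Seconds since UTC midnight, divided down to a whole hour (0..23).
    local hour=$(( (ts % 86400) / 3600 ))
    printf 'hour-%02d.rrd\n' "$hour"
}

# Example: 1500141035 is 17:50:35 UTC on 15 July 2017.
rrd_for_timestamp 1500141035
```

The update would then go to the selected file, along the lines of rrdtool update "$(rrd_for_timestamp "$ts")" "$ts:$value".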

And (I'm sort of typing as my thought process flows here), I'm led to another option - but which limits consolidation again.
For midnight to 1am, use a time of day function to set a CDEF to the stored value if the function is true and to either zero or unknown if not. Then another time function and cdef for 1am to 2am, and so on. Then you can use a VDEF to get a consolidation across the whole CDEF to get a single value - which you can then print with a PRINT (not GPRINT) statement.
However, a quick look at the docs suggests that there isn't a simple "time of day" function, which complicates matters somewhat.
I **think** this may do it !
DEF:x=somefile.rrd...
CDEF:tod=TIME,DUP,86400,/,FLOOR,86400,*,-
# get time of sample, divide by 86400 (1 day), take integer part, multiply by 86400 (get time value of midnight at start of day) and subtract from time of sample to get the number of seconds since midnight.

CDEF:h0=tod,0,GE,tod,3600,LT,x,[0|UNKN],IF,[0|UNKN],IF
# If 0<=tod<3600 then get x else get [0|UNKN]
VDEF:h0ave=h0,AVERAGE
# Get an average value.
PRINT:h0ave:"%6.2lf"

CDEF:h1=tod,3600,GE,tod,7200,LT,x,[0|UNKN],IF,[0|UNKN],IF
and so on ...


This is the sort of thing I'd just script in Bash - it's almost trivial to generate an arbitrary number of statements covering appropriate timescales.
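Something along these lines, as an untested-against-rrdtool sketch. It assumes "x" and "tod" have already been defined earlier on the command line, and it compares tod against both ends of each window. The function name gen_windows and the w0, w1, ... variable names are made up for the example.

```shell
#!/bin/bash
# Emit CDEF/VDEF/PRINT arguments for every time-of-day window.
# Assumes the caller has already defined "x" (a DEF) and "tod"
# (seconds since midnight, a CDEF) on the rrdtool command line.
gen_windows() {
    local width=$1          # window width in seconds, e.g. 3600
    local fill=${2:-UNKN}   # value outside the window: UNKN or 0
    local n=$(( 86400 / width ))
    local i lo hi
    for (( i = 0; i < n; i++ )); do
        lo=$(( i * width ))
        hi=$(( lo + width ))
        # x where lo <= tod < hi, otherwise the fill value
        echo "CDEF:w$i=tod,$lo,GE,tod,$hi,LT,x,$fill,IF,$fill,IF"
        echo "VDEF:w${i}ave=w$i,AVERAGE"
        echo "PRINT:w${i}ave:%6.2lf"
    done
}

# 24 one-hour windows, three statements each.
gen_windows 3600
```

Because none of the generated statements contains a space, the output can be dropped straight into the rrdtool graph arguments with $(gen_windows 3600). Calling gen_windows 300 0 would give 288 five-minute windows with zero instead of UNKN outside each window.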

It's so long since I've used the TIME function that I don't know whether it should be tod=TIME... or tod=x,TIME... (ie get x, then get the TIME value for the current sample).

Whether you use zero or unknown for the times outside of each window depends a little on your requirements. I suspect that unknown (UNKN) is probably the correct one to use.

Whatever happens, I don't think you can calculate and graph in one go. You'll have a "graph" going back an arbitrary time to generate just one day's worth of data - and I don't think you can have a graph covering (say) 2 years of data but only drawing a line for one day. Something like gnuplot might be a better fit for that.

I'd also expect this to be "quite slow" and resource intensive. Let's consider the case where you do it only by hours. You are creating 1 duplicate of your dataset with time of day calculated, then another 24 duplicates. So that's effectively 25 copies of your database in memory at once.
Go down to (say) 5 minute periods, and it then means 289 (1 + 288) copies of the database in memory at once !
AND you must store all the data you want to use. That means, if you want 5 minute slots for a full 2 years, you need to store 5 minute consolidated data points for 2 years - that's 210240 samples to store (though I guess that's not so much data).
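The arithmetic behind those figures, as a small sketch for trying other window sizes. The function names are hypothetical, and "2 years" is taken as 730 days.

```shell
#!/bin/bash
# series_in_memory WIDTH: one time-of-day series plus one per window.
series_in_memory() { echo $(( 1 + 86400 / $1 )); }

# samples_stored STEP DAYS: data points kept at STEP seconds for DAYS days.
samples_stored() { echo $(( 86400 / $1 * $2 )); }

echo "hourly windows:   $(series_in_memory 3600) series"
echo "5-minute windows: $(series_in_memory 300) series"
echo "5-minute samples over 730 days: $(samples_stored 300 730)"
```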


