[rrd-users] Input values normalization

Tue Feb 18 18:44:14 CET 2014

> But the accuracy will still only be as good as the collection frequency 
> (and its relation to the rate of change of the measured value). If the 
> measured value can (for example) rise sharply and drop back again in 
> between samples then even the max function won't tell you anything about 
> it.

Although your point is valid, it is (IMHO) irrelevant for this particular 
question.  In fact, the higher the sampling resolution, the smaller the 
problem of missing such spikes.

In the case at hand the OP wants to keep his discrete values. AVERAGE won't 
do, so MIN, MAX or LAST remain, and of those I would choose to use both MIN 
and MAX.  Maybe the OP will choose differently.

With proper heartbeat settings there is no need to take a sample every 
second either. It could remain at every 5 minutes or so.

Anyway, the main point of my answer was to have the 'best' RRA with 'steps' 
larger than 1.  I'll elaborate.

> The solution I found is to set a data source step to 1 second to avoid 
> normalization, but this produces big rrd files with a lot of redundant 
> information.
>
> I did not find a satisfactory solution up to now, thanks for any hint.

Something has to give: if the database must not grow, then RRDtool somehow 
needs to combine several of its input values into one. Since averaging them 
is undesirable, choose one or more of MIN, MAX and LAST.

Instead of each RRA row covering 1 times 300 seconds in an RRD with 
'step==300', each row can cover 300 times 1 second in an RRD with 
'step==1'. The same amount of time is stored in the same number of rows, so 
the database does not get bigger.

Let's assume the original database was like this:
created with "--step 300" (could be left out, as it is the default)
RRA:AVERAGE:0.5:1:1200  (100 hours: 300 seconds per row, 1200 rows)

Now, when creating the database with "--step 1" without increasing its size:
do not specify RRA:AVERAGE:0.5:1:360000  (100 hours: 1 second per row)
but instead specify RRA:AVERAGE:0.5:300:1200  (100 hours: 300 seconds per row 
again)
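
To double-check the arithmetic above, here is a small sketch in Python. It 
is not part of RRDtool; the helper name rra_coverage is made up for this 
illustration. It only multiplies out the step, the steps-per-row and the 
row count of each RRA definition mentioned above.

```python
def rra_coverage(step, pdp_per_row, rows):
    """Return (seconds per row, total seconds covered) for one RRA.

    step         -- the RRD's "--step" value in seconds
    pdp_per_row  -- the 'steps' field of the RRA definition
    rows         -- the 'rows' field of the RRA definition
    """
    seconds_per_row = step * pdp_per_row
    return seconds_per_row, seconds_per_row * rows


# (step, steps-per-row, rows) for the three layouts discussed in this mail
layouts = {
    "--step 300, RRA:AVERAGE:0.5:1:1200": (300, 1, 1200),
    "--step 1,   RRA:AVERAGE:0.5:1:360000": (1, 1, 360000),
    "--step 1,   RRA:AVERAGE:0.5:300:1200": (1, 300, 1200),
}

for name, (step, pdp_per_row, rows) in layouts.items():
    per_row, total = rra_coverage(step, pdp_per_row, rows)
    print(f"{name}: {per_row} s/row, {rows} rows, {total // 3600} hours")
```

All three cover the same 100 hours, but only the second one needs 360000 
rows; the first and third both need 1200.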

This would still have fractions in the database, so the next step is to 
alter the consolidation function as suggested before (MIN, MAX or LAST 
instead of AVERAGE).  As long as RRDtool is fed integer timestamps and the 
RRD has '--step 1', normalization will be a no-op, and the input will be 
untouched in this phase. Then during consolidation the integer values are 
kept, which I believe was the goal.
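
To illustrate why the choice of consolidation function matters, here is a 
rough model in Python. It is a simplification and not RRDtool's actual 
code: it ignores unknown values and the xff setting, and the function name 
consolidate and the sample data are made up for this example. It takes 300 
one-second integer values (one RRA row with 'steps' 300) and combines them 
into a single value.

```python
def consolidate(pdps, cf):
    """Combine the primary data points of one RRA row into one value.

    Simplified model: no unknown values, no xff handling.
    """
    if cf == "AVERAGE":
        return sum(pdps) / len(pdps)
    if cf == "MIN":
        return min(pdps)
    if cf == "MAX":
        return max(pdps)
    if cf == "LAST":
        return pdps[-1]
    raise ValueError(f"unknown consolidation function: {cf}")


# Hypothetical data: a discrete value that is 2 for 200 seconds,
# then 3 for the remaining 100 seconds of the 300-second row.
pdps = [2] * 200 + [3] * 100

for cf in ("AVERAGE", "MIN", "MAX", "LAST"):
    print(cf, consolidate(pdps, cf))
```

With this input, AVERAGE yields a fraction between 2 and 3, while MIN, MAX 
and LAST each return one of the integers that was actually fed in.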

HTH
Alex