[rrd-users] rrd create --start date; internal treatment

Filip Moritz fil at taz.de
Wed Sep 30 11:20:42 CEST 2009


I didn't realize my discussion with Tobi had gone to private mail, so I'll summarize it here for the benefit of the archive.

g., fil


---
T: the start time is the 'last update time' if you use rrdtool info.

F: so that's true until the first update, right? After that the value set with --start is lost?

T: yes [...]
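
A minimal sketch to check this, assuming a throwaway RRD: "rrdtool last" prints the last update time, which should equal the --start value right after creation and move on with the first update. The helper name show_start_then_update and the DS/RRA definitions are illustrative, not taken from the discussion.

```shell
# Sketch: "last update" equals --start until the first update arrives.
# show_start_then_update is a hypothetical helper name.
show_start_then_update() {
    rrd=$1     # path of a throwaway RRD file to create
    start=$2   # --start value, seconds since the epoch

    # DS and RRA definitions are placeholders chosen for the demo.
    rrdtool create "$rrd" --step 60 --start "$start" \
        DS:pi:ABSOLUTE:120:0:U RRA:AVERAGE:0.5:1:10 || return 1
    rrdtool last "$rrd"                          # still the --start value
    rrdtool update "$rrd" "$((start + 60)):0" || return 1
    rrdtool last "$rrd"                          # now start + 60
}
```

Called as "show_start_then_update demo.rrd 1254300000", it prints two timestamps, the second one 60 seconds later than the first.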


---
T: note that rrdtool is best suited for regular updates ... (time series).

F:
> Well, my data sits somewhere between constantly updated
> time-series data and timed events: accesses to single online
> news articles. [...] I get minutely updated time-series data in
> the beginning and less frequent hits later. With some 40k+
> articles (and RRDs), I only want to update those actually hit
> in any given minute. [...]

> What I am seeing now is that total hit count is out of bounds
> when reporting on time frames much larger than the actual
> measurement period (article came online last week, report on
> absolute hits in 2 years). I assume this is due to unknown values
> before "create --start", that are "backward padded" from the
> initial peak during aggregation. Hence my question:
> > > Moreover: How is start time treated internally? It seems
> > > unknown values before start time get involved in
> > > aggregation. Is this true?
> If it is: depending on the xff setting, I expect the initial
> data either to be lost because the consolidation interval
> becomes unknown, or to be exaggerated due to "backward padding".
> Are those assumptions reasonable?


T:
ah I see ... so this is what you do ...
if you are reporting in 1 minute intervals

* use absolute type data sources
* as you see a hit, run

  rrdtool update ${uid}.rrd ${t_60}:U
  rrdtool update ${uid}.rrd ${t}:${hits_in_the_last_60_seconds}

the trick here is to establish a new 'starting point'. set mrhb to 61


F:
guess that reads
 ${t_60} = 60 sec before now
 ${t} = now
 mrhb = heartbeat argument to DS?
?

I'm not getting it entirely.
 update ...:U is just for canceling the heartbeat?
 any mrhb > --step has the same effect in this updating scheme, no?
 won't all intervals between updates become unknown?
 so I will still have to use xff ~1 I guess?
 but then all those unknowns will be padded by some avg and I'll get much too high absolute numbers, or am I missing something?

As of now I use
  rrdtool create ${uid}.rrd --step 60 --start now-90
    DS:pi:ABSOLUTE:999999999:0:999999999 ...
with
  rrdtool update ${uid}.rrd now-20@0 now-10@$hits now@0

The idea was to allow for arbitrarily long gaps between updates and to flatten the period between updates with those @0 entries.


T:
you are right, at consolidation they will go unknown ... so do the
first update with 0 and then with the real value ... this will cause the
interval between the hits to be filled with 0
... at creation time you have to update with 0 once as well ...


T:
also note that you must do this

update x-60:0
update x:real_value

on every turn not just initially.
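
A minimal sketch of this zero-then-value recipe, assuming --step 60 and an ABSOLUTE data source as above. The names record_hits and STEP are illustrative. The check against "rrdtool last" is an addition not mentioned in the discussion: rrdtool rejects an update whose timestamp is not newer than the last one, so the zero is skipped when the previous update is less than one step old.

```shell
# Sketch: write a 0 one step back, then the real value, on every hit.
# record_hits and STEP are hypothetical names, not part of rrdtool.
STEP=60

record_hits() {
    rrd=$1    # path to the article's RRD file
    t=$2      # timestamp of this update, seconds since the epoch
    hits=$3   # hits counted during the last STEP seconds

    # Assumption: skip the zero if the previous update is too recent,
    # because rrdtool refuses timestamps at or before the last update.
    last=$(rrdtool last "$rrd") || return 1
    if [ "$last" -lt "$((t - STEP))" ]; then
        # Fills the whole gap since the previous update with 0.
        rrdtool update "$rrd" "$((t - STEP)):0" || return 1
    fi
    # The real value then only covers the most recent step.
    rrdtool update "$rrd" "$t:$hits"
}
```

A call would look like: record_hits "${uid}.rrd" "$(date +%s)" "$hits"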


F:
so I guess all I missed was indeed to set --start far enough in the past so at least one consolidation interval of my longest-term RRA fits in.
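
A small sketch of that calculation, under the assumption that "far enough" means one full consolidation interval of the coarsest RRA (step times steps-per-row) before now. The helper name start_before and the RRA definition in the comments are illustrative; the thread does not show the actual RRAs.

```shell
# Sketch: pick a --start one consolidation interval of the longest-term
# RRA before "now". start_before is a hypothetical helper name.
start_before() {
    now=$1            # current time, seconds since the epoch
    step=$2           # --step of the RRD, in seconds
    steps_per_row=$3  # steps-per-row of the coarsest RRA
    echo "$((now - step * steps_per_row))"
}

# Possible use (the RRA below is an assumed example, daily rows from 60 s steps):
#   now=$(date +%s)
#   start=$(start_before "$now" 60 1440)
#   rrdtool create "${uid}.rrd" --step 60 --start "$start" \
#       DS:pi:ABSOLUTE:999999999:0:999999999 RRA:AVERAGE:0.5:1440:730
#   rrdtool update "${uid}.rrd" "$((now - 60)):0"   # the initial zero
```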






----- "A Darren Dunham" <ddunham at taos.com> wrote:

> On Tue, Sep 29, 2009 at 02:52:24PM +0200, Filip Moritz wrote:
> 
> > With rrdtool create, --start sets a boundary for the earliest
> > accepted values. I assumed this timestamp would be stored inside
> > the rrd metadata, now it appears it isn't. Is this correct? Is
> > there any way to recover start/creation time from an rrd file?
> 
> Yes.  The rrd database is always a fixed size, and you can never
> update older values (only add later values).  So the database is
> created with the last update time equal to the --start time.
> 
> > Moreover: How is start time treated internally? It seems unknown
> > values before start time get involved in aggregation: Time frames
> > overlapping start time either become unknown or the unknown values
> > padded leading to exaggerated values especially for time-series
> > starting with high peaks (here: hits on online news articles).
> 
> > Talking workarounds: Is there an option to create rrds with
> > pre-zeroed values?
> 
> Create a database with a start time before any time frame you may be
> interested in.  Then input zeros up to the point you want.
> 
> > Does anyone know good resources or maybe some thread subjects on
> > the use of rrd with very infrequent updates?
> 
> RRD mainly deals with *rates*, so it's comparing the difference
> between consecutive updates.  If you don't give it enough data in
> your timeframe, you won't get good data.
> 
> One way to fake it is to write a wrapper for your update process.
> Have a script that takes your (non-zero) update.  Then have it
> check the RRD for the last update time.  Have it update the
> database explicitly with zeros for the step times between the
> last update and your current (non-zero) update.
> 
> Make sense?
> 
> -- 
> Darren
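
A minimal sketch of the wrapper Darren describes, assuming a 60 second step. The names update_with_zero_fill and STEP are illustrative. It reads the last update time with "rrdtool last", builds one zero per missing step, and passes everything to a single "rrdtool update" call, which accepts several timestamp:value arguments.

```shell
# Sketch: zero-fill every step between the last update and the new one.
# update_with_zero_fill and STEP are hypothetical names.
STEP=60

update_with_zero_fill() {
    rrd=$1; now=$2; value=$3

    last=$(rrdtool last "$rrd") || return 1

    # Collect "timestamp:0" for each step strictly between last and now.
    set --
    t=$((last + STEP))
    while [ "$t" -lt "$now" ]; do
        set -- "$@" "$t:0"
        t=$((t + STEP))
    done

    # One call: all the zeros first, then the real (non-zero) value.
    rrdtool update "$rrd" "$@" "$now:$value"
}
```

For very long gaps the argument list grows by one entry per step, so a real script would probably cap the gap or send the zeros in batches.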


