[rrd-users] rrd create --start date; internal treatment
Filip Moritz
fil at taz.de
Wed Sep 30 11:20:42 CEST 2009
I didn't realize my discussion with Tobi had gone to private mail, so I'll summarize it for the benefit of the archive.
g., fil
---
T: the start time is the 'last update time' if you use rrdtool info.
F: so that's true until the first update, right? After that, the value set with --start is lost?
T: yes [...]
---
T: note that rrdtool is best suited for regular updates ... (time series).
F:
> Well, my data sits somewhere between constantly updated
> time-series data and timed events: accesses to single online
> news articles. [...] I have a scheme of minutely updated
> time-series data in the beginning and less frequent hits
> later. With some 40k+ articles (and RRDs), I want to update
> only those actually hit in any given minute. [...]
> What I am seeing now is that total hit count is out of bounds
> when reporting on time frames much larger than the actual
> measurement period (article came online last week, report on
> absolute hits in 2 years). I assume this is due to unknown values
> before "create --start", that are "backward padded" from the
> initial peak during aggregation. Hence my question:
> > > Moreover: How is start time treated internally? It seems
> > > unknown values before start time get involved in aggregation
> > > is this true?
> If it is: depending on the xff setting, I expect initial data
> either to be lost because the consolidation interval becomes
> unknown, or to be exaggerated due to "backward padding". Are
> those assumptions reasonable?
T:
ah I see ... so this is what you do ...
if you are reporting in 1 minute intervals
* use absolute type data sources
* as you see a hit, run
rrdtool update ${t_60}:U
rrdtool update ${t}:${hits_in_the_last_60_seconds}
the trick here is to establish a new 'starting point'. set mrhb to 61
F:
guess that reads
${t_60} = 60 sec before now
${t} = now
mrhb = heartbeat argument to DS?
?
I'm not getting it entirely.
update ...:U is just for canceling the heartbeat?
any mrhb > --step has the same effect in this updating scheme, no?
won't all intervals between updates become unknown?
so I will still have to use xff ~1 I guess?
but then all those unknowns will be padded by some avg and I'll get much too high absolute numbers or am I missing something?
As of now I use
rrdtool create ${uid}.rrd --step 60 --start now-90
DS:pi:ABSOLUTE:999999999:0:999999999 ...
with
rrdtool update ${uid}.rrd now-20@0 now-10@$hits now@0
The idea was to allow for arbitrarily long times between updates and to flatten the period between updates with those @0 entries.
T:
you are right, at consolidation they will go unknown ... so do the
first update with 0 and then with the real value ... this will cause the
interval between the hits to be filled with 0
... at creation time you have to update with 0 once as well ...
T:
also note that you must do this
update x-60:0
update x:real_value
on every turn not just initially.
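The two-update scheme above can be sketched as a small shell helper. This is only an illustration under assumptions: the function name `hit_update_args` and its parameters are made up here, the step defaults to the 60 seconds used in this thread, and the heartbeat is assumed to be long enough that the zero update is accepted as known data. It only prints the arguments, so nothing touches an RRD file until you pass them to `rrdtool update` yourself.

```shell
#!/bin/sh
# hit_update_args T HITS [STEP]   (hypothetical helper, not part of rrdtool)
# Prints the two update arguments for one turn:
#   a zero placed one step before T, then the real hit count at T.
hit_update_args() {
    t=$1
    hits=$2
    step=${3:-60}
    printf '%s:0 %s:%s\n' "$((t - step))" "$t" "$hits"
}

# Intended use (left as a comment so the sketch runs without rrdtool):
#   rrdtool update "${uid}.rrd" $(hit_update_args "$(date +%s)" "$hits")
```

Note that rrdtool rejects an update whose timestamp is not later than the last update, so a real wrapper would have to skip the zero when the previous hit was less than one step ago.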
F:
so I guess all I missed was indeed to set --start far enough in the past so at least one consolidation interval of my longest-term RRA fits in.
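A minimal sketch of picking such a start time, assuming the longest-term RRA consolidates `STEPS` primary steps of `STEP` seconds each and that its consolidation intervals are aligned to multiples of `STEP * STEPS` in epoch time. The function name `rrd_start_before` and the numbers in the usage comment are hypothetical.

```shell
#!/bin/sh
# rrd_start_before NOW STEP STEPS   (hypothetical helper)
# Prints a start timestamp that leaves at least one complete consolidation
# interval of the coarsest RRA between the start time and NOW.
rrd_start_before() {
    now=$1
    step=$2
    steps=$3
    interval=$((step * steps))
    # Round NOW down to an interval boundary, then go back one full interval.
    echo $(( (now / interval) * interval - interval ))
}

# Intended use (comment only; the RRA layout shown is an assumption):
#   start=$(rrd_start_before "$(date +%s)" 60 1440)
#   rrdtool create "${uid}.rrd" --step 60 --start "$start" ...
```

After creating the file this way, the zero update at creation time that Tobi mentions is still needed so that the padded period holds zeros rather than unknowns.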
----- "A Darren Dunham" <ddunham at taos.com> wrote:
> On Tue, Sep 29, 2009 at 02:52:24PM +0200, Filip Moritz wrote:
>
> > With rrdtool create, --start sets a boundary for the earliest accepted
> > values. I assumed this timestamp would be stored inside the rrd
> > metadata, now it appears it isn't. Is this correct? Is there any way to
> > recover start/creation time from an rrd file?
>
> Yes. The rrd database is always a fixed size, and you can never update
> older values (only add later values). So the database is created with
> the last update time equal to the --start time.
>
> > Moreover: How is start time treated internally? It seems unknown
> > values before start time get involved in aggregation: Time frames
> > overlapping start time either become unknown or the unknown values
> > padded leading to exaggerated values especially for time-series
> > starting with high peaks (here: hits on online news articles).
>
> > Talking workarounds: Is there an option to create rrds with pre-zeroed
> > values?
>
> Create a database with a start time before any time frame you may be
> interested in. Then input zeros up to the point you want.
>
> > Does anyone know good resources or maybe some thread subjects on the
> > use of rrd with very infrequent updates?
>
> RRD mainly deals with *rates*, so it's comparing the difference between
> consecutive updates. If you don't give it enough data in your
> timeframe, you won't get good data.
>
> One way to fake it is to write a wrapper for your update process. Have
> a script that takes your (non-zero) update. Then have it check the RRD
> for the last update time. Have it update the database explicitly with
> zeros for the step times between the last update and your current
> (non-zero) update.
>
> Make sense?
>
> --
> Darren
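The wrapper Darren describes could look roughly like the following sketch. The helper name `zero_fill_args` is invented for illustration; it assumes step boundaries fall on multiples of the step in epoch time and only prints the zero-valued update arguments, leaving the actual `rrdtool` calls to the caller.

```shell
#!/bin/sh
# zero_fill_args LAST NOW STEP   (hypothetical helper)
# Prints "timestamp:0" for every step boundary strictly between the last
# update time LAST and the current update time NOW, separated by spaces.
zero_fill_args() {
    last=$1
    now=$2
    step=$3
    # First step boundary after LAST.
    ts=$(( (last / step + 1) * step ))
    out=""
    while [ "$ts" -lt "$now" ]; do
        out="$out${out:+ }$ts:0"
        ts=$((ts + step))
    done
    printf '%s\n' "$out"
}

# Intended use (comment only, so the sketch runs without rrdtool):
#   last=$(rrdtool last "${uid}.rrd")
#   now=$(date +%s)
#   rrdtool update "${uid}.rrd" $(zero_fill_args "$last" "$now" 60) "$now:$hits"
```

For an article that has not been hit in months this produces one argument per step, so a real wrapper would probably batch the updates or fall back to the single zero update discussed earlier in the thread.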