[rrd-users] Overwrites existing data
linux at thehobsons.co.uk
Thu Oct 3 11:31:59 CEST 2013
Kaushal Shriyan wrote:
>"When new data reaches the starting point, it overwrites existing data" Does it mean the existing data is lost?
> so how do i keep track of the historical data if it overwrites?
> Please help me understand with examples.
Yes, once you have filled the buffer, the oldest data is overwritten and is lost - forever. So you need to size the buffers to suit your needs.
But typically, most people don't need to keep *ALL* the data forever. Take, as an example, something I use RRD for a lot at work: storing network traffic data.
I generally only need high resolution data for a short time. After a couple of days I'm generally not interested in the fine detail of traffic patterns, only the max and average rates. By the time I'm looking a year back, I only really need an overview.
So I collect data in RRDs with a 5 minute step size, and have aggregations of: 5 minutes for 2 days, 1/2 hour for 2 weeks, 2 hours for 2 months, and 1 day for 2 years. So for the last day or two I can see detail, for the last week I can see less detail (1/2 hour aggregated steps), and for the last year I can only see daily values.
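As a sketch of how that retention scheme might be declared (the data source name "octets" and the heartbeat/limits are assumptions, not from the original post), an `rrdtool create` along these lines defines one RRA per aggregation level, with the row counts sized to the retention periods above:

```shell
# One 5-minute step; four AVERAGE RRAs covering the four retention levels.
# Rows = retention period / (step * steps-per-row):
#   5 min for 2 days   -> 1 step/row,   576 rows
#   30 min for 2 weeks -> 6 steps/row,  672 rows
#   2 h for 2 months   -> 24 steps/row, 720 rows (~60 days)
#   1 day for 2 years  -> 288 steps/row, 730 rows
rrdtool create traffic.rrd --step 300 \
    DS:octets:COUNTER:600:0:U \
    RRA:AVERAGE:0.5:1:576 \
    RRA:AVERAGE:0.5:6:672 \
    RRA:AVERAGE:0.5:24:720 \
    RRA:AVERAGE:0.5:288:730 \
    RRA:MAX:0.5:6:672
```

The extra MAX RRA reflects the earlier point that after a couple of days only the max and average rates matter; you could add MAX variants at the other levels the same way.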
For another RRD, I collect data every second and populate an RRD that aggregates at 5 second intervals for 3 hours, and 1 minute intervals for one day. This gives us a fairly fine grained view of network traffic - so if someone shouts "the internet's slow" we can look at this graph and see what the traffic is doing almost in real time. But we don't need this level of detail going back more than an hour so we don't bother storing it.
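That fast RRD could be sketched the same way (again, the data source name and heartbeat are my assumptions): a 1 second step, with 5 second rows kept for 3 hours and 1 minute rows kept for a day:

```shell
# 1-second step.
#   5 s rows for 3 hours -> 5 steps/row,  3*3600/5  = 2160 rows
#   1 min rows for 1 day -> 60 steps/row, 86400/60  = 1440 rows
rrdtool create fast.rrd --step 1 \
    DS:octets:COUNTER:2:0:U \
    RRA:AVERAGE:0.5:5:2160 \
    RRA:AVERAGE:0.5:60:1440
```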
But the main thing is that you need to decide what your requirements are - they will almost certainly be different to everyone else's.
If you want to store 5 second samples for 10 years, I imagine RRD tools will handle it if you have the storage space, memory, and processor capacity. Storing data is fairly easy - it just needs disk space. But processing it (e.g. to draw a graph) will be quite resource intensive if you have to condense (say) a year's worth of 5 second samples down to a 400 pixel wide graph. That's the reason for the aggregation - if all you're interested in for graphing over the previous year is daily averages, then consolidate the data and save the storage space and resources needed to graph it.
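The arithmetic behind that makes the point concrete - a year of 5 second samples versus a year of daily averages, squeezed onto a 400 pixel wide graph:

```shell
# Samples in one (non-leap) year at a 5-second step:
samples_per_year=$(( 365 * 24 * 60 * 60 / 5 ))   # 6307200

# Condensing those onto a 400-pixel-wide graph means each
# pixel column has to consolidate this many samples:
samples_per_pixel=$(( samples_per_year / 400 ))  # 15768

# With pre-aggregated daily averages, the graph only reads:
daily_points=365

echo "$samples_per_year $samples_per_pixel $daily_points"
```

So the daily-average RRA turns roughly six million reads into a few hundred, which is exactly the saving the aggregation buys.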