[rrd-users] large dataset considerations

Wed Oct 23 23:52:52 CEST 2013

Generally, you should use a separate RRD file when the source data do not
arrive together, or when the number of data sources (DSs) is likely to
change.  However, updating a single RRD with many DSs is cheaper (in IO
terms) than updating many RRDs with one DS each.

EG1: 

Querying CPU usage on a single machine returns stats for User, System,
Wait and Idle.  These are all retrieved simultaneously, so they can be held
as separate DSs within a single RRD.
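
As a rough illustration of EG1, the Python sketch below builds the rrdtool
command lines for one RRD with four DSs and a single update that writes all
four values together. The file name, step, heartbeat, DS type and RRA layout
are assumptions made for the example, not settings from this thread, and the
helper functions are hypothetical.

```python
# Sketch: one RRD holding four DSs that always arrive together (EG1).
# File name, step, heartbeat and RRA layout are illustrative assumptions.

CPU_FIELDS = ["user", "system", "wait", "idle"]


def cpu_create_args(path="cpu.rrd", step=60):
    """Build an `rrdtool create` argument list with one GAUGE DS per CPU field."""
    heartbeat = step * 2  # tolerate one missed sample before a value is unknown
    args = ["rrdtool", "create", path, "--step", str(step)]
    # The values are assumed to be percentages, so bound each DS to 0..100.
    args += [f"DS:{name}:GAUGE:{heartbeat}:0:100" for name in CPU_FIELDS]
    # One day of per-step averages (assumed retention).
    args.append(f"RRA:AVERAGE:0.5:1:{86400 // step}")
    return args


def cpu_update_args(sample, path="cpu.rrd"):
    """Build one `rrdtool update` call that writes all four values at once."""
    values = ":".join(str(sample[name]) for name in CPU_FIELDS)
    template = ":".join(CPU_FIELDS)
    return ["rrdtool", "update", path, "--template", template, f"N:{values}"]


if __name__ == "__main__":
    sample = {"user": 12.5, "system": 3.0, "wait": 0.5, "idle": 84.0}
    print(" ".join(cpu_create_args()))
    print(" ".join(cpu_update_args(sample)))
```

The point of the layout is that one update call touches one file, however
many DSs it carries.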

EG2:

An ISP is storing data usage by its clients.  Clients may come and go, and
the statistics for client data usage are obtained by separate queries from
separate routing devices.  In this case, we have a separate RRD for each
client, each RRD holding just the DSs that relate to that one query
(probably bytes in and bytes out).
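
A possible shape for EG2, again as a Python sketch that only builds the
command lines: each client gets its own file with two DSs, so clients can be
added or removed without touching any other RRD. The base directory, DS
names, COUNTER type, step and retention are all assumptions for illustration,
and the helpers are hypothetical.

```python
# Sketch: one RRD per client, each with bytes in / bytes out (EG2).
# Paths, DS names, step and retention are illustrative assumptions.
import re


def client_rrd_path(client_id, base_dir="/var/lib/rrd/clients"):
    """Map a client identifier to its own RRD file name."""
    # Replace anything outside a conservative character set; note that two
    # different identifiers could collide after this substitution.
    safe = re.sub(r"[^A-Za-z0-9_.-]", "_", client_id)
    return f"{base_dir}/{safe}.rrd"


def client_create_args(client_id, step=300):
    """Build the `rrdtool create` arguments for a new client's RRD."""
    heartbeat = step * 2
    return [
        "rrdtool", "create", client_rrd_path(client_id), "--step", str(step),
        # Assumes the routers report ever-increasing byte counters.
        f"DS:bytes_in:COUNTER:{heartbeat}:0:U",
        f"DS:bytes_out:COUNTER:{heartbeat}:0:U",
        # One week of per-step averages (assumed retention).
        f"RRA:AVERAGE:0.5:1:{7 * 86400 // step}",
    ]


def client_update_args(client_id, bytes_in, bytes_out):
    """Build the update for one client; each client is written independently."""
    return ["rrdtool", "update", client_rrd_path(client_id),
            f"N:{bytes_in}:{bytes_out}"]


if __name__ == "__main__":
    print(" ".join(client_create_args("acme/corp 1")))
    print(" ".join(client_update_args("acme/corp 1", 123456, 654321)))
```

Removing a client is then just a matter of deleting or archiving its file.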

Of course, every application is different, and the optimal way to use
RRDtool depends on your particular requirements.  So think about where the
data come from, whether the number of DSs will change, how often the data
arrive, whether you have control over when the data arrive, and so on.

For example, our setup has 30,000+ RRD files, each about 1.5MB in size,
usually holding 8 RRAs and 2 DSs.  The number of RRDs changes often, and the
data all arrive at different times.  We use rrdcached to manage disk IO
requirements.
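
To put those figures in perspective, here is a back-of-envelope Python
sketch. It assumes each stored value takes 8 bytes and ignores file headers
and per-RRA bookkeeping, so the row count is only an upper-bound estimate.
The last helper shows how an update might be routed through rrdcached with
the `--daemon` option; the socket path and file name are assumptions, not
details of the setup described above.

```python
# Sketch: rough sizing for a fleet of RRD files, plus an update sent via
# rrdcached. Header overhead is ignored; the socket path is an assumption.


def total_disk_mb(n_files, mb_per_file):
    """Total space for the fleet, in the same units as mb_per_file."""
    return n_files * mb_per_file


def approx_rows_per_file(file_bytes, n_ds, bytes_per_value=8):
    """Estimate the total rows across all RRAs, ignoring header overhead."""
    return file_bytes // (n_ds * bytes_per_value)


def cached_update_args(path, values, daemon="unix:/var/run/rrdcached.sock"):
    """Build an `rrdtool update` call that goes through rrdcached."""
    payload = "N:" + ":".join(str(v) for v in values)
    return ["rrdtool", "update", "--daemon", daemon, path, payload]


if __name__ == "__main__":
    # 30,000 files of about 1.5MB each comes to roughly 45,000MB (about 45GB).
    print(total_disk_mb(30_000, 1.5))
    # 1.5MB with 2 DSs is at most about 93,750 rows spread over the RRAs.
    print(approx_rows_per_file(1_500_000, 2))
    print(" ".join(cached_update_args("client42.rrd", [100, 200])))
```

With rrdcached in front, many small updates can be batched in memory and
flushed to disk less often, which is where the IO saving comes from.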

Steve

Steve Shipway

ITS Unix Services Design Lead

University of Auckland, New Zealand

Floor 1, 58 Symonds Street, Auckland

Phone: +64 (0)9 3737599 ext 86487

DDI: +64 (0)9 923 6487

Mobile: +64 (0)21 753 189

Email: s.shipway at auckland.ac.nz

From: rrd-users-bounces+s.shipway=auckland.ac.nz at lists.oetiker.ch
[mailto:rrd-users-bounces+s.shipway=auckland.ac.nz at lists.oetiker.ch] On
Behalf Of S Ahmed
Sent: Thursday, 24 October 2013 9:58 a.m.
To: rrd-users
Subject: Re: [rrd-users] large dataset considerations

Hi,

Is this tool used in any large-scale deployments?

What is considered a large database size?

Just looking to get an idea on how people use this tool at scale.

Scenario:  Say you want to store time-series information in a SaaS
application.

I'm guessing there is some sort of threshold where it makes sense to
partition your data, e.g. by a group of customers, in order to scale out the
usage.
