[rrd-users] Scaling rrd tables for best performance

Thu Dec 6 16:52:30 CET 2007

Alex van den Bogaerdt wrote:

> when you are going to do your benchmarking, please consider to keep
> an rrdtool process running in 'remote control' mode: "rrdtool -"
> Perhaps it makes a difference.

	Wow, I missed that. It seems it will improve performance greatly.
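
	For reference, a minimal sketch of how a poller might drive a long-running "rrdtool -" process from Python. It assumes the documented remote control behaviour, where every command is answered by a final line starting with "OK" or "ERROR:". The names RemoteControlSession and open_session are hypothetical, and the sketch is untested against any particular rrdtool version.

```python
import subprocess


class RemoteControlSession:
    """Send commands over an 'rrdtool -' pipe and collect each reply."""

    def __init__(self, to_rrdtool, from_rrdtool):
        self._out = to_rrdtool    # text stream connected to rrdtool's stdin
        self._in = from_rrdtool   # text stream connected to rrdtool's stdout

    def command(self, line):
        """Send one command; return the output lines printed before 'OK'."""
        self._out.write(line.rstrip("\n") + "\n")
        self._out.flush()
        body = []
        while True:
            reply = self._in.readline()
            if reply == "":
                raise EOFError("rrdtool closed the pipe")
            reply = reply.rstrip("\n")
            # Assumption: no ordinary output line starts with "OK" or "ERROR:".
            if reply.startswith("OK"):
                return body
            if reply.startswith("ERROR:"):
                raise RuntimeError(reply)
            body.append(reply)


def open_session(workdir=None):
    """Start one 'rrdtool -' child (hypothetical helper) and wrap its pipes."""
    proc = subprocess.Popen(
        ["rrdtool", "-"],
        cwd=workdir,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
        bufsize=1,  # line-buffered so each command is sent promptly
    )
    return proc, RemoteControlSession(proc.stdin, proc.stdout)
```

	The point of the design is that the process is started once and reused for every update, instead of forking a new rrdtool for each sample.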

	I noticed the remote control feature can be used over a TCP stream to 
create a server, which sounds like an excellent idea, but how would it 
behave if I had two different clients trying to access two different 
working directories at the same time?

	It doesn't seem that rrdtool would be aware of multiple clients; 
instead, it would expect all commands in sequential order, as if they 
came from a single client.

	I will have parallel processes polling SNMP data, and it would be great 
to have a centralized service where all processes could connect and feed 
the data. But if rrdtool can't handle multiple clients that way, I could 
just run one instance for each of the pollers (so that the number of 
servers is exactly the same as the number of clients).
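
	The one-instance-per-poller fallback could be sketched roughly as follows. This is only an illustration under assumptions: InstancePool and spawn_rrdtool are hypothetical names, and each child is simply started with its own working directory so that relative RRD paths do not collide between pollers.

```python
import subprocess


def spawn_rrdtool(workdir, argv=("rrdtool", "-")):
    """Start one remote-control rrdtool child inside the given directory."""
    return subprocess.Popen(
        list(argv),
        cwd=workdir,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
        bufsize=1,
    )


class InstancePool:
    """Keep exactly one rrdtool instance per poller (hypothetical helper)."""

    def __init__(self, spawn=spawn_rrdtool):
        self._spawn = spawn      # injectable so the pool can be tested offline
        self._by_poller = {}     # poller id -> running instance

    def get(self, poller_id, workdir):
        """Return the poller's instance, starting it on first use."""
        if poller_id not in self._by_poller:
            self._by_poller[poller_id] = self._spawn(workdir)
        return self._by_poller[poller_id]

    def count(self):
        """Number of instances started so far."""
        return len(self._by_poller)
```

	Since no instance is ever shared, commands from different pollers cannot interleave on the same pipe, at the cost of one extra process per poller.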

> Also know that some significant changes were made with respect to
> caching. You really ought to keep an eye on development if memory
> consumption and caching is important to you.

	Memory consumption is not a concern as far as bottlenecks go: 
nowadays it's relatively cheap to have a server with a few gigabytes of 
good RAM. I'd prefer as much caching as possible to speed things up, as 
long as I can control how often data is synced to the disk, in 
order to guarantee minimal loss in case of failure.
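
	Independently of what rrdtool does internally, one way to bound the loss window is to buffer samples on the client side and hand them over in batches. The sketch below is an assumption-laden illustration and not a description of rrdtool's own caching: UpdateBuffer is a hypothetical class, and it relies only on the fact that a single "update" command accepts several timestamp:value arguments.

```python
import time


class UpdateBuffer:
    """Collect samples per RRD file and emit batched 'update' commands."""

    def __init__(self, flush_interval=60.0, clock=time.monotonic):
        self._interval = flush_interval   # seconds between flushes
        self._clock = clock               # injectable for testing
        self._pending = {}                # rrd file -> ["ts:v1:v2", ...]
        self._last_flush = clock()

    def add(self, rrd_file, timestamp, *values):
        """Queue one sample; timestamps must increase per file."""
        sample = ":".join(str(v) for v in (timestamp, *values))
        self._pending.setdefault(rrd_file, []).append(sample)

    def due(self):
        """True once the flush interval has elapsed since the last flush."""
        return self._clock() - self._last_flush >= self._interval

    def flush(self):
        """Return one command per file and reset the buffer.

        File names containing spaces are not handled in this sketch.
        """
        commands = [
            "update %s %s" % (rrd_file, " ".join(samples))
            for rrd_file, samples in self._pending.items()
        ]
        self._pending = {}
        self._last_flush = self._clock()
        return commands
```

	With this approach, at most flush_interval seconds of samples are lost if the poller dies, and that interval is chosen explicitly by the operator.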

	So, the question remains: how does rrdtool handle memory and disk I/O 
when running in remote control mode? Would it write to the disk after 
every update or would it cache it and flush it to the disk after a 
certain period of time (or after accumulating a certain amount of data)? 
Would it keep the DSes in memory so graphing would be faster (no disk 
reading)?

	I just want to understand how the current stable version works (I'm not 
concerned with 1.3 at the moment) so I can scale my setup as well as possible. 
For now I'm sure the hardware I have will be enough even for a bad 
implementation, but I'm concerned about the future. I don't want to find 
myself a year from now having to change everything because it was poorly 
planned.

	I understand that some of the new features in the 1.3 branch would make 
things faster, but I want to work only with what's stable now, even if 
it means having less caching than I'd like to see, for example.

Regards,
Eduardo M. Bragatto.