[mrtg] "rrdupdate" on load-balancing cluster

Mon Jan 25 19:40:14 CET 2010

I've done this for a couple of large site installations
with more than 100,000 targets being monitored.
It works pretty well, though I've had mixed results
when the NFS server is also doing other work. In my
experience a dedicated NFS server is more stable, with
the polling and graphing functions on their own
hardware as well.

As long as only one process on one box is writing
to a given RRD file at a time, you don't have to worry
about thread safety.  In general, the SNMP
polling portion is very lightweight; you'll find that
there's not much benefit to trying to split that off from
the rrd-update process, as you generate just about
as much load sending the results of the SNMP polls
from your poller to the rrd-update box as if the rrd-update
box did the SNMP request itself.
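
If you do end up writing a threaded update daemon, here is a minimal
sketch of the "one writer per RRD file" rule inside a single process.
The class name and the write_fn callable are made up for illustration;
write_fn stands in for whatever actually performs the update. This only
serializes writers per file name within one process. It does not
coordinate across hosts, and it says nothing by itself about the librrd
note on at-style time specifications.

```python
import threading


class PerFileSerializer:
    """Allow at most one in-flight update per RRD path in this process."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the lock table itself
        self._locks = {}                # RRD path -> threading.Lock

    def _lock_for(self, path):
        # Create the per-file lock on first use, under the guard lock,
        # so two threads never end up with different locks for one path.
        with self._guard:
            lock = self._locks.get(path)
            if lock is None:
                lock = self._locks[path] = threading.Lock()
            return lock

    def update(self, path, write_fn, *args):
        # write_fn is a hypothetical callable that performs the real
        # update; updates to different paths can still run in parallel.
        with self._lock_for(path):
            return write_fn(path, *args)
```

Updates to the same file queue up behind each other, while updates to
different files proceed independently.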

So, I'd recommend having two pools of boxes. The first
is your SNMP poller/rrd-update boxes, split so that a
given device is polled from just one of your polling
boxes at a time, with the files stored on a common
NFS volume served by a separate NFS server.
Then, have a second pool of servers for doing
your user presentation layer, whether it be 14all,
mrtg.cgi, weathermap, etc.
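
One way to keep each device on exactly one polling box is a
deterministic assignment that every box computes the same way. A
minimal sketch, assuming the host names below (they are made up) and
assuming a plain hash-modulo split is acceptable:

```python
import hashlib


def assign_poller(device, pollers):
    """Return the single polling box responsible for a device."""
    if not pollers:
        raise ValueError("need at least one polling box")
    # hashlib gives the same answer on every box and every run;
    # Python's built-in hash() is randomized per process for strings.
    digest = hashlib.sha256(device.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(pollers)
    return pollers[index]


# Hypothetical host names, for illustration only.
POLLERS = ["poller-a", "poller-b", "poller-c"]
owner = assign_poller("router1.example.net", POLLERS)
```

Note that with a modulo split, adding a polling box moves many devices
to a different box. A static device-to-poller table avoids that, at the
cost of maintaining it by hand.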

That way, as you add more targets/devices, you 
can scale your polling/rrd-update layer horizontally,
and as you add more viewers/users you can scale
your presentation layer horizontally.
And if you need additional NFS IO-ops, you can
add a second NFS server, and split your rrd files
across two volumes, both of which are mounted
by the various back-end and front-end boxes as
necessary.
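
For the two-volume split, the back-end and front-end boxes need to
agree on where a given file lives. A minimal sketch using an explicit
table rather than a hash, so files do not have to move when a volume is
added. The group names, mount points, and directory layout are all
assumptions for illustration:

```python
import posixpath

# Hypothetical mapping of device groups to NFS mount points. Every
# poller and web front-end would mount both volumes and share this table.
VOLUME_BY_GROUP = {
    "core": "/mnt/rrd0",
    "edge": "/mnt/rrd1",
}


def rrd_path(group, target):
    """Build the full path of the RRD file for a target in a group."""
    return posixpath.join(VOLUME_BY_GROUP[group], group, target + ".rrd")
```

For example, rrd_path("core", "router1_eth0") gives
/mnt/rrd0/core/router1_eth0.rrd on every box.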

This is something that's worked well for me in the
past, but every situation is different.

Matt

----- Original Message ----
> From: Kristoff Bonne <kristoff.bonne at skypro.be>
> To: mrtg at lists.oetiker.ch
> Sent: Mon, January 25, 2010 6:03:32 AM
> Subject: [mrtg] "rrdupdate" on load-balancing cluster
> 
> Hi,
> 
> 
> We are running a couple of servers running "mrtg" on a large number of
> devices (+20000 rrd files in total), used for both "mrtg" (14all) and
> "weathermap" applications.
> 
> I'm thinking of trying to implement the concept of a computer-cluster
> for this to make this more robust and future-proof.
> 
> 
> The basic idea would be to "separate" the three different elements of
> network-monitoring:
> 
> - First, a number of "polling" boxes would gather the information from
> the network-elements.
> 
> - After that, these boxes  would fire up a "rrd-update" command to a
> number of "rrd-servers" which would contain the rrd-files.
> 
> - The RRD-files would then be made available to the "web-frontend"
> (running 14all and weathermap), probably via NFS.
> 
> 
> 
> Two questions:
> 
> 1: I do not think I will be the first person to think of this.
> 
> Is anybody aware of any implementations like this?
> Does there exist a client-server version of the "rrd-tools"?
> 
> 
> 
> 
> 2: Looking in the librrd API documentation, I found this troublesome
> remark concerning threads:
> 
> /* NOTE: rrd_update_r are only thread-safe if no at-style time
>     specifications get used!!! */
> 
> What exactly does this mean?
> 
> If I want to write an "rrdupdate-daemon" myself, it needs to run in
> threads-mode and it must be able to use timestamps!
> 
> 
> Does this mean that this is completely impossible, or are there ways
> around this?
> 
> If I would add a piece of code that implements a MUTEX based on the
> file-name of the rrd-file being updated, would this be enough to support
> rrd-updates with timestamp?
> 
> 
> 
> Cheerio! Kr. Bonne.