[mrtg] "rrdupdate" on load-balancing cluster

Tue Jan 26 18:13:19 CET 2010

Unless my math is off, assuming you use normal-size Ethernet frames
for your NFS server NIC (1500-byte MTU), the theoretical maximum
packet rate on a gigE link between your NFS server and your rrd-update
box is about 83,000 1500-byte packets per second (1,000,000,000 bits/s
divided by 12,000 bits per packet, before Ethernet framing overhead).
Most typical NFS servers top out at about 50,000 ops per second
(http://www.spec.org/sfs97r1/results/sfs97r1.html),
so your network generally won't be the bottleneck; the limit will be
how fast your NFS server can handle disk I/O.
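To make the packet-rate arithmetic concrete, here is a small Python sketch. It assumes a nominal 1 Gbit/s line rate and, optionally, the 38 bytes of per-frame overhead an untagged Ethernet frame carries on the wire (preamble and start delimiter, MAC header, FCS, inter-frame gap). The 50,000 ops/s figure is only the rough ceiling quoted above, not a measurement.

```python
# Rough packet-rate arithmetic for a gigE link, assuming 1500-byte frames
# (the MTU) and a nominal line rate of 1 Gbit/s.

GIGE_BITS_PER_SECOND = 1_000_000_000

# Per-frame overhead on the wire, in bytes, for an untagged Ethernet frame:
# preamble + start-of-frame delimiter (8), MAC header (14), FCS (4),
# and the minimum inter-frame gap (12).
ETHERNET_WIRE_OVERHEAD = 8 + 14 + 4 + 12


def packets_per_second(mtu_bytes: int, include_overhead: bool = False) -> float:
    """Maximum frames per second a gigE link can carry at the given MTU."""
    bytes_per_frame = mtu_bytes + (ETHERNET_WIRE_OVERHEAD if include_overhead else 0)
    return GIGE_BITS_PER_SECOND / (bytes_per_frame * 8)


if __name__ == "__main__":
    naive = packets_per_second(1500)                           # ~83,333 pps
    on_wire = packets_per_second(1500, include_overhead=True)  # ~81,274 pps
    nfs_ops = 50_000  # rough ceiling for a typical NFS server, from the text
    print(f"naive: {naive:,.0f} pps, with framing: {on_wire:,.0f} pps")
    print(f"NFS ops ceiling as a fraction of link pps: {nfs_ops / on_wire:.0%}")
```

Even with framing overhead counted, a single gigE link carries more full-size frames per second than a typical NFS server completes operations, and many NFS operations are smaller than a full frame, so this is a conservative comparison.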

If you spend a *lot* of money on a really high-end NFS box, you'll
be able to overrun a single gigE link from your NFS server to your
switch; but most of the top-performing systems support multiple
NICs, so you can run an EtherChannel (link aggregation) bundle from
the server to the switch to scale the bandwidth up.
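A simplified illustration of how such a bundle scales: link aggregation typically hashes each flow (for example on MAC or IP addresses) onto one member link, so aggregate bandwidth grows with the number of links while any single flow stays on one link. The CRC32 hash below is a hypothetical stand-in, not the actual algorithm used by Cisco switches or the Linux bonding driver, and the addresses are made up.

```python
# Simplified model of a link-aggregation bundle: each src/dst flow is hashed
# onto one member link. Many clients spread across the links; one client's
# flow never exceeds a single link's speed.
# The CRC32-of-address-pair hash is illustrative only.

import zlib
from collections import Counter
from typing import List


def member_link(src: str, dst: str, n_links: int) -> int:
    """Pick the bundle member for a src/dst pair (hypothetical hash policy)."""
    key = f"{src}->{dst}".encode("utf-8")
    return zlib.crc32(key) % n_links


def spread(clients: List[str], server: str, n_links: int) -> Counter:
    """Count how many client flows land on each member link."""
    return Counter(member_link(server, c, n_links) for c in clients)


if __name__ == "__main__":
    clients = [f"10.0.0.{i}" for i in range(1, 41)]  # hypothetical NFS clients
    print(spread(clients, "10.0.1.1", n_links=4))
```

With many rrd-update and presentation boxes mounting the same server, their flows spread over the members; a single client talking to the server still tops out at one link's speed.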

As you note, though, whether you go with a dedicated NFS-serving
appliance or use your rrd-servers as NFS servers exporting to your
presentation layer, the big challenge is making them fault-tolerant
and redundant.

You can consider vendor replication features like NetApp's SnapMirror
between servers, or a mirroring layer like...shoot, I was just reading
up on one and lost the link to it. It's a replicated block device
that's being added to Linux, something like dbrd or bdrd (DRBD, I
believe). I have to run to a conference now, but I can dig into it more
later; that's probably the right direction for you to be investigating.

Thanks!

Matt

----- Original Message ----
> From: Kristoff Bonne <kristoff.bonne at skypro.be>
> To: mrtg at lists.oetiker.ch
> Sent: Tue, January 26, 2010 1:10:06 AM
> Subject: Re: [mrtg] "rrdupdate" on load-balancing cluster
> 
> Hi Matthew,
> 
> 
> I'm sorry for the late reply.
> 
> 
> Thanks for the reply.
> 
> The more I look into this, the more I get the impression that the
> "rrd"-part will not be the most difficult part.
> 
> It looks like the most difficult part will be how to get the data from
> the "rrd-servers" to the webserver (14all, weathermaps).
> 
> It looks like I have two options:
> - Put all the data on an NFS-server (or better, a fault-redundant
> NFS-server cluster).
> This would, however, mean that every rrd-update would result in
> network traffic to the NFS server.
> 
> 
> - Put the data on the "rrd-servers" and nfs-share the data from every
> rrd-server to the webserver.
> This means that the rrd-files are local on the rrd-server and there is
> no NFS network-traffic for every update.
> 
> The problem is making this fault-redundant, so it is able to deal with
> rrd-servers crashing (which results in stale NFS mounts on the
> web-server).
> 
> 
> 
> Whichever option I pick, it looks like I need to look into how to
> set up a fault-redundant NFS-server.
> :-)
> 
> 
> 
> Does somebody have any idea how much NFS network traffic one single "rrd
> update" creates?
> 
> I want to be sure that, if I choose one single central NFS-server,
> it does not become a bottleneck.
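The per-update figure has to be measured on your own setup (for example by comparing `nfsstat` counters before and after a batch of updates), but the overall load can be framed with simple arithmetic. A Python sketch, assuming MRTG's default 5-minute (300-second) polling interval and a placeholder ops-per-update value that is hypothetical, not measured:

```python
# Back-of-envelope NFS operation rate generated by rrd updates.
# Assumptions (not measurements): every RRD file is updated once per polling
# interval, and each update costs a fixed, placeholder number of NFS operations.


def nfs_ops_per_second(rrd_files: int, interval_s: float, ops_per_update: float) -> float:
    """Average NFS operations per second across one polling interval."""
    updates_per_second = rrd_files / interval_s
    return updates_per_second * ops_per_update


if __name__ == "__main__":
    rrd_files = 20_000    # "+20000 rrd files", from the original question
    interval_s = 300      # assumed: MRTG's default 5-minute interval
    ops_per_update = 10   # hypothetical placeholder; measure the real value
    avg = nfs_ops_per_second(rrd_files, interval_s, ops_per_update)
    print(f"{rrd_files / interval_s:.1f} updates/s, about {avg:.0f} NFS ops/s on average")
```

Updates tend to arrive in a burst at the start of each polling run rather than evenly, so the peak rate will be higher than this average; it is the peak that should be compared against the NFS server's ops ceiling.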
> 
> 
> 
> Cheerio! Kr. Bonne.
> 
> 
> 
> Matthew Petach wrote:
> > I've done this for a couple of large site installations
> > with more than 100,000 targets being monitored.
> > It works pretty well, though I've had varying levels
> > of success when the NFS server is also doing its
> > own work; in many ways, having a dedicated NFS
> > server seems to be more stable, with the polling
> > and graphing functions on their own hardware as
> > well.
> > 
> > As long as only one process on one box is writing
> > to a given RRD file at a time, you don't have to worry
> > about thread safety.  In general, the SNMP
> > polling portion is very lightweight; you'll find that
> > there's not much benefit to trying to split that off from
> > the rrd-update process, as you generate just about
> > as much load sending the results of the SNMP polls
> > from your poller to the rrd-update box as if the rrd-update
> > box did the SNMP request itself.
> > 
> > So, I'd recommend having two pools of boxes; your
> > SNMP poller/rrd-update boxes, split so that a given
> > device is polled from just one of your polling boxes
> > at a time; but have the files stored on a common
> > NFS volume, served by a separate NFS server.
> > Then, have a second pool of servers for doing
> > your user presentation layer, whether it be 14all,
> > mrtg.cgi, weathermap, etc.
> > 
> > That way, as you add more targets/devices, you 
> > can scale your polling/rrd-update layer horizontally,
> > and as you add more viewers/users you can scale
> > your presentation layer horizontally.
> > And if you need additional NFS IO-ops, you can
> > add a second NFS server, and split your rrd files
> > across two volumes, both of which are mounted
> > by the various back-end and front-end boxes as
> > necessary.
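One way to get the "each device polled from just one box" property, plus the split of rrd files across NFS volumes, sketched in Python: assign by a stable hash of the device name modulo the number of pollers (and, separately, volumes). This is an illustration, not necessarily how those installations were set up; all device, poller and volume names are hypothetical.

```python
# Deterministic partitioning for the two-pool layout: each device is owned by
# exactly one poller/rrd-update box, and each device's RRD files live on one
# NFS volume. Device names, poller names and volume paths are hypothetical.

import zlib
from typing import Dict, List, Tuple


def bucket(name: str, n: int) -> int:
    """Stable index in range(n) for a name.

    zlib.crc32 is used instead of Python's built-in hash(), which is
    randomised per process for strings and would differ between boxes.
    """
    return zlib.crc32(name.encode("utf-8")) % n


def assign(devices: List[str], pollers: List[str],
           volumes: List[str]) -> Dict[str, Tuple[str, str]]:
    """Map each device to (poller that polls it, volume that stores its RRDs)."""
    return {
        dev: (pollers[bucket("poller:" + dev, len(pollers))],
              volumes[bucket("volume:" + dev, len(volumes))])
        for dev in devices
    }


if __name__ == "__main__":
    plan = assign(["core-sw1", "core-sw2", "edge-rtr1"],
                  ["poller-a", "poller-b"],
                  ["/nfs/rrd-vol1", "/nfs/rrd-vol2"])
    for device, (poller, volume) in sorted(plan.items()):
        print(f"{device}: polled by {poller}, files on {volume}")
```

With plain modulo, adding a poller or volume moves most devices to a new owner (and would mean migrating rrd files between volumes); consistent hashing or an explicit lookup table avoids that, at the cost of a little more bookkeeping.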
> > 
> > This is something that's worked well for me in the
> > past, but every situation is different.
> > 
> > Matt
> > 
> > 
> > 
> > ----- Original Message ----
> >> From: Kristoff Bonne 
> >> To: mrtg at lists.oetiker.ch
> >> Sent: Mon, January 25, 2010 6:03:32 AM
> >> Subject: [mrtg] "rrdupdate" on load-balancing cluster
> >>
> >> Hi,
> >>
> >>
> >> We have a couple of servers running "mrtg" against a large number of
> >> devices (20,000+ rrd files in total), used for both "mrtg" (14all) and
> >> "weathermap" applications.
> >>
> >> I'm thinking of trying to implement the concept of a computer-cluster
> >> for this to make this more robust and future-proof.
> >>
> >>
> >> The basic idea would be to "separate" the three different elements of
> >> network-monitoring:
> >>
> >> - First, a number of "polling" boxes would gather the information from
> >> the network-elements.
> >>
> >> - After that, these boxes would send an "rrd-update" command to a
> >> number of "rrd-servers" which would contain the rrd-files.
> >>
> >> - The RRD-files would then be made available to the "web-frontend"
> >> (running 14all and weathermap), probably via NFS.
> >>
> >>
> >>
> >> Two questions:
> >>
> >> 1: I do not think I will be the first person to think of this.
> >>
> >> Is anybody aware of any implementations like this?
> >> Does there exist a client-server version of the "rrd-tools"?
> >>
> >>
> >>
> >>
> >> 2: Looking in the librrd API documentation, I found this troublesome
> >> remark concerning threads:
> >>
> >> /* NOTE: rrd_update_r are only thread-safe if no at-style time
> >>     specifications get used!!! */
> >>
> >> What exactly does this mean?
> >>
> >> If I want to write an "rrdupdate-daemon" myself, it needs to run
> >> multi-threaded and it must be able to use timestamps!
> >>
> >>
> >> Does this mean that this is completely impossible, or are there ways
> >> around this?
> >>
> >> If I add a piece of code that implements a MUTEX based on the
> >> file name of the rrd-file being updated, would this be enough to support
> >> rrd-updates with timestamps?
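A rough sketch of a daemon core that sidesteps both concerns, under stated assumptions. One reading of the librrd note is that the at-style time parser keeps shared state, making it unsafe across threads regardless of which file is being updated; if so, a per-file mutex alone would not cover it. Converting every timestamp to integer epoch seconds before calling librrd avoids the at-style parser, and a per-file lock keeps one writer per RRD file. Python is used for brevity; `write_rrd()` is a hypothetical stand-in for the real librrd call (`rrd_update_r` in C), and the paths are made up.

```python
# Core of a hypothetical multi-threaded "rrdupdate daemon", assuming:
#  1. every timestamp is normalised to integer epoch seconds before it reaches
#     librrd, so no at-style time specification is ever parsed;
#  2. a per-file lock keeps at most one thread writing a given RRD file.
# write_rrd() is a placeholder for the real librrd update call; here it only
# records the update string it would have written.

import threading
import time
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional

_locks_guard = threading.Lock()            # protects the _file_locks table
_file_locks: Dict[str, threading.Lock] = {}
written: Dict[str, List[str]] = defaultdict(list)


def lock_for(path: str) -> threading.Lock:
    """Return the lock for one RRD file, creating it on first use."""
    with _locks_guard:
        lock = _file_locks.get(path)
        if lock is None:
            lock = _file_locks[path] = threading.Lock()
        return lock


def to_epoch(ts: Optional[float]) -> int:
    """Resolve a timestamp (None means 'now') to integer epoch seconds."""
    return int(time.time() if ts is None else ts)


def write_rrd(path: str, update_line: str) -> None:
    """Hypothetical stand-in for the librrd call (rrd_update_r in C)."""
    written[path].append(update_line)


def update(path: str, values: List[float], ts: Optional[float] = None) -> None:
    """Build an 'epoch:v1:v2...' update and apply it under the file's lock."""
    update_line = f"{to_epoch(ts)}:" + ":".join(str(v) for v in values)
    with lock_for(path):  # one writer per RRD file at a time
        write_rrd(path, update_line)


if __name__ == "__main__":
    jobs = [(f"/rrd/device{i}.rrd", [i, 2 * i], 1264500000) for i in range(4)]
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(update, p, v, t) for p, v, t in jobs]
        for f in futures:
            f.result()  # re-raise any exception from a worker
    for path in sorted(written):
        print(path, written[path])
```

A lock guarantees one writer at a time, not ordering: an RRD file rejects an update whose timestamp is not newer than its last update, so updates for the same file also need to be applied in time order, for example by routing each file to a fixed worker thread.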
> >>
> >>
> >>
> >> Cheerio! Kr. Bonne.
> >>
> >> _______________________________________________
> >> mrtg mailing list
> >> mrtg at lists.oetiker.ch
> >> https://lists.oetiker.ch/cgi-bin/listinfo/mrtg