[mrtg] Re: daemon robustness problem

Greg.Volk at edwardjones.com
Fri May 3 23:02:01 MEST 2002


Well, if it helps, I have compressed and attached the bad (?)
RRD file to this message as badrrd.zip. It appears to respond
to some rrdtool commands without error, but gives errors for
others...


> rrdtool dump switch-r-core1a_port-channel10.rrd > test.xml
> ls -al
total 440
drwxr-xr-x   2 mrtg     users        4096 May  3 15:45 .
drwxrwxrwt  21 root     root        32768 May  3 15:38 ..
-rw-r--r--   1 mrtg     users       94660 May  3 08:50 switch-r-core1a_port-channel10.rrd
-rw-r--r--   1 mrtg     users      303701 May  3 15:45 test.xml
> rrdtool restore test.xml test.rrd
ERROR: unknown consolidation function 'RRD'
>
> rrdtool info switch-r-core1a_port-channel10.rrd
filename = "switch-r-core1a_port-channel10.rrd"
rrd_version = "0001"
step = 300
last_update = 1020066925
ds[ds0].type = "COUNTER"
ds[ds0].minimal_heartbeat = 600
ds[ds0].min = 0.0000000000e+00
ds[ds0].max = 1.2500000000e+07
ds[ds0].last_ds = "0"
ds[ds0].value = 0.0000000000e+00
ds[ds0].unknown_sec = 0
ds[ds1].type = "COUNTER"
ds[ds1].minimal_heartbeat = 600
ds[ds1].min = 0.0000000000e+00
ds[ds1].max = 1.2500000000e+07
ds[ds1].last_ds = "0"
ds[ds1].value = 0.0000000000e+00
ds[ds1].unknown_sec = 0
rra[0].cf = "RRD"
rra[0].rows = 2
rra[0].pdp_per_row = 8
rra[0].xff = 1.4821969375e-321
rra[0].cdp_prep[0].value = NaN
rra[0].cdp_prep[0].unknown_datapoints = 0
rra[0].cdp_prep[1].value = NaN
rra[0].cdp_prep[1].unknown_datapoints = 0
rra[1].cf = ""
rra[1].rows = 0
rra[1].pdp_per_row = 1314213699
rra[1].xff = 2.6638537427e-317
rra[1].cdp_prep[0].value = 0.0000000000e+00
rra[1].cdp_prep[0].unknown_datapoints = 0
rra[1].cdp_prep[1].value = 0.0000000000e+00
rra[1].cdp_prep[1].unknown_datapoints = 0
rra[2].cf = ""
rra[2].rows = 0
rra[2].pdp_per_row = 0
rra[2].xff = 0.0000000000e+00
rra[2].cdp_prep[0].value = 0.0000000000e+00
rra[2].cdp_prep[0].unknown_datapoints = 0
rra[2].cdp_prep[1].value = 0.0000000000e+00
rra[2].cdp_prep[1].unknown_datapoints = 0
rra[3].cf = ""
rra[3].rows = 0
rra[3].pdp_per_row = 0
rra[3].xff = 5.0000000000e-01
rra[3].cdp_prep[0].value = 0.0000000000e+00
rra[3].cdp_prep[0].unknown_datapoints = 0
rra[3].cdp_prep[1].value = 0.0000000000e+00
rra[3].cdp_prep[1].unknown_datapoints = 0
rra[4].cf = "MAX"
rra[4].rows = 600
rra[4].pdp_per_row = 1
rra[4].xff = 5.0000000000e-01
rra[4].cdp_prep[0].value = NaN
rra[4].cdp_prep[0].unknown_datapoints = 0
rra[4].cdp_prep[1].value = NaN
rra[4].cdp_prep[1].unknown_datapoints = 0
rra[5].cf = "MAX"
rra[5].rows = 700
rra[5].pdp_per_row = 6
rra[5].xff = 5.0000000000e-01
rra[5].cdp_prep[0].value = 0.0000000000e+00
rra[5].cdp_prep[0].unknown_datapoints = 0
rra[5].cdp_prep[1].value = 0.0000000000e+00
rra[5].cdp_prep[1].unknown_datapoints = 0
rra[6].cf = "MAX"
rra[6].rows = 775
rra[6].pdp_per_row = 24
rra[6].xff = 5.0000000000e-01
rra[6].cdp_prep[0].value = 0.0000000000e+00
rra[6].cdp_prep[0].unknown_datapoints = 0
rra[6].cdp_prep[1].value = 0.0000000000e+00
rra[6].cdp_prep[1].unknown_datapoints = 0
rra[7].cf = "MAX"
rra[7].rows = 797
rra[7].pdp_per_row = 288
rra[7].xff = 5.0000000000e-01
rra[7].cdp_prep[0].value = 0.0000000000e+00
rra[7].cdp_prep[0].unknown_datapoints = 0
rra[7].cdp_prep[1].value = 0.0000000000e+00
rra[7].cdp_prep[1].unknown_datapoints = 0
>

When I compare the 'info' data to that of a healthy, working
RRD file, I see that the rra[n].cf values there are always "MAX"
or "AVERAGE", never "RRD".

If that is the issue, then I suppose the real question is
how rra[0].cf got set to "RRD". Or maybe rra[n].cf doesn't
have anything to do with the errors - I'm not very familiar
with the actual RRD data structure to begin with.
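
The comparison above could be automated along these lines. This is
only a sketch: it assumes Python is available, that the text fed to it
is the output of 'rrdtool info', and that the only legitimate
consolidation functions are the four that RRDtool 1.0.x documents
(AVERAGE, MIN, MAX, LAST). The name find_bad_rras is made up for
illustration.

```python
import re

# Consolidation functions documented for RRDtool 1.0.x (assumption:
# no other values are legitimate for this version).
VALID_CFS = {"AVERAGE", "MIN", "MAX", "LAST"}

# Matches 'rrdtool info' lines such as: rra[0].cf = "RRD"
RRA_CF_LINE = re.compile(r'^rra\[(\d+)\]\.cf\s*=\s*"(.*)"\s*$')


def find_bad_rras(info_text):
    """Return (index, cf) pairs for RRAs whose cf is not a known function."""
    bad = []
    for line in info_text.splitlines():
        match = RRA_CF_LINE.match(line.strip())
        if match and match.group(2) not in VALID_CFS:
            bad.append((int(match.group(1)), match.group(2)))
    return bad
```

Fed the 'info' output shown earlier in this message, a check like this
should flag rra[0] (cf "RRD") and rra[1] through rra[3] (empty cf),
and leave the four "MAX" archives alone. It says nothing about how the
header got into that state.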

 

Today oetiker at ee.ethz.ch wrote:
> 
> I have heard this from other folks, but I am totally lost as to
> what the reason for the strange state could be ...
> 
> 
> >
> > Every so often, I run into the following error...
> >
> > > mrtg switch-r-core1a.cfg
> > Daemonizing MRTG ...
> > > ERROR: Cannot update /home/mrtg/public_html/switch-r-core1a/switch-r-core1a_port-channel10.rrd with '1020433785:0:0' unknown consolidation function 'RRD'
> >
> > This causes the daemon to die, and all the targets in that particular
> > config file go un-updated until I realize that this has happened. To
> > fix it I end up deleting the RRD file that caused the error, deleting
> > the PID file that was left behind due to the daemon's death, and then
> > relaunching the daemon.
> >
> > What causes this?
> >
> > Could it be bad hardware (disk?) errors on my system? The OS
> > hasn't ever complained about any hardware problems, and the
> > box has never crashed - it's at 168 days of uptime right now
> > - so I don't think any dirty shutdowns are to blame.
> > Could I be looking at an RRDtool bug? I'm running v1.0.33. I
> > looked at the changelog since 1.0.33 and didn't see anything
> > mentioning this.
> >
> > Has anyone else run into this?
> >
> > Thanks
> >
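
The manual recovery described in the quoted message could be scripted
roughly as follows. This is a sketch under assumptions, not anything
MRTG provides: both paths are passed in by the caller, the function
name recover is made up, and it renames the bad RRD with a ".bad"
suffix instead of deleting it (a deliberate change from the quoted
procedure, so the file can still be examined). Relaunching the daemon
is left to the caller.

```python
import os


def recover(bad_rrd_path, pid_path):
    """Move a bad RRD aside and remove the PID file the dead daemon left.

    Returns the path the RRD was moved to, or None if it was not there.
    """
    quarantined = None
    if os.path.exists(bad_rrd_path):
        quarantined = bad_rrd_path + ".bad"
        # Rename rather than delete, so the file can be inspected later.
        os.replace(bad_rrd_path, quarantined)
    if os.path.exists(pid_path):
        # The daemon has died, so the PID file it left behind is stale.
        os.remove(pid_path)
    return quarantined
```

After this, mrtg can be started again with the same config file as in
the quoted message.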


-- Attached file removed by Listar and put at URL below --
-- Type: application/octet-stream
-- Size: 1k (1123 bytes)
-- URL : http://www.ee.ethz.ch/~slist/pantomime/badrrd.zip




