[smokeping-users] Smokeping duplicate and possibly false packet loss bug

Chris Wilson chris at aidworld.org
Mon Feb 12 12:45:56 CET 2007


Hi Vinny,

> I'm observing what I believe is a bug, not only on my installation, but 
> even on the demo page for Smokeping.
>
> When I see a 1/20 loss, it ALWAYS shows up twice in two consecutive 5 
> minute intervals. The chance of losing a packet in one five minute 
> period followed by another loss in the next 5 minute period every single 
> time is not likely. I thought this was just an issue on my network, but 
> NO other tools can detect any loss where Smokeping keeps showing this 
> loss randomly. The loss never happens at the same time on different 
> graphs going through the same network path. They are always random and 
> ALWAYS happen consecutively. I see the EXACT same pattern on the 
> Smokeping demo web page. I can't quantify the loss measurements that 
> smokeping is seeing with anything else and can't believe there is as 
> much loss as it claims there is on various installations across 
> different networks I have seen.
>
> I'm looking for a way to prove this. I suspect it's some sort of bug 
> that fping is generating a value that cannot be input into the RRD 
> database so it ends up as a null value and shows as loss. I'm not sure 
> about the duplicate consecutive occurrences of it, but I believe it's 
> related.
>
> How can this be debugged, proven or disproven? I'm not an RRDTool guru 
> but am really perplexed by the data Smokeping is showing.

It probably happens because you are not inputting data on exactly a step 
boundary into the RRD database, so rrdtool interpolates your record on 
both sides. Therefore instead of loss: 0.0, 1.0, 0.0, you get something 
like loss: 0.0, 0.7, 0.3 (in three consecutive records).
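
To illustrate the effect, here is a minimal Python sketch of that kind of
time-weighted averaging. It is a simplified model and not rrdtool's actual
code: the function name normalize, the 300 second step and the sample
timestamps are all assumptions chosen for the example, and the exact split
between the two records depends on how far off the boundary the updates land.

```python
def normalize(updates, step):
    """Spread (timestamp, value) updates over fixed step boundaries.

    Simplified model (an assumption, not rrdtool's real implementation):
    each value is taken to hold for the interval since the previous update,
    and every step stores the time-weighted average of what fell inside it.
    Returns a dict mapping step end time -> averaged value.
    """
    records = {}
    for (t0, _), (t1, value) in zip(updates, updates[1:]):
        t = t0
        while t < t1:
            step_end = (t // step + 1) * step   # next step boundary after t
            seg_end = min(step_end, t1)         # part of the interval in this step
            records[step_end] = records.get(step_end, 0.0) + value * (seg_end - t) / step
            t = seg_end
    return records


STEP = 300  # five minute step, as in the graphs discussed here

# One lost ping (loss = 1), with updates landing exactly on step boundaries.
aligned = normalize([(300, 0), (600, 0), (900, 1), (1200, 0)], STEP)

# The same single loss, but every update arrives 90 seconds before a boundary.
shifted = normalize([(210, 0), (510, 0), (810, 1), (1110, 0)], STEP)

print("aligned:", aligned)
print("shifted:", shifted)
```

In the aligned case the whole loss lands in one record. In the shifted case
the same single loss is split between two consecutive records, which is the
pattern you describe.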

You can check this by running "rrdtool dump" on your rrd database and 
closely inspecting the rows corresponding to the times where you see this 
happen on your graph. The second value in each row is the loss, the third 
is the median, and the rest are the individual ping times. But you will probably be 
looking at an archive row rather than a primary data point, so you will 
not be seeing individual values but a summary across multiple rows.
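
If reading the dump by eye is tedious, something like the following Python
sketch could pick out the rows where the stored loss is not a whole number,
which would suggest interpolation or consolidation rather than a real count
of lost pings. The function name fractional_loss_rows is made up for this
example, and the data source name "loss" is an assumption; check the <name>
elements in your own dump.

```python
import math
import xml.etree.ElementTree as ET


def fractional_loss_rows(dump_xml, ds_name="loss"):
    """Return (cf, pdp_per_row, row_index, value) for non-integer loss rows.

    dump_xml is the XML text printed by "rrdtool dump file.rrd".
    ds_name is assumed to be the name of the loss data source.
    """
    root = ET.fromstring(dump_xml)
    # Data source names appear in the same order as the <v> columns in each row.
    names = [ds.findtext("name").strip() for ds in root.findall("ds")]
    col = names.index(ds_name)

    hits = []
    for rra in root.findall("rra"):
        cf = rra.findtext("cf").strip()
        per_row = int(rra.findtext("pdp_per_row"))
        rows = rra.find("database").findall("row")
        for i, row in enumerate(rows):
            value = float(row.findall("v")[col].text)
            # Skip unknown (NaN) rows; report only fractional values.
            if math.isfinite(value) and value != math.floor(value):
                hits.append((cf, per_row, i, value))
    return hits


# Usage, assuming the dump was saved with: rrdtool dump target.rrd > target.xml
# with open("target.xml") as f:
#     for hit in fractional_loss_rows(f.read()):
#         print(hit)
```

Fractional values in an archive with pdp_per_row of 1 would point at the
step-boundary effect; in archives with a larger pdp_per_row they are expected
anyway, because each row is already a summary of several records.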

> I think this has only happened in more recent versions of Smokeping. 
> When I first started running it, I never saw these patterns. Either an 
> updated version of Smokeping is causing this or possibly fping, but I 
> have also seen the same pattern with two consecutive 1/20 loss values on 
> other monitor types as well.

Perhaps more recent versions have changed the criteria for displaying the 
blue colour for slight loss.

Cheers, Chris.
-- 
Aptivate | http://www.aptivate.org | Phone: +44 1223 760887
The Humanitarian Centre, Fenner's, Gresham Road, Cambridge CB1 2ES

Aptivate is a not-for-profit company registered in England and Wales
with company number 04980791.


