[smokeping-users] Smokeping only keeping ~8 weeks of data

Gregory Sloop gregs at sloop.net
Thu Apr 23 18:27:19 CEST 2020


It's your call how much full-res data you keep. [I keep two weeks of one-minute full-res data.] You could keep as little as 144 minutes of full-res data (provided I'm thinking correctly), because that's the largest "steps" value you currently have. I.e., it takes 144 full-res samples to build each row of your third tier of data. That would be, IMO, pretty nuts, but it would work. Storage is way cheap, though, even SSD (unless you're keeping a massive amount of data on a really massive set of targets).

The main reason to keep more is you get better visibility for longer.

For example: in two back-to-back samples, packet loss is 100%. In all the others, it's zero.

But by the time you get around to looking at it, say 24 hours after those samples were taken, it's averaged them into 12m rows. And, if you're particularly unlucky, the two 100% loss samples got averaged into two different rows. So each of those rows holds 11 samples of 0% loss and 1 @ 100%, for an average of ~8% loss, in two back-to-back rows of second-tier data.

So, if you try to figure out what went wrong, it's going to be really hard to find the "signal" where that 2m of 100% loss was. [Your graph will show two pixels with 5-10% loss in the middle of a sea of green. To wit: You'll never see it.]
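
To put numbers on that dilution, here is a rough Python sketch. It only mimics what an AVERAGE consolidation does with a step count of 12; it is not Smokeping or RRDtool code, and the `consolidate` helper and the sample data are made up for illustration.

```python
def consolidate(samples, steps):
    """Average consecutive groups of `steps` samples, roughly like an AVERAGE RRA."""
    return [
        sum(samples[i:i + steps]) / steps
        for i in range(0, len(samples) - steps + 1, steps)
    ]

# 36 one-minute loss samples (in percent): all zero except two back-to-back at 100%.
loss = [0.0] * 36
loss[11] = 100.0  # last sample of the first 12-minute row
loss[12] = 100.0  # first sample of the second 12-minute row

# The unlucky case: the two bad samples land in two different 12m rows.
rows = consolidate(loss, 12)
print([round(r, 1) for r in rows])
```

Two minutes of total loss end up as two rows of about 8% each. At the 144m tier the same two samples would average out to roughly 1.4% in a single row.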

So, for me at least, keeping the full res data long enough to have diagnostic value is important. I may not realize something bad was going on for a while - days, perhaps even a week or two. If I start averaging that data out, the "signal" usually gets fainter - and I'm often obtuse enough to need all the help I can get. :)

---
A restart of SP will do the trick. But as noted, any existing data that's in the RRD from the past won't make any sense if you change the number of rows or steps and start capturing new data. [I believe it will restructure the RRD "automagically" but it will simply leave the data in the RRD as is. Though, don't gamble that I'm recalling that right. There's mention of how that's handled in the docs/list-postings - I'm too lazy to go dig it up.]

HTH

-Greg


Hi Greg,

Thank you very much. This explains a lot. I don't intend to change the step time for now. 

I'll do the math as advised 

However, do I need to change the number of minutes for the full res data which is currently 1008?

In addition, are there any other actions I need to perform, aside from restarting the smokeping service, for the changes to take effect?

Many Thanks,
Debo.


From: smokeping-users <smokeping-users-bounces+otubushin=hotmail.com at lists.oetiker.ch> on behalf of Gregory Sloop <gregs at sloop.net>
Sent: 23 April 2020 07:22
To: smokeping-users at lists.oetiker.ch <smokeping-users at lists.oetiker.ch>
Subject: Re: [smokeping-users] Smokeping only keeping ~8 weeks of data 
 
*** Database ***
step     = 60
pings    = 60

# consfn mrhb steps total
AVERAGE  0.5   1  1008
AVERAGE  0.5  12  4320
    MIN  0.5  12  4320
    MAX  0.5  12  4320
AVERAGE  0.5 144   720
    MAX  0.5 144   720
    MIN  0.5 144   720

---
Your database section isn't configured to save data any older than 72 days.

1008 minutes of full res data [Just under 17 hours - that's kind of an odd number, but whatever.]
36 days of 12m data
72 days of 144m data

If you want to have more data, you're going to have to change the database section to keep more rows.

Your step is 60s - so 1008 samples of 60s data.
Then 4320 rows, each averaging 12m of data. [4320*12 = total minutes of data. That comes out to 36 days' worth.]
Then 720 rows, each averaging 144m of data. [720*144 = total minutes of data. That comes out to 72 days.]
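
The same arithmetic as a small Python sketch, in case it helps to check other settings. The step and the steps/rows pairs are copied from the AVERAGE lines of the Database section above; `retention_days` is just an illustrative helper name, not anything Smokeping provides.

```python
STEP_SECONDS = 60  # "step" from the Database section

# (consolidation function, steps, rows) from the AVERAGE lines of the config
RRAS = [
    ("AVERAGE", 1, 1008),
    ("AVERAGE", 12, 4320),
    ("AVERAGE", 144, 720),
]

def retention_days(step_seconds, steps, rows):
    """Days of history one archive holds: each row covers step * steps seconds."""
    return step_seconds * steps * rows / 86400

for consfn, steps, rows in RRAS:
    days = retention_days(STEP_SECONDS, steps, rows)
    print(f"{consfn} steps={steps:>3} rows={rows:>4} -> {days:g} days")
```

The longest archive sets the limit on how far back the graphs can go, which is 72 days with the current settings.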

I'll leave it as an exercise to figure out how many rows you'll need to keep of each sample type to have history long enough to cover the time-frame you want.

But, for example, take the 144m samples. If you want 2 years of data, you'd need:
2 years * 365 days = 730 days
730 days * 24 hours = 17,520 hours
17,520 hours * 60 minutes = 1,051,200 minutes in 2 years
Now divide those minutes by 144, and that's how many rows you'll need if you want two years of 144m averaged data.
(i.e. 1,051,200 / 144 = 7300)
Note: that all assumes you continue to use 60s step times. If you change the step, you'll have to recalculate for the new step size.
If you change your step, you'll either have to re-sample your data or, more likely, dump the old data, lose the history, and let it rebuild the RRD files with the proper structure.
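
For the row counts, a minimal Python sketch of the division above, assuming the 60s step stays as it is. `rows_needed` is a made-up helper name, and the 400-day and 14-day targets are only examples (400 days matches the longest graph in the Presentation section, 14 days is the two weeks of full-res data mentioned elsewhere in the thread).

```python
import math

def rows_needed(days, step_seconds, steps):
    """Rows required so one archive covers `days`, rounded up to a whole row."""
    seconds_per_row = step_seconds * steps
    return math.ceil(days * 86400 / seconds_per_row)

STEP_SECONDS = 60  # assumes the step is left at 60s

print(rows_needed(730, STEP_SECONDS, 144))  # two years of 144m rows
print(rows_needed(400, STEP_SECONDS, 144))  # enough for a 400-day graph
print(rows_needed(14, STEP_SECONDS, 1))     # two weeks of full-res samples
```

With a 60s step these come out to 7300, 4000 and 20160 rows respectively. Whatever total you pick would go in the last column of the matching lines in the Database section.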

HTH

-Greg




Hi All,

I am having an issue with smokeping not displaying any graphs older than ~8 weeks. I have it configured to display graphs for 3 hours, 30 hours, 10 days and 400 days.
So far, it does not show anything before the 7th week (in February 2020).

Is there anything I have configured wrongly here?



*** General ***

owner    = ######################################
contact  = ######################################
#mailhost = my.mail.host
#sendmail = /sbin/sendmail
# NOTE: do not put the Image Cache below cgi-bin
# since all files under cgi-bin will be executed ... this is not
# good for images.
imgcache = /opt/smokeping/cache
imgurl   = cache
datadir  = /opt/smokeping/data
piddir  = /opt/smokeping/var
cgiurl   = http://10.20.1.41/smokeping.fcgi
smokemail = /opt/smokeping/etc/smokemail.dist
tmail = /opt/smokeping/etc/tmail.dist
# specify this to get syslog logging
syslogfacility = local0
# each probe is now run in its own process
# disable this to revert to the old behaviour
# concurrentprobes = no

*** Alerts ***
to = ######################################
from = ######################################
+someloss
type = loss
# in percent
pattern = >0%,*12*,>0%,*12*,>0%
comment = loss 3 times  in a row


+bigloss
type = loss
# in percent
pattern = ==0%,==0%,==0%,==0%,>0%,>0%,>0%
comment = suddenly there is packet loss

+startloss
type = loss
# in percent
pattern = ==S,>0%,>0%,>0%
comment = loss at startup

+rttdetect
type = rtt
# in milli seconds
pattern = <10,<10,<10,<10,<10,<100,>100,>100,>100
comment = routing messed up again ?

+hostdown
type = loss
# in percent
pattern = ==0%,==0%,==0%, ==U
comment = no reply

+lossdetect
type = loss
# in percent
pattern = ==0%,==0%,==0%,==0%,>20%,>20%,>20%
comment = suddenly there is packet loss


*** Database ***

step     = 60
pings    = 60

# consfn mrhb steps total

AVERAGE  0.5   1  1008
AVERAGE  0.5  12  4320
    MIN  0.5  12  4320
    MAX  0.5  12  4320
AVERAGE  0.5 144   720
    MAX  0.5 144   720
    MIN  0.5 144   720

*** Presentation ***

template = /opt/smokeping/etc/basepage.html.dist

+ charts

menu = Charts
title = The most interesting destinations

++ stddev
sorter = StdDev(entries=>4)
title = Top Standard Deviation
menu = Std Deviation
format = Standard Deviation %f

++ max
sorter = Max(entries=>5)
title = Top Max Roundtrip Time
menu = by Max
format = Max Roundtrip Time %f seconds

++ loss
sorter = Loss(entries=>5)
title = Top Packet Loss
menu = Loss
format = Packets Lost %f

++ median
sorter = Median(entries=>5)
title = Top Median Roundtrip Time
menu = by Median
format = Median RTT %f seconds

+ overview

width = 600
height = 50
range = 10h

+ detail

width = 600
height = 200
unison_tolerance = 2

"Last 3 Hours"    3h
"Last 30 Hours"   30h
"Last 10 Days"    10d
"Last 400 Days"   400d

#+ hierarchies
#++ owner
#title = Host Owner
#++ location
#title = Location

*** Probes ***

+ FPing

binary = /usr/sbin/fping

*** Slaves ***
secrets=/opt/smokeping/etc/smokeping_secrets.dist
+boomer
display_name=boomer
color=0000ff

+slave2
display_name=another
color=00ff00

*** Targets ***

probe = FPing

Many Thanks,
Debo