[rrd-users] Re: retroactively graphing data from logfile (newbie)

Ryan Tracey ryan.tracey at gmail.com
Wed May 25 10:47:49 MEST 2005


Hiya

Thanks for your reply.

On 5/24/05, Alex van den Bogaerdt <alex at ergens.op.het.net> wrote:
> On Tue, May 24, 2005 at 01:54:27PM +0200, Ryan Tracey wrote:
> 
> > 2005-05-03-12 102518 16 119 278 406 27 9 1145 0
> > 2005-05-03-13 127723 22 273 426 488 29 2 1264 0
> >
> > Where the first column is the date and hour.  The columns that follow
> > contain the occurrences of a given type in that hour.
> >
> > I created the following rrd.
> >
> > rrdtool create hugelog.rrd \
> > --start 1114833600 \
> 
> Sat Apr 30 06:00:00 2005 METDST

That is correct.  I worked out the epoch time from the date in the
logs. Much like this:

>>> t = 'Sat Apr 30 06:00:00 2005'
>>> import time
>>> time.strptime(t, '%a %b %d %H:%M:%S %Y')
(2005, 4, 30, 6, 0, 0, 5, 120, -1)
>>> time.mktime(time.strptime(t, '%a %b %d %H:%M:%S %Y'))
1114833600.0

 
> > --step 3600 \
> 
> Each hour.  So far, so good.  Updates after apr 30 will be allowed.
> 
> > DS:sync_error:GAUGE:3600:0:100000 \
> > DS:zipfile:GAUGE:3600:0:100000 \
> > DS:malware:GAUGE:3600:0:100000 \
> 
> Accept positive rates up to 100000.  Guess that's OK as well.
> Heartbeat is 3600:  want updates at most 3600 seconds apart.
> 
> > RRA:AVERAGE:0.5:1:24
> 
> One "step" per row, 24 rows.  That's not huge, is it?  You have
> asked for a database storing exactly one day (24 rows, one hour each).

Hmm, I thought I was creating the facility to store daily averages.
But I should rather do something like "RRA:AVERAGE:0.5:1:x" where x is
the number of hours in the number of days I want to store: say, 144
for 6 days.
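
Something like this, then (same DSes as before; the second RRA is my
guess at how to also keep real daily averages, 180 of them):

rrdtool create hugelog.rrd \
--start 1114833600 \
--step 3600 \
DS:sync_error:GAUGE:3600:0:100000 \
DS:zipfile:GAUGE:3600:0:100000 \
DS:malware:GAUGE:3600:0:100000 \
RRA:AVERAGE:0.5:1:144 \
RRA:AVERAGE:0.5:24:180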
 
Also, how would you best describe what the 0.5 does?  The beginner's
guide seems to have glossed over that one.

> > rrdtool update hugelog.rrd -t \
> > sync_error:\
> > zipfile:\
> > malware \
> 
> I don't think "-t" to "malware" is necessary.  Just make sure
> you get the order right and skip that part.

Okay, thanks.   Leaving that out will make things neater.
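
Each update then simply becomes:

rrdtool update hugelog.rrd \
1115110805:67846:4:82:157:318:12:1:426:0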

> > 1115110805:67846:4:82:157:318:12:1:426:0
> 
> I guess you mean "1115110805" is variable __AND__ it is a whole
> multiple of 3600 ?

1115110805 is variable, the timestamp in the first column, but not (I
suspect) necessarily a multiple of 3600.  Should it be?
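
As I understand it rrdtool interpolates between updates anyway, but
pinning each timestamp to the hour boundary can't hurt.  An untested
sketch ('hourly.log' is a made-up name for my per-hour file):

import os, time

for line in open('hourly.log'):
    fields = line.split()
    # "2005-05-03-12" -> epoch seconds, floored to the hour boundary
    t = int(time.mktime(time.strptime(fields[0], '%Y-%m-%d-%H')))
    t -= t % 3600
    os.system('rrdtool update hugelog.rrd %d:%s' % (t, ':'.join(fields[1:])))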

> You want to update every 3600 seconds, or sooner.  If one timestamp
> is exactly on the hour, and the next timestamp is 5 seconds late,
> you'll lose an update.

I'll increase the heartbeat to 1.5 hours (5400), then, so an update
that arrives a few seconds late isn't lost.
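
So each DS line would become, for example:

DS:sync_error:GAUGE:5400:0:100000 \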

> > Firstly, what am I doing wrong?  I will attach the rrd (hugelog.rrd:
> > not sure if the list will accept, though.)
> 
> Look at the size of "hugelog":
> 
> > Secondly, am I just introducing unneccessary complexity by
> > preprocessing the original log file and creating a per-hour file?
> 
> Preprocessing?  RRDtool could probably do it for you however you
> didn't specify what you did so I can only guess.

I grouped log events by the hour in which they occurred and wrote the
results to one (smaller) logfile.  For instance:

2005-05-03-12 102518 16 119 278 406 27 9 1145 0

On 3rd May between 12h00 and 13h00 there were 102518 "synchronization
errors" recorded in the logfile (the Sober worm not speaking SMTP
properly).  There were 16 occurrences of remote mail servers (spamware)
using my IP address in their HELO/EHLO, etc.

The Python script did something like:

logfile = open('hugelog')   # the concatenated mail log
sync = 0
bad_helo = 0
for line in logfile:
    if 'synchronization error' in line:
        sync += 1
    if 'bad helo' in line:
        bad_helo += 1

I grouped the counts by hour. 
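
Roughly like this, although this is a reconstruction, and the
syslog-style timestamp (with the year bolted on) is a guess at the raw
log's format, not the actual code:

import time

counts = {}   # "YYYY-MM-DD-HH" -> [sync, bad_helo]
for line in open('hugelog'):
    # guess: lines start with e.g. "May  3 12:04:17 host ..."
    stamp = time.strptime(line[:15] + ' 2005', '%b %d %H:%M:%S %Y')
    hour = time.strftime('%Y-%m-%d-%H', stamp)
    bucket = counts.setdefault(hour, [0, 0])
    if 'synchronization error' in line:
        bucket[0] += 1
    if 'bad helo' in line:
        bucket[1] += 1

Writing counts out sorted by key gives the per-hour file above.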

I guess I could have just entered the data into an rrd at that
stage.  In that case, what would a reasonable heartbeat be?  There
were multiple log entries per second while Sober was hitting our mail
server.  A 10-second heartbeat?
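
If I had gone that route, I imagine the create would have looked
something like this (pure guesswork, untested: flush the accumulated
count every ten seconds, let ABSOLUTE divide each flushed count by the
elapsed time to get an events-per-second rate, and consolidate 360
ten-second steps into hourly rows; events.rrd is a made-up name):

rrdtool create events.rrd \
--step 10 \
DS:sync_error:ABSOLUTE:30:0:U \
RRA:AVERAGE:0.5:360:144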

> > -- Attached file removed by Ecartis and put at URL below --
> > -- Type: application/octet-stream
> > -- Size: 4k (4764 bytes)
> > -- URL : http://lists.ee.ethz.ch/p/hugelog.rrd
> 
> Only 4764 bytes.  That isn't huge, that isn't large, that isn't
> even average size.  It is _very_ small for an RRDtool file.

hugelog is the name of the original concatenated mail log.  2 GB --
not too huge as mail logs go but it was late ;-)

I'll re-create the rrd based on your advice.   Thanks again for the help.   

Regards,
Ryan

-- 
Ryan Tracey
Citizen: The World
