[mrtg] Parsing HTML GUI

Mick michaelkintzios at gmail.com
Mon Sep 1 19:11:57 CEST 2008

On Wednesday 27 August 2008, McDonald, Dan wrote:
> >(First message to the list!)
> >
> >My modem (a 2WIRE 1800HG used in fully-bridged mode) does not offer telnet
> > or SNMP access.  I can only get to it via http/s.  Unfortunately, I have
> > no knowledge of perl, to be able to hack a script for this purpose.
> I wrote up something like that a long time ago.  I used LWP.pm as the main
> parsing engine.  Unfortunately, that was several jobs ago, so I don't
> believe I have the source.

Thank you Dan, I have hacked something that looks as if it works, although it 
probably is a terrible kludge.  Improvements gratefully received:
#!/usr/bin/perl -w

use strict;
use LWP::UserAgent;
use HTTP::Cookies;

# Create a user agent object
my $ua = LWP::UserAgent->new( );

my $url1 = '';
my $url2 = '';

my $response = $ua->get( $url1 );

$ua->cookie_jar( HTTP::Cookies->new(
    'file'     => '/tmp/cookies.lwp',  # where to read/write cookies
    'autosave' => 1,                   # save it to disk when done
) );

my $req = $ua->get( $url2 );  #I used request here, because $response errors

if ($req->is_success) {
        print $req->content;  #I guess I need to save this to a file?
}
else {
        # report the status of the request that was actually tested
        die $req->status_line, "\n";
}

> But LWP is a fairly simple module to use in perl, so I would recommend that
> you take this as a great opportunity to be introduced to perl.

 . . . talking about a crash course in Perl!  :p

> >Happy to post the HTML page with the stats if needed.
> Not until you get to the point that you need help with the regex.

Hmm, I probably need a crash course in regex too.  ;-)
The data is all in tables in the html page.  The titles are shown as:
 <td class="columnheaderborder">Rate</td>
 <td class="columnheaderborder">Max1</td>
 <td class="columnheaderborder">Max2</td>
 <td class="columnheaderborder">Max3</td>
 <td class="columnheaderborder">Mgn1</td>
 <td class="columnheaderborder">Mgn2</td>
 <td class="columnheaderborder">Attn</td>
 <td class="columnheaderborder">Pwr</td>
 <td class="columnheaderborder">CRCs</td>
 <td class="columnheaderborder">FECs</td>

but the values I am interested in are further down the page like so:
 <td nowrap="nowrap">+000 days 13:48:59</td>
 <td nowrap="nowrap">1</td>
 <td nowrap="nowrap">7616</td>
 <td nowrap="nowrap">7616</td>
 <td nowrap="nowrap">7040</td>
 <td nowrap="nowrap">7040</td>
 <td nowrap="nowrap">6.0</td>
 <td nowrap="nowrap">3.0</td>
 <td nowrap="nowrap">38.0</td>
 <td nowrap="nowrap">20.2</td>
 <td nowrap="nowrap">5694</td>
 <td nowrap="nowrap">583127</td>

I would like to capture and graph:

Max1 Vs Max2
Mgn1 Vs Mgn2

CRCs (1)
FECs (2)

(1) & (2) above are not meaningful as plain numbers.  Is it possible to 
calculate and graph a 15 min rolling average?
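
One possible way to get a 15 minute figure is to take the difference between successive counter readings and average the three most recent 5 minute differences. The following is only a rough sketch: it assumes one reading every 5 minutes, it treats any drop in the counter as a restart from zero, and the sub name rolling_rate and the sample readings are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: given counter readings taken at a fixed interval
# (oldest first), return the average increase per interval over the most
# recent $window intervals. A drop in the counter is treated as a reset.
sub rolling_rate {
    my ($window, @samples) = @_;
    my @deltas;
    for my $i (1 .. $#samples) {
        my $d = $samples[$i] - $samples[$i - 1];
        $d = $samples[$i] if $d < 0;   # counter restarted from zero
        push @deltas, $d;
    }
    return undef unless @deltas;       # fewer than two readings
    # keep only the most recent $window differences
    splice @deltas, 0, @deltas - $window if @deltas > $window;
    my $sum = 0;
    $sum += $_ for @deltas;
    return $sum / @deltas;
}

# Made-up CRC readings, 5 minutes apart; 3 intervals = 15 minutes
my @crcs = (5600, 5630, 5660, 5694);
printf "%.1f CRCs per 5 min\n", rolling_rate(3, @crcs);
```

MRTG by default treats the numbers it is fed as counters and graphs the change between polls, so for the graphs it may be enough to hand it the raw CRC and FEC values; a helper like this would mainly be useful for a separate table.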

Then I would also like to capture in a table the most recent:


How should I go about regex-ing these out from the parsed page?
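
A minimal sketch of one way to do this with a plain regex, using only the cells quoted above. It assumes that every value of interest sits in a <td nowrap="nowrap"> cell and, more importantly, that the last ten such cells line up in order with the ten column titles (Rate to FECs); that alignment is a guess from the excerpt and should be checked against the real page. The $html string stands in for $req->content.

```perl
use strict;
use warnings;

# Stand-in for $req->content, built from the value cells quoted above
my $html = join "\n", map { qq{ <td nowrap="nowrap">$_</td>} }
    '+000 days 13:48:59', 1, 7616, 7616, 7040, 7040,
    '6.0', '3.0', '38.0', '20.2', 5694, 583127;

# Pull the text out of every <td nowrap="nowrap"> cell, in page order
my @cells = $html =~ m{<td\s+nowrap="nowrap">\s*([^<]*?)\s*</td>}gi;

# ASSUMPTION: the last ten cells match the ten column titles, in order
my @names = qw(Rate Max1 Max2 Max3 Mgn1 Mgn2 Attn Pwr CRCs FECs);
my %stat;
@stat{@names} = @cells[-10 .. -1];

print "$_ = $stat{$_}\n" for @names;
```

For an MRTG external script target, the script would then print four lines instead of the list above: the first value, the second value, an uptime string and a target name, for example $stat{Max1}, $stat{Max2}, $cells[0] and a label of your choosing. If the page layout turns out to be less regular than this, a real HTML parser would be a safer choice than a regex.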

Please let me know if you need more info.