[mrtg] Parsing HTML GUI

Mon Sep 1 19:11:57 CEST 2008

On Wednesday 27 August 2008, McDonald, Dan wrote:
> >(First message to the list!)
> >
> >My modem (a 2WIRE 1800HG used in fully-bridged mode) does not offer telnet
> > or SNMP access.  I can only get to it via http/s.  Unfortunately, I have
> > no knowledge of perl, to be able to hack a script for this purpose.
>
> I wrote up something like that a long time ago.  I used LWP.pm as the main
> parsing engine.  Unfortunately, that was several jobs ago, so I don't
> believe I have the source.

Thank you, Dan.  I have hacked together something that appears to work,
although it is probably a terrible kludge.  Improvements gratefully received:
==============================================
#!/usr/bin/perl -w

# Create a user agent object
use LWP::UserAgent;
use strict;

my $ua = LWP::UserAgent->new( );

my $url1 = 'http://10.10.10.25/xslt?PAGE=J01&THISPAGE=J46&NEXTPAGE=J01';
my $url2 = 'http://10.10.10.25/xslt?PAGE=A02_POST&THISPAGE=A02_POST&NEXTPAGE=J42&CMSKICK=&NEXTPAGE=J42&THISPAGE=A02_POST&PAGE=J42&PASSWORD=XXXXX';

use HTTP::Cookies;
# Attach the cookie jar before the first request, so that any cookie the
# first page sets is kept and sent back with the second request
$ua->cookie_jar( HTTP::Cookies->new(
 'file' => '/tmp/cookies.lwp',
 # where to read/write cookies
 'autosave' => 1,
 # save it to disk when done
));

my $response = $ua->get( $url1 );

my $req = $ua->get( $url2 );  #I used request here, because $response errors

if ($req->is_success) {
        print $req->content;  #I guess I need to save this to a file?
}
else {
        # report the status of the request that actually failed
        die $req->status_line, "\n";
}
==============================================

> But LWP is a fairly simple module to use in perl, so I would recommend that
> you take this as a great opportunity to be introduced to perl.

 ... talk about a crash course in Perl!  :p

> >Happy to post the HTML page with the stats if needed.
>
> Not until you get to the point that you need help with the regex.

Hmm, I probably need a crash course in regex too.  ;-)
The data is all in tables in the HTML page.  The column titles appear as:
========================================
 <td></td>
 <td class="columnheaderborder">Rate</td>
 <td class="columnheaderborder">Max1</td>
 <td class="columnheaderborder">Max2</td>
 <td class="columnheaderborder">Max3</td>
 <td class="columnheaderborder">Mgn1</td>
 <td class="columnheaderborder">Mgn2</td>
 <td class="columnheaderborder">Attn</td>
 <td class="columnheaderborder">Pwr</td>
 <td class="columnheaderborder">CRCs</td>
 <td class="columnheaderborder">FECs</td>
 </tr>
 <tr>
========================================

but the values I am interested in are further down the page like so:
========================================
 <tr>
 <td nowrap="nowrap">+000 days 13:48:59</td>
 <td nowrap="nowrap">1</td>
 <td></td>
 <td nowrap="nowrap">7616</td>
 <td nowrap="nowrap">7616</td>
 <td nowrap="nowrap">7040</td>
 <td nowrap="nowrap">7040</td>
 <td nowrap="nowrap">6.0</td>
 <td nowrap="nowrap">3.0</td>
 <td nowrap="nowrap">38.0</td>
 <td nowrap="nowrap">20.2</td>
 <td nowrap="nowrap">5694</td>
 <td nowrap="nowrap">583127</td>
 <td></td>
 </tr>
========================================

I would like to capture and graph:

Max1 Vs Max2
Mgn1 Vs Mgn2

CRCs (1)
FECs (2)

(1) and (2) above are not meaningful as plain numbers.  Is it possible to
calculate and graph a 15-minute rolling average instead?
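
A minimal sketch of one way the rolling average could be computed, assuming
the CRC and FEC figures are cumulative counters and that the page is polled
every 5 minutes (so three polls cover 15 minutes).  The helper names deltas
and rolling_mean and the sample readings are made up for illustration; they
do not come from the modem.  MRTG by default treats its inputs as counters
and plots their rate of change, which may already be close to what is wanted.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Per-poll increase for cumulative counter readings (oldest first).
# A drop in value is treated as a counter reset (e.g. a modem reboot),
# in which case the new reading itself is taken as the increase.
sub deltas {
    my @readings = @_;
    my @d;
    for my $i ( 1 .. $#readings ) {
        my $diff = $readings[$i] - $readings[ $i - 1 ];
        push @d, $diff >= 0 ? $diff : $readings[$i];
    }
    return @d;
}

# Mean of the last $window values at each position where a full
# window is available.
sub rolling_mean {
    my ( $window, @values ) = @_;
    my @out;
    for my $i ( $window - 1 .. $#values ) {
        my $sum = 0;
        $sum += $_ for @values[ $i - $window + 1 .. $i ];
        push @out, $sum / $window;
    }
    return @out;
}

# Hypothetical CRC readings taken at 5-minute intervals (illustrative only).
my @crc      = ( 5600, 5630, 5660, 5690, 5696 );
my @per_poll = deltas(@crc);                   # errors per 5-minute poll
my @avg15    = rolling_mean( 3, @per_poll );   # 3 polls = 15 minutes

print join( ', ', @avg15 ), "\n";              # prints "30, 22"
```

The readings would have to be kept between runs (for example in a small
text file), since each run of the script only sees the current value.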

Then I would also like to capture in a table the most recent:

Uptime
Rate
Attn

How should I go about extracting these from the fetched page with a regex?
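
A rough sketch of how the row could be picked apart, under two assumptions:
that the ten column titles line up with the ten cells following the empty
cell in the data row, and that the wanted row is the first one whose first
cell looks like an uptime.  parse_stats is a hypothetical helper name.  The
sample row is the one quoted above; in the real script the input would be
$req->content.  The last line prints the four lines MRTG expects from an
external script: two values, an uptime string and a target name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample row copied from the page; the real input would be $req->content.
my $html = <<'END_HTML';
 <tr>
 <td nowrap="nowrap">+000 days 13:48:59</td>
 <td nowrap="nowrap">1</td>
 <td></td>
 <td nowrap="nowrap">7616</td>
 <td nowrap="nowrap">7616</td>
 <td nowrap="nowrap">7040</td>
 <td nowrap="nowrap">7040</td>
 <td nowrap="nowrap">6.0</td>
 <td nowrap="nowrap">3.0</td>
 <td nowrap="nowrap">38.0</td>
 <td nowrap="nowrap">20.2</td>
 <td nowrap="nowrap">5694</td>
 <td nowrap="nowrap">583127</td>
 <td></td>
 </tr>
END_HTML

# Column titles in page order (ASSUMED to match cells 3..12 of the row).
my @names = qw(Rate Max1 Max2 Max3 Mgn1 Mgn2 Attn Pwr CRCs FECs);

# Return (name => value) pairs for the first row that starts with an
# uptime cell, or an empty list if no such row is found.
sub parse_stats {
    my ($page) = @_;
    for my $row ( $page =~ m{<tr[^>]*>(.*?)</tr>}gs ) {
        # Text of every <td> cell in this row; empty cells give ''.
        my @cells = $row =~ m{<td[^>]*>([^<]*)</td>}g;
        next unless @cells >= 13;
        next unless $cells[0] =~ /^\+\d+ days \d+:\d+:\d+$/;
        my %stats = ( Uptime => $cells[0] );
        @stats{@names} = @cells[ 3 .. 12 ];
        return %stats;
    }
    return;
}

my %s = parse_stats($html);
die "stats row not found\n" unless %s;

# Four-line output for MRTG: value 1, value 2, uptime, target name.
print "$s{Max1}\n$s{Max2}\n$s{Uptime}\n2WIRE 1800HG\n";
```

Rate and Attn for the table of most recent values are available from the
same hash as $s{Rate} and $s{Attn}.  If the page layout changes, the cell
positions in @cells[ 3 .. 12 ] would need checking again.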

Please let me know if you need more info.
-- 
Regards,
Mick