[mrtg] Parsing HTML GUI

Tue Sep 2 14:32:54 CEST 2008

On Mon, 2008-09-01 at 18:11 +0100, Mick wrote:
> On Wednesday 27 August 2008, McDonald, Dan wrote:
> > >(First message to the list!)
> > >
> > >My modem (a 2WIRE 1800HG used in fully-bridged mode) does not offer telnet
> > > or SNMP access.  I can only get to it via http/s.  Unfortunately, I have
> > > no knowledge of perl, to be able to hack a script for this purpose.
> >
> > I wrote up something like that a long time ago.  I used LWP.pm as the main
> > parsing engine.  Unfortunately, that was several jobs ago, so I don't
> > believe I have the source.
> 
> Thank you Dan, I have hacked something that looks as if it works, although it 
> probably is a terrible kludge. 

Actually, it looks pretty good.  Now we just need to turn the goo your
modem spits out into something useful:

> > But LWP is a fairly simple module to use in perl, so I would recommend that
> > you take this as a great opportunity to be introduced to perl.
> 
>  . . . talking about a crash course in Perl!  :p

But you won't be nearly as afraid of perl next time...

> > >Happy to post the HTML page with the stats if needed.
> >
> > Not until you get to the point that you need help with the regex.
> 
> Hmm, I probably need a crash course in regex too.  ;-)
> The data is all in tables in the html page.  The titles are shown as:
> ========================================
>  <td></td>
>  <td class="columnheaderborder">Rate</td>
>  <td class="columnheaderborder">Max1</td>
>  <td class="columnheaderborder">Max2</td>
>  <td class="columnheaderborder">Max3</td>
>  <td class="columnheaderborder">Mgn1</td>
>  <td class="columnheaderborder">Mgn2</td>
>  <td class="columnheaderborder">Attn</td>
>  <td class="columnheaderborder">Pwr</td>
>  <td class="columnheaderborder">CRCs</td>
>  <td class="columnheaderborder">FECs</td>
>  </tr>
>  <tr>
> ========================================

> but the values I am interested in are further down the page like so:
> ========================================
>  <tr>
>  <td nowrap="nowrap">+000 days 13:48:59</td>
>  <td nowrap="nowrap">1</td>
>  <td></td>
>  <td nowrap="nowrap">7616</td>
>  <td nowrap="nowrap">7616</td>
>  <td nowrap="nowrap">7040</td>
>  <td nowrap="nowrap">7040</td>
>  <td nowrap="nowrap">6.0</td>
>  <td nowrap="nowrap">3.0</td>
>  <td nowrap="nowrap">38.0</td>
>  <td nowrap="nowrap">20.2</td>
>  <td nowrap="nowrap">5694</td>
>  <td nowrap="nowrap">583127</td>
>  <td></td>
>  </tr>
> ========================================

# OK, let's convert everything into an array, split on </td> boundaries

# ($req here is the HTTP::Response object your LWP request returned)
my @data = split(m{</td>}, $req->content);

# Then see if we can parse this into something somewhat usable

my ($index, @header, %content);
$index = 0;
foreach my $datum (@data) {
    # header cells carry class="columnheaderborder"
    if ($datum =~ /columnheaderborder/) {
        my ($head) = ($datum =~ />(.+?)$/);
        push @header, $head;
    }
    # value cells carry nowrap="nowrap"; pair each one with the next header
    if ($datum =~ /nowrap="nowrap"/) {
        my ($value) = ($datum =~ />(.+?)$/);
        $content{$header[$index]} = $value;
        $index++;
    }
}

# assuming there weren't any extraneous headers, 
# we should now have a hash indexed by the headers:
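
One caveat about that assumption: going only by the two excerpts quoted above, the value row seems to carry two nowrap cells (the uptime and a "1") that have no matching header, which would shift every value by two columns. The following is a minimal, self-contained sketch of the same split-and-match idea that drops those leading cells before pairing. The sub name parse_stats and the $skip count of 2 are my own illustrative choices, and the sample HTML is just the two excerpts glued together; check both against the real page.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample input assembled from the two excerpts quoted above.
my $html = <<'HTML';
<tr>
<td></td>
<td class="columnheaderborder">Rate</td>
<td class="columnheaderborder">Max1</td>
<td class="columnheaderborder">Max2</td>
<td class="columnheaderborder">Max3</td>
<td class="columnheaderborder">Mgn1</td>
<td class="columnheaderborder">Mgn2</td>
<td class="columnheaderborder">Attn</td>
<td class="columnheaderborder">Pwr</td>
<td class="columnheaderborder">CRCs</td>
<td class="columnheaderborder">FECs</td>
</tr>
<tr>
<td nowrap="nowrap">+000 days 13:48:59</td>
<td nowrap="nowrap">1</td>
<td></td>
<td nowrap="nowrap">7616</td>
<td nowrap="nowrap">7616</td>
<td nowrap="nowrap">7040</td>
<td nowrap="nowrap">7040</td>
<td nowrap="nowrap">6.0</td>
<td nowrap="nowrap">3.0</td>
<td nowrap="nowrap">38.0</td>
<td nowrap="nowrap">20.2</td>
<td nowrap="nowrap">5694</td>
<td nowrap="nowrap">583127</td>
<td></td>
</tr>
HTML

# parse_stats (hypothetical helper): returns a hash of header => value.
# $skip is the number of leading value cells that have no header (assumed).
sub parse_stats {
    my ($page, $skip) = @_;
    my (@header, @values);
    foreach my $datum (split m{</td>}, $page) {
        if ($datum =~ /columnheaderborder/) {
            # text after the last tag in this chunk is the header name
            my ($head) = $datum =~ />([^<>]+)$/;
            push @header, $head if defined $head;
        }
        elsif ($datum =~ /nowrap="nowrap"/) {
            my ($value) = $datum =~ />([^<>]+)$/;
            push @values, $value if defined $value;
        }
    }
    splice(@values, 0, $skip);           # drop the cells without a header
    my %content;
    @content{@header} = @values[0 .. $#header];
    return %content;
}

my %content = parse_stats($html, 2);
print "$content{Max1}\n$content{Max2}\n";
print "$content{Mgn1}\n$content{Mgn2}\n";
```

With the sample rows this pairs Max1 with 7616, Max2 with 7040, Mgn1 with 6.0 and Mgn2 with 3.0. If the real page lines up differently, adjust $skip until the numbers land under the right names.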

> I would like to capture and graph:
> 
> Max1 Vs Max2

print "$content{'Max1'}\n$content{'Max2'}\n\n\n";

> Mgn1 Vs Mgn2

print "$content{'Mgn1'}\n$content{'Mgn2'}\n\n\n";
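
The trailing newlines in those print statements are there because MRTG reads four lines from an external script: the first value, the second value, an uptime string, and a target name. Here is a small sketch that fills in all four instead of leaving the last two blank. The helper name mrtg_lines and the target name 'adsl-modem' are made up for illustration, and the numbers are the ones from the excerpt above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# mrtg_lines (hypothetical helper): build the four lines MRTG expects
# from an external script. Uptime and name may be left empty.
sub mrtg_lines {
    my ($first, $second, $uptime, $name) = @_;
    return join("\n", $first, $second, $uptime // '', $name // '') . "\n";
}

# Values copied from the excerpt; 'adsl-modem' is an invented target name.
print mrtg_lines(7616, 7040, '+000 days 13:48:59', 'adsl-modem');
```

Since these are current readings rather than ever-increasing counters, the target would most likely also want the gauge option in the MRTG configuration.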

-- 
Daniel J McDonald, CCIE #2495, CISSP #78281, CNX
Austin Energy
http://www.austinenergy.com
