Grabing a Page and Need to Parse it..

LostS has asked for the wisdom of the Perl Monks concerning the following question:

Hey, I asked a question the other day and got that working. Now I've found another section I need to strip out of a page I'm grabbing via LWP::Simple. The page is stored in a variable, $webpage, and I need to drop this section of markup from that variable:

<!-- Begin MRTG Block -->
<TABLE BORDER=0 CELLSPACING=0 CELLPADDING=0>
  <TR>
    <TD WIDTH=63><A
    HREF="http://ee-staff.ethz.ch/~oetiker/webtools/mrtg/"><IMG
    BORDER=0 SRC="http://hamburg.harbinger.net/mrtg/cen/mrtg-l.png" WIDTH=63 HEIGHT=25 ALT="MRTG"></A></TD>
    <TD WIDTH=25><A
    HREF="http://ee-staff.ethz.ch/~oetiker/webtools/mrtg/"><IMG
    BORDER=0 SRC="http://hamburg.harbinger.net/mrtg/cen/mrtg-m.png" WIDTH=25 HEIGHT=25 ALT=""></A></TD>
    <TD WIDTH=388><A
    HREF="http://ee-staff.ethz.ch/~oetiker/webtools/mrtg/"><IMG
    BORDER=0 SRC="http://hamburg.harbinger.net/mrtg/cen/mrtg-r.png" WIDTH=388 HEIGHT=25
    ALT="Multi Router Traffic Grapher"></A></TD>
  </TR>
</TABLE>
<TABLE BORDER=0 CELLSPACING=0 CELLPADDING=0>
  <TR VALIGN=top>
  <TD WIDTH=88 ALIGN=RIGHT><FONT FACE="Arial,Helvetica" SIZE=2>
version 2.9.6</FONT></TD>
  <TD WIDTH=388 ALIGN=RIGHT><FONT FACE="Arial,Helvetica" SIZE=2>
  <A HREF="http://ee-staff.ethz.ch/~oetiker/">Tobias Oetiker</A>
  <A HREF="mailto:oetiker@ee.ethz.ch">&lt;oetiker@ee.ethz.ch&gt;</A> 
and  &nbsp;
  <A HREF="http://www.bungi.com/">Dave&nbsp;Rand</A>&nbsp;
  <A HREF="mailto:dlr@bungi.com">&lt;dlr@bungi.com&gt;</A></FONT>
  </TD>
</TR>
</TABLE>
<!-- End MRTG Block -->

How would you suggest I do this? I just want to get rid of that part, so I'll be replacing it with nothing.


Replies are listed 'Best First'.
Re: Grabing a Page and Need to Parse it.. by larsen (Parson) on May 07, 2001 at 20:12 UTC
General HTML parsing can be done with HTML::Parser and its relatives. Looking at your snippet of HTML, it seems that you will find HTML::TableExtract useful.
Re: Re: Grabing a Page and Need to Parse it.. by LostS (Friar) on May 07, 2001 at 21:04 UTC
I found it... `$traffictotals =~ s/<!-- Begin MRTG Block -->(.*?)<!-- End MRTG Block -->//s;` Works great :)
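For anyone who wants to try that substitution in isolation, here is a minimal sketch. The helper name `strip_mrtg_block` and the sample markup are made up for illustration; the sample string stands in for whatever LWP::Simple's `get()` returned.

```perl
use strict;
use warnings;

# Hypothetical helper: remove everything between the MRTG marker comments,
# markers included. /s lets "." match newlines, and ".*?" stops at the
# first closing marker it can find.
sub strip_mrtg_block {
    my ($html) = @_;
    $html =~ s/<!-- Begin MRTG Block -->.*?<!-- End MRTG Block -->//s;
    return $html;
}

# Stand-in for the fetched page (assumed content, not the real MRTG page).
my $sample = join "\n",
    '<H1>Traffic totals</H1>',
    '<!-- Begin MRTG Block -->',
    '<TABLE><TR><TD>MRTG footer</TD></TR></TABLE>',
    '<!-- End MRTG Block -->',
    '<P>Rest of page</P>';

print strip_mrtg_block($sample), "\n";
```

The capturing parentheses in the one-liner above are harmless, but since the captured text is never used they can be left out, as in this sketch.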
Re: (3) Grab Page and Parse it... (UCD-SNMP or Net::SNMP instead of parsing MRTG) by ybiC (Prior) on May 07, 2001 at 23:39 UTC
Looks like you're parsing output from Multi Router Traffic Grapher, a neat SNMP tool that fetches bandwidth utilization info and generates HTML pages from those stats. Instead of parsing its HTML output back into text, you might consider using UCD-SNMP to query the device(s) directly. UCD-SNMP includes snmpwalk and other command-line tools that are pretty slick. For a more perlish solution, Net::SNMP would also do the trick. "(code) mind your snmPs & Qs" shows yet another perlish approach, this time using the CPAN module SNMP to query devices for info. There are any number of possible reasons why these wouldn't work in your situation, but they seem worth mentioning. cheers, Don striving toward Perl Adept (it's pronounced "why-bick")
Re: Grabing a Page and Need to Parse it.. by swngnmonk (Pilgrim) on May 07, 2001 at 20:20 UTC
Will the block always be wrapped by those comments? Unless I misunderstand your question, a simple regexp will remove all of that. `$webpage =~ s/<!-- Begin MRTG Block -->.*<!-- End MRTG Block -->//os;` Does the table occur more than once in the webpage? If so, add the 'g' (global, as often as it's encountered) option to the regexp.
Re: Re: Grabing a Page and Need to Parse it.. by merlyn (Sage) on May 07, 2001 at 20:23 UTC
The /o there does nothing. And you'll probably need to make it .*? instead of .* to keep from grabbing too much. -- Randal L. Schwartz, Perl hacker
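To see what "grabbing too much" means in practice, here is a small sketch on a made-up page containing two marked blocks. The page content and variable names are assumptions for illustration only.

```perl
use strict;
use warnings;

# Made-up page: two marked blocks with text between them that must survive.
my $page = 'top<!-- Begin MRTG Block -->A<!-- End MRTG Block -->'
         . 'keep<!-- Begin MRTG Block -->B<!-- End MRTG Block -->end';

# Greedy: ".*" runs to the LAST end marker, so "keep" is swallowed too.
(my $greedy = $page) =~ s/<!-- Begin MRTG Block -->.*<!-- End MRTG Block -->//s;

# Non-greedy without /g: only the first block is removed.
(my $first_only = $page) =~ s/<!-- Begin MRTG Block -->.*?<!-- End MRTG Block -->//s;

# Non-greedy with /g: every block is removed, text in between is kept.
(my $all = $page) =~ s/<!-- Begin MRTG Block -->.*?<!-- End MRTG Block -->//sg;

print "greedy:     $greedy\n";      # topend
print "first only: $first_only\n";  # second block is still present
print "all blocks: $all\n";         # topkeepend
```

With a single block on the page the greedy and non-greedy forms give the same result; the difference only shows up once the end marker appears more than once.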
Re: Re: Re: Grabing a Page and Need to Parse it.. by AidanLee (Chaplain) on May 07, 2001 at 20:53 UTC
Also keep in mind that '.' does not match newlines, so you may be looking for `[.\n]*?` instead. Update: I should have known that I was missing something. It seemed most out of character for merlyn to miss something like that.
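For reference, a quick sketch of how these variants behave on a two-line string (the sample text is made up). The /s modifier already used in the substitution above is what lets "." cross newlines, while inside a character class "." is only a literal dot:

```perl
use strict;
use warnings;

my $text = "<b>line one\nline two</b>";

# Without /s, "." stops at a newline, so this cannot span both lines.
my $plain  = $text =~ m{<b>.*?</b>}   ? 1 : 0;

# With /s, "." matches any character, newline included.
my $dotall = $text =~ m{<b>.*?</b>}s  ? 1 : 0;

# Inside a character class "." is a literal dot, so [.\n] only matches
# dots and newlines and fails on the first letter.
my $class  = $text =~ m{<b>[.\n]*?</b>} ? 1 : 0;

# (?:.|\n) is a spelled-out alternative that works without /s.
my $alt    = $text =~ m{<b>(?:.|\n)*?</b>} ? 1 : 0;

print "plain=$plain dotall=$dotall class=$class alt=$alt\n";
```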
Re: Re: Re: Re: Grabing a Page and Need to Parse it.. by merlyn (Sage) on May 07, 2001 at 20:54 UTC
Re: Re: Re: Grabing a Page and Need to Parse it.. by swngnmonk (Pilgrim) on May 08, 2001 at 21:05 UTC
OK, this is beyond the scope of the initial question, but I'm curious anyway: the /o caches the regexp so it doesn't need to be recompiled, correct? I'm not familiar with the innards of the interpreter, but in the event we returned to this RE, wouldn't that be an (albeit extremely minimal) optimization?
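As a rough illustration of what /o actually does: it only comes into play when the pattern interpolates a variable, in which case the pattern is built from the variable's value the first time the match runs and never rebuilt. A pattern with no variables in it, like the one in this thread, is compiled once when the program is compiled, so /o has nothing to add. The sub names below are hypothetical.

```perl
use strict;
use warnings;

# With /o the pattern is frozen at the first value of $pat it sees.
sub match_once  { my ($pat, $str) = @_; return $str =~ /$pat/o ? 1 : 0 }

# Without /o the pattern follows the current value of $pat on every call.
sub match_fresh { my ($pat, $str) = @_; return $str =~ /$pat/  ? 1 : 0 }

# The second call to match_once still tests against "foo", not "bar".
my @once  = (match_once('foo', 'foo'),  match_once('bar', 'bar'));
my @fresh = (match_fresh('foo', 'foo'), match_fresh('bar', 'bar'));

print "with /o:    @once\n";   # 1 0
print "without /o: @fresh\n";  # 1 1
```

So on an interpolated pattern /o can change behavior, not just speed, which is one reason it tends to be avoided in favor of `qr//`.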
Re: Re: Re: Re: Grabing a Page and Need to Parse it.. by merlyn (Sage) on May 08, 2001 at 21:08 UTC