Cody Pendant has asked for the wisdom of the Perl Monks concerning the following question:
Now sometimes I can find the links I want on an HTML page just by matching a URL pattern. That approach is amenable to parsing with HTML::TokeParser or similar.
But say a site uses a completely opaque URL format like "?storyid=123456" for everything?
What I've done in the past is to find the chunk of the page which contains those "good" links as a way to exclude the "bad" ones. And I've done it the "dumb" way, i.e.
    $whole_thing =~ m|<some unique html start string>(.*?)<end string>|s;
    $good_chunk = $1;
and then working on the $good_chunk.
I've spent a bit of time looking at HTML::TokeParser and HTML::Parser and I can't seem to figure out how to do the equivalent.
Say I've determined that what I need is
    <div id="good_chunk">
up to the closing tag of that DIV.
I need something like
    while ( my $token = $p->get_tag( "div" ) ) {
        if ( $token->[1]->{'id'} eq 'good_chunk' ){
            # get the entire contents of the div, as HTML,
            # for further parsing
        }
    }
Perhaps I'm missing something obvious?
($_='kkvvttuu bbooppuuiiffss qqffssmm iibbddllffss')
=~y~b-v~a-z~s; print
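For illustration only, here is one way the loop sketched in the question could be completed with HTML::TokeParser. It is a minimal sketch, not an answer taken from the thread: the helper name div_contents is made up, and it assumes the chunk should end at the matching closing DIV, so nested DIVs are counted on the way through. Each token carries its original source text, which is simply glued back together.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: return the inner HTML of the first <div> whose
# id equals $id, or undef if there is no such div.
sub div_contents {
    my ($html, $id) = @_;
    my $p = HTML::TokeParser->new(\$html) or die "Cannot parse HTML";

    while (my $tag = $p->get_tag('div')) {
        my $attr = $tag->[1];
        next unless defined $attr->{id} && $attr->{id} eq $id;

        # Inside the wanted div: collect raw token text until the
        # matching </div>, counting nested divs along the way.
        my ($depth, $chunk) = (1, '');
        while (my $token = $p->get_token) {
            my $type = $token->[0];
            if ($type eq 'S' && $token->[1] eq 'div') {
                $depth++;
            }
            elsif ($type eq 'E' && $token->[1] eq 'div') {
                return $chunk if --$depth == 0;
            }
            # Text tokens keep their source text at index 1; every
            # other token type keeps it in the last element.
            $chunk .= $type eq 'T' ? $token->[1] : $token->[-1];
        }
        return $chunk;    # the div was never closed; return what we have
    }
    return;
}
```

The returned string can then be handed to a second HTML::TokeParser (again as a scalar reference) to pull out just the links in that chunk. If the page's markup is badly unbalanced, the depth counting may stop too early or too late, so treat the result with some suspicion.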
Re: Use Parsers To Get Chunk of HTML?
by merzy (Scribe) on Jul 04, 2005 at 04:22 UTC
by Cody Pendant (Prior) on Jul 04, 2005 at 04:33 UTC

Re: Use Parsers To Get Chunk of HTML?
by GrandFather (Saint) on Jul 04, 2005 at 04:17 UTC

Re: Use Parsers To Get Chunk of HTML?
by polettix (Vicar) on Jul 04, 2005 at 10:45 UTC
by Cody Pendant (Prior) on Jul 05, 2005 at 23:17 UTC