Sometimes I can find the links I want on an HTML page just by matching a URL pattern, and that approach works well with HTML::TokeParser or similar.
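For that easy case, a minimal sketch with HTML::TokeParser might look like this. The helper name links_matching, the sample HTML and the ^/news/ pattern are made up for illustration. It relies on get_tag('a') returning start tags as [$tag, $attr, $attrseq, $text], where $attr is a hash of lowercased attribute names.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: return the href of every <a> tag whose URL matches $pattern.
sub links_matching {
    my ($html, $pattern) = @_;
    my $p = HTML::TokeParser->new(\$html) or die "Can't parse: $!";
    my @hits;
    while ( my $tag = $p->get_tag('a') ) {
        my $href = $tag->[1]{href};    # element [1] is the attribute hash
        push @hits, $href if defined $href && $href =~ $pattern;
    }
    return @hits;
}

# Made-up sample page: two "good" links and one ad link.
my $html = <<'HTML';
<a href="/news/2004/story1.html">One</a>
<a href="/ads/banner.html">Ad</a>
<a href="/news/2004/story2.html">Two</a>
HTML

print "$_\n" for links_matching( $html, qr{^/news/} );
```

This only works when the URLs themselves carry the distinction you care about.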
But what if a site uses a completely opaque URL format like "?storyid=123456" for everything?
What I've done in the past is to find the chunk of the page that contains the "good" links, as a way to exclude the "bad" ones. And I've done it the "dumb" way, i.e.

    if ( $whole_thing =~ m|<some unique html start string>(.*?)<end string>|s ) {
        $good_chunk = $1;
    }

and then working on $good_chunk.
I've spent a bit of time looking at HTML::TokeParser and HTML::Parser, and I can't figure out how to do the equivalent.
Say I've determined that what I need is everything from <div id="good_chunk"> up to the closing tag of that DIV.
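I need something like

    use HTML::TokeParser;
    my $p = HTML::TokeParser->new( \$whole_thing ) or die "Can't parse: $!";
    while ( my $token = $p->get_tag( "div" ) ) {
        # $token->[1] is the attribute hash; not every div has an id
        if ( ( $token->[1]{id} // '' ) eq 'good_chunk' ) {
            # get the entire contents of the div, as HTML,
            # for further parsing
        }
    }

Perhaps I'm missing something obvious?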
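One way to get the equivalent of the regex slice with HTML::TokeParser is to skip to the wanted start tag with get_tag, then walk the remaining tokens with get_token, counting nested <div>s and concatenating each token's original source text until the matching </div> turns up. A minimal sketch follows; the helper name div_inner_html and the sample page are hypothetical. It assumes HTML::TokeParser's documented token layout (raw text at index 4 of a start-tag token, index 2 of end-tag and PI tokens, index 1 of text, comment and declaration tokens) and Perl 5.10+ for the // operator.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: return the raw HTML between <div id="$id"> and its
# matching </div>, counting nested <div>s so an inner </div> doesn't end
# the chunk early. Returns undef if no such div exists.
sub div_inner_html {
    my ($html, $id) = @_;
    my $p = HTML::TokeParser->new(\$html) or die "Can't parse: $!";

    # get_tag('div') returns only <div> start tags: [$tag, $attr, $attrseq, $text]
    while ( my $tag = $p->get_tag('div') ) {
        next unless ( $tag->[1]{id} // '' ) eq $id;    # not every div has an id

        my ( $depth, $chunk ) = ( 1, '' );
        while ( my $tok = $p->get_token ) {
            my $type = $tok->[0];
            if ( $type eq 'S' && $tok->[1] eq 'div' ) {
                $depth++;
            }
            elsif ( $type eq 'E' && $tok->[1] eq 'div' ) {
                return $chunk if --$depth == 0;    # the matching </div>
            }
            # Append the token's original source text, whatever its type.
            $chunk .= $type eq 'S'  ? $tok->[4]
                    : $type eq 'E'  ? $tok->[2]
                    : $type eq 'PI' ? $tok->[2]
                    :                 $tok->[1];   # T, C and D tokens
        }
        return $chunk;    # document ended before the div was closed
    }
    return;    # no div with that id
}

# Made-up sample page with an opaque URL scheme.
my $page = <<'HTML';
<div id="nav"><a href="?storyid=1">Nav link</a></div>
<div id="good_chunk">
  <div class="item"><a href="?storyid=123456">Good story</a></div>
</div>
HTML

my $good_chunk = div_inner_html( $page, 'good_chunk' );

# Feed the chunk to a fresh parser to pull out only the "good" links.
if ( defined $good_chunk ) {
    my $inner = HTML::TokeParser->new( \$good_chunk );
    while ( my $link = $inner->get_tag('a') ) {
        print "$link->[1]{href}\n";
    }
}
```

The depth counter is what the non-greedy regex can't do: with nested divs, (.*?) stops at the first </div>, while this stops at the one that actually closes good_chunk. A tree-based module such as HTML::TreeBuilder is another option if you'd rather select the element than count tags.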
($_='kkvvttuu bbooppuuiiffss qqffssmm iibbddllffss')
=~y~b-v~a-z~s; print
In reply to Use Parsers To Get Chunk of HTML? by Cody Pendant