mahira has asked for the wisdom of the Perl Monks concerning the following question:

Dear all,

I have an issue that I could not solve, even after several hours of "googling" and "cpanning" :(

My aim is to fetch a specific single HTML page and copy the exact, as-is content of a specific table cell (td) into a scalar variable.

The document has several tables. Neither my table nor the others have an id, but the table cell I am interested in has a specific class.

Thanks for your help...

PS: I am aware of LWP::Simple and am currently using it to fetch the page to a file. It is the rest I am stuck on :((

Re: Please help with a fetching issue
by marto (Cardinal) on Mar 24, 2010 at 15:22 UTC
Re: Please help with a fetching issue
by Your Mother (Archbishop) on Mar 24, 2010 at 19:01 UTC

    Either of these (tuned to your needs) should do the trick: HTML::TokeParser::Simple or XML::LibXML.

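    use strict;
    use warnings;
    use LWP::Simple qw( get );
    use HTML::TokeParser::Simple;
    use XML::LibXML;

    my $page = get( +shift || die "Gimme URI!\n" )
        or die "Could not fetch the page\n";

    # Option 1: HTML::TokeParser::Simple
    my $tp = HTML::TokeParser::Simple->new( \$page );
    while ( my $token = $tp->get_tag("td") ) {
        # Some cells have no class attribute; avoid an undef warning.
        next unless ( $token->get_attr("class") // '' ) =~ /\bsomeClass\b/;
        my $first_child = $tp->get_token();
        print $first_child->as_is, $/;
        last;
    }

    # Option 2: XML::LibXML
    my $xp = XML::LibXML->new;
    $xp->recover_silently(1);    # tolerate sloppy real-world HTML
    my $doc = $xp->parse_html_string($page);
    my ( $td ) = $doc->findnodes('//td[@class="someClass"]')
        or die "No matching td found\n";
    print $td->textContent, $/;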

    (Update: rolled into a single <code/> block and fixed the tag name.)
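A sketch that is not from the thread: textContent returns only the text of the cell, while the question asks for its exact content including markup. One way to get the inner markup with XML::LibXML is to serialize the cell's child nodes. The helper name td_inner_html and the sample HTML are hypothetical. The XPath contains(concat(...)) test matches someClass as a whole word even when the attribute lists several classes, and it assumes the class name contains no quote characters. Because libxml re-serializes the nodes, the result is equivalent markup but not guaranteed to be byte-identical to the source (for example, <br> comes back as <br/>).

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: return the inner HTML of the first <td> whose class
# list contains $class, or undef if no such cell exists.
sub td_inner_html {
    my ( $html, $class ) = @_;
    my $parser = XML::LibXML->new;
    $parser->recover_silently(1);    # tolerate real-world, non-valid HTML
    my $doc = $parser->parse_html_string($html);

    # Match $class as a whole word within a space-separated class list.
    my ($td) = $doc->findnodes(
        qq{//td[contains(concat(' ', normalize-space(\@class), ' '), ' $class ')]}
    );
    return unless $td;

    # Serialize every child node (text and elements) and join them.
    return join '', map { $_->toString } $td->childNodes;
}

# Hypothetical sample input.
my $html = '<table><tr><td class="a someClass">Hello <b>world</b></td></tr></table>';
my $inner = td_inner_html( $html, 'someClass' );
print defined $inner ? "$inner\n" : "no match\n";
```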

      Thank you very much.

      I was not able to make the solutions above work. I don't know why, but I suspect it has something to do with the page...

      In the end, I solved the issue with a regex. This time, though, I keyed on the comment markers placed around the table cell. After fetching the page:

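      # Replace line breaks with a placeholder so that . can match across lines
      $page =~ s/(\n|\r)/<!--xxx-->/g;
      # Keep only what lies between the start and end markers
      $page =~ s/.*<!--start\stag-->(.*)<!--end\stag-->.*/$1/;
      # Restore the line breaks
      $page =~ s/<!--xxx-->/\n/g;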
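A possible simplification, not from the thread: the placeholder substitutions are needed only because . does not match a newline by default. The /s modifier changes that, and a non-greedy (.*?) stops at the first end marker after the start marker. This version also preserves the original line endings, whereas the substitutions above turn each \r\n into two newlines. It extracts into a new variable instead of rewriting $page. The sample page below is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sample page; in practice $page would come from LWP::Simple::get.
my $page = "<html>\r\n<body>\r\n<!--start tag-->\r\n"
         . "<td class=\"x\">a\r\nb</td>\r\n<!--end tag-->\r\n</body></html>";

# /s lets . match newlines, so no placeholder substitution is needed;
# the non-greedy (.*?) stops at the first end marker after the start marker.
my ($content) = $page =~ /<!--start\stag-->(.*?)<!--end\stag-->/s
    or die "Markers not found\n";
print $content;
```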

      Thanks again for your help.