Assuming you mean rewriting the links only in the extracted snippet, and not in the rest of the original document, I might do something like this:
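#! perl -slw
use strict;
use LWP::Simple;

# Fetch the page; get() returns undef on failure.
defined( $_ = get( 'http://the.site.com/path' ) )
    or die "Couldn't fetch the page\n";

# Keep only the first <table>...</table>, then prefix every
# double-quoted href/src value in it with the site's address.
if( m[(<table.*?</table>)]si ) {
    ($_ = $1) =~ s[(href|src)\s*=\s*"([^"]+)"]
                  [$1="http://the.site.com$2"]sig;
    print;
    exit;
}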
Throw that in a script and redirect the output to your file.
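If you want to see what the match and substitution do before pointing them at the live site, here is a small self-contained sketch that runs the same extraction and rewrite against a made-up HTML string instead of calling get(). The sample markup is hypothetical, and http://the.site.com is just the placeholder host used above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample page standing in for what get() would return.
my $html = <<'HTML';
<html><body>
<p><a href="/outside.html">not in the table</a></p>
<table>
  <tr><td><a href="/docs/page.html">Docs</a></td></tr>
  <tr><td><img src="/img/logo.png"></td></tr>
</table>
</body></html>
HTML

# Same non-greedy match: captures only the first <table>...</table>.
my ($snippet) = $html =~ m[(<table.*?</table>)]si
    or die "No table found\n";

# Same substitution: prefix double-quoted href/src values with the host.
$snippet =~ s[(href|src)\s*=\s*"([^"]+)"]
             [$1="http://the.site.com$2"]sig;

print "$snippet\n";
```

The link outside the table is dropped along with the rest of the page, and both attributes inside the table come out with the host prepended.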
But please note: this is fragile! If almost anything on the web page changes, you will have to change the regex, and many kinds of change would not be easy to accommodate. Given the examples of doing it 'the right way' offered above, you're probably better off using one of those as your starting point.
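The fragility is easy to demonstrate. This sketch runs the same substitution over a few made-up tags (hypothetical examples, not taken from any real page) whose attributes are written in ways that are legal HTML but that the regex was not designed for.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical tags; only the last one matches what the regex expects.
my @cases = (
    q{<a href='/single.html'>single-quoted</a>},
    q{<img src=/unquoted.png>},
    q{<a href="http://other.site.com/abs.html">already absolute</a>},
    q{<a href="/normal.html">the case the regex expects</a>},
);

my @rewritten;
for my $tag (@cases) {
    # Apply the substitution to a copy so @cases stays untouched.
    ( my $out = $tag ) =~ s[(href|src)\s*=\s*"([^"]+)"]
                           [$1="http://the.site.com$2"]sig;
    push @rewritten, $out;
}

print "$_\n" for @rewritten;
```

Only the last tag comes out as intended: the single-quoted and unquoted values are silently left alone, and the already-absolute URL gets the host glued onto its front (http://the.site.comhttp://other.site.com/abs.html). A real HTML parser sees all four as ordinary attributes.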