HTTP-404 has asked for the wisdom of the Perl Monks concerning the following question:

Hello again. I have a page from which I want to rip out all the content between a few lines:
JUNK HTML
..............
<td class="ContenidoTitulo" colspan="3">Trucos de A 10 CUBA para PC </td>
................ MORE NEEDED HTML ................
<td class="ContenidoTexto" colspan="3"> CODIGOS<br>Some Text </td>
............. JUNK HTML
I have the following code for cutting out the needed part:
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

surf();

sub surf {
    # Create a user agent object and set its identification string
    my $ua = LWP::UserAgent->new;
    $ua->agent('Mozilla/4.0 (compatible; MSIE 5.0; Windows 2000) Opera 5.12  [en]');
    my $req = HTTP::Request->new(GET => 'http://www.chollonet.com/trucoteca/truco.php?idjuego=598&plataforma=4');
    # Fetch the page; as_string returns the whole response (status line, headers and body)
    my $html = $ua->request($req)->as_string;
    print $html;
}
I think that the following regex should be fine:
/^<td class="ContenidoTitulo" colspan="3">Trucos de A 10 CUBA para PC <\/td>(.*)<td class="ContenidoTexto" colspan="3"> CODIGOS<br>Some Text <\/td>/
But how would I use it in my script? Thanks a lot.

Replies are listed 'Best First'.
Re: Cutting big HTML file
by !me (Acolyte) on Jul 29, 2001 at 05:03 UTC
    Here is my non-regex solution. It's long, ugly and not very efficient but it works!
    my $page = "12345abc67890";
    print get_page_chunk('12345', '67890', $page);

    # Return the text between $start_marker and $end_marker, or '' if either is missing.
    sub get_page_chunk {
        my ($start_marker, $end_marker, $page) = @_;
        my ($x1, $x2) = (-1, -1);
        $x1 = index($page, $start_marker);
        if ($x1 != -1) {
            $x1 += length($start_marker);    # skip past the start marker
            $x2 = index($page, $end_marker, $x1);
            if ($x2 != -1 && $x2 > $x1) {
                return substr($page, $x1, $x2 - $x1);
            }
        }
        return '';
    }
Re: Cutting big HTML file
by andye (Curate) on Jul 29, 2001 at 01:59 UTC
    Hi again, I'd do something like this...
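    use LWP::Simple;
    my $html = get('http://www.chollonet.com/trucoteca/truco.php?idjuego=598&plataforma=4');
    die "Could not fetch the page\n" unless defined $html;
    # m{...} avoids clashing with the / in </td>; (.*) captures everything between the two cells
    print $1 if $html =~ m{<td class="ContenidoTitulo" colspan="3">Trucos de A 10 CUBA para PC </td>(.*)<td class="ContenidoTexto" colspan="3"> CODIGOS<br>Some Text </td>}s;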

    Notice that I've changed your regexp a little: I removed the ^ anchor at the beginning and added the /s modifier at the end, so that . also matches newlines.
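
If the markers contain characters that are special in a regex, it may be safer to quote them with \Q...\E instead of pasting them into the pattern. The following is only a sketch under stated assumptions: the $html string is a made-up stand-in for the downloaded page, and extract_between is a hypothetical helper name, not something from the original post.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the downloaded page; the real page will differ.
my $html = <<'HTML';
<tr><td>JUNK HTML</td></tr>
<td class="ContenidoTitulo" colspan="3">Trucos de A 10 CUBA para PC </td>
<tr><td>MORE NEEDED HTML</td></tr>
<td class="ContenidoTexto" colspan="3"> CODIGOS<br>Some Text </td>
<tr><td>JUNK HTML</td></tr>
HTML

# Literal markers copied from the question.
my $start = '<td class="ContenidoTitulo" colspan="3">Trucos de A 10 CUBA para PC </td>';
my $end   = '<td class="ContenidoTexto" colspan="3"> CODIGOS<br>Some Text </td>';

# extract_between is a hypothetical helper.
# \Q...\E quotes regex metacharacters in the markers, /s lets . match newlines,
# and .*? stops at the first end marker rather than the last one.
sub extract_between {
    my ($text, $from, $to) = @_;
    return $text =~ /\Q$from\E(.*?)\Q$to\E/s ? $1 : undef;
}

my $chunk = extract_between($html, $start, $end);
print defined $chunk ? $chunk : "markers not found\n";
```

Returning undef when either marker is missing lets the caller tell "nothing found" apart from an empty match.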

    I hope I've helped. andy.

Re: Cutting big HTML file
by mitd (Curate) on Jul 29, 2001 at 11:46 UTC
    Since your target data appears to all be contained within tables you may find HTML::TableExtract to be very useful.
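
A minimal sketch of what that could look like, assuming HTML::TableExtract is installed from CPAN. The sample markup is invented for illustration (the real page layout is unknown), and no constraints are passed to the constructor, so every table in the document is extracted.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical sample; the real page will have more tables and rows.
my $html = <<'HTML';
<table>
<tr><td class="ContenidoTitulo" colspan="3">Trucos de A 10 CUBA para PC </td></tr>
<tr><td class="ContenidoTexto" colspan="3"> CODIGOS<br>Some Text </td></tr>
</table>
HTML

# No headers/depth/count constraints: extract every table found.
my $te = HTML::TableExtract->new;
$te->parse($html);

for my $table ($te->tables) {
    for my $row ($table->rows) {
        # Cells can be undef (for example where a colspan covers them).
        print join(' | ', map { defined $_ ? $_ : '' } @$row), "\n";
    }
}
```

On a real page you would probably narrow the extraction with the constructor's depth and count options once you know where the target table sits.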

    mitd-Made in the Dark
    'My favourite colour appears to be grey.'

Re: Cutting big HTML file
by earthboundmisfit (Chaplain) on Jul 29, 2001 at 02:54 UTC
    Not much to add except, have you taken a look at HTML::Parser?
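
For illustration, here is a rough sketch of how HTML::Parser (version 3 API) might be used to collect the text of the two cells by their class attribute. It assumes the module is installed, the sample markup is made up, and the %wanted, @cells and $in_cell names are hypothetical. It does not try to handle nested tables.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical sample markup based on the snippet in the question.
my $html = '<table><tr>'
    . '<td class="ContenidoTitulo" colspan="3">Trucos de A 10 CUBA para PC </td>'
    . '<td class="Otro">junk</td>'
    . '<td class="ContenidoTexto" colspan="3"> CODIGOS<br>Some Text </td>'
    . '</tr></table>';

my %wanted  = map { $_ => 1 } qw(ContenidoTitulo ContenidoTexto);
my @cells;          # one string per wanted <td>
my $in_cell = 0;    # true while inside a wanted <td>

my $parser = HTML::Parser->new(
    api_version => 3,
    # Start tags: open a new cell on a wanted <td>, turn <br> into a newline.
    start_h => [ sub {
        my ($tag, $attr) = @_;
        if ($tag eq 'td' && $wanted{ $attr->{class} // '' }) {
            $in_cell = 1;
            push @cells, '';
        }
        elsif ($tag eq 'br' && $in_cell) {
            $cells[-1] .= "\n";
        }
    }, 'tagname, attr' ],
    # Text: append decoded text to the current cell.
    text_h => [ sub { $cells[-1] .= $_[0] if $in_cell }, 'dtext' ],
    # End tags: any </td> closes the current cell.
    end_h  => [ sub { $in_cell = 0 if $_[0] eq 'td' }, 'tagname' ],
);
$parser->parse($html);
$parser->eof;

print "$_\n---\n" for @cells;
```

Matching on the class attribute rather than on the exact cell text means the same code keeps working when the title or the codes change from page to page.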