(jeffa) Re: 'one-liner' help

Well, as you can see from the responses so far (caveat: i could be corrected ...), this kind of HTML parsing is tough when your tool is a regex. Why not use a parser instead? I know it's not what you want, but here is some code that uses HTML::TokeParser::Simple to extract just the 'Content' <div> section. Isn't that what you are really trying to do - extract that <div> and everything it contains?

use strict;
use warnings;
use HTML::TokeParser::Simple;

my $parser = HTML::TokeParser::Simple->new('file.html');

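my $print = 0;     # true once we are inside the 'Content' div
my $count = 0;     # depth of div tags opened since printing began

while (my $token = $parser->get_token()) {

   if ($token->is_start_tag('div')) {
      # not every div has a class, so guard against undef under warnings
      my $class = $token->return_attr()->{class};
      $print = 1 if defined $class and $class eq 'Content';
      # count only from the 'Content' div inward, so enclosing divs
      # do not keep the depth from returning to zero
      $count++ if $print;
   }

   print $token->as_is() if $print;

   if ($print and $token->is_end_tag('div')) {
      $count--;
      last if $count == 0;   # the 'Content' div itself just closed
   }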
}

If you want to use this to modify some HTML files, i am afraid that you will have to save copies instead of editing in place. I recommend saving the new files in a separate directory; then you can just move the lot up a level and clobber the originals. ;)
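
Here is a rough sketch of the "save copies in a separate directory" idea, using only core modules. The helper name save_copies and the $extract callback are made up for illustration: $extract is assumed to be a code ref that takes a file name and returns the extracted HTML as a string (for example, the loop above wrapped in a sub that collects as_is() output instead of printing it).

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Path qw(make_path);
use File::Spec;

# Hypothetical helper: run $extract on each input file and save the result
# under $out_dir with the same base name. The originals are only read.
sub save_copies {
   my ($extract, $out_dir, @files) = @_;
   make_path($out_dir);    # does nothing if the directory already exists

   my @written;
   for my $file (@files) {
      my $target = File::Spec->catfile($out_dir, basename($file));
      open my $out, '>', $target or die "Cannot write $target: $!";
      print {$out} $extract->($file);
      close $out or die "Cannot close $target: $!";
      push @written, $target;
   }
   return @written;        # paths of the new copies
}
```

Something like save_copies(\&extract_content, 'new', glob('*.html')) would then fill new/ with the trimmed files (extract_content being your own sub), and you can look them over before moving them on top of the originals. Just make sure $out_dir is not the directory the originals live in, or the copies will overwrite them right away.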

jeffa

L-LL-L--L-LL-L--L-LL-L--
-R--R-RR-R--R-RR-R--R-RR
B--B--B--B--B--B--B--B--
H---H---H---H---H---H---
(the triplet paradiddle with high-hat)
