in reply to Re: Munging Rendered HTML While Preserving Formatting
in thread Munging Rendered HTML While Preserving Formatting

idsfa,
I am not sure I made the problem clear, as your response at first glance isn't appropriate. The task is to change foo to bar in rendered HTML while keeping the original formatting. What it boils down to is changing foo to bar in the underlying HTML (which may involve embedded tags) so that the rendered HTML looks like you did s/foo/bar/g.

Cheers - L~R

Re^3: Munging Rendered HTML While Preserving Formatting
by idsfa (Vicar) on Jun 28, 2004 at 16:52 UTC

    In general, I don't think you can get there from here. Consider the transform  s/foo/fishstick/g. How do you transform the HTML fo<b>o</b>?

    Assuming you constrain the replacement to have the same length as the original, something like this would do the job:

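    use Regexp::Common;

    # Strip each tag, remembering the offset in the tag-free text it came from
    while( $html =~ s/($RE{balanced}{-parens=>'<>'})// ) {
        $tags{$-[0]} .= $1;
    }
    $html =~ s/foo/bar/g;
    # Put the tags back, rightmost first, so the earlier offsets stay valid
    foreach my $point (sort {$b<=>$a} keys (%tags)) {
        substr($html, $point, 0 ) = $tags{$point};
    }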
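
For anyone who wants to try the stash-and-restore idea without installing anything, here is a minimal core-Perl sketch. It is not the code from the post above: the sub name replace_rendered is made up for illustration, and the plain <[^>]*> pattern is a simplifying assumption that breaks as soon as an attribute value contains a '>'.

```perl
use strict;
use warnings;

# Hypothetical helper: replace $from with an equal-length $to in the
# rendered text only, then restore the tags at their original offsets.
# ASSUMPTION: tags never contain '>' inside attribute values.
sub replace_rendered {
    my ($html, $from, $to) = @_;
    die "replacement must be the same length as the original\n"
        unless length($from) == length($to);

    my %tags;    # offset in the tag-free text => tag(s) removed there

    # Without /g each pass removes the leftmost remaining tag, so $-[0]
    # is already an offset into the tag-free text.
    while ($html =~ s/(<[^>]*>)//) {
        $tags{$-[0]} .= $1;
    }

    $html =~ s/\Q$from\E/$to/g;

    # Reinsert from the right so the smaller offsets stay valid.
    for my $point (sort { $b <=> $a } keys %tags) {
        substr($html, $point, 0) = $tags{$point};
    }
    return $html;
}

print replace_rendered('fo<b>o</b> and <i>f</i>oo', 'foo', 'bar'), "\n";
# prints: ba<b>r</b> and <i>b</i>ar
```

The length check is what keeps the stored offsets meaningful. With a longer replacement such as fishstick, the offsets recorded for fo<b>o</b> would still point at positions 2 and 3, which is an arbitrary place to put the bold in the new word.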

    For the pathological case of a tag with an attribute containing a '>' -- at this point you know as well as I do that you're into a full HTML parser:

    use HTML::Parser;

    my $html = '';
    my %tags;

    # Remove the s///g from this one to leave tags alone.
    # Alternately, specify additional methods to alter only
    # specific token types.
    sub tagpush { my $tag = shift; $tag =~ s/foo/bar/g; $tags{length($html)} .= $tag; }
    sub txtpush { $html .= "@_"; }

    my $p = HTML::Parser->new(
        unbroken_text => 1,
        text_h        => [ \&txtpush, "text" ],
        default_h     => [ \&tagpush, "text" ],
    );

    my $file = shift || die "usage: $0 file.html\n";
    $p->parse_file($file) || die "Can't open file $file: $!\n";
    $html =~ s/foo/bar/g;
    foreach my $point (sort {$b<=>$a} keys (%tags)) {
        substr($html, $point, 0 ) = $tags{$point};
    }

    This last one handles cases like f<!-- -->oo, f<b>oo</b> and <img alt=">foo"> properly as well, which a token parser will not catch.


    If anyone needs me I'll be in the Angry Dome.
      idsfa,
      Ok, so now you see what I was getting at. I don't have any experience with HTML munging so I don't know what one should do in these cases - that's why I asked. It is obviously a hard problem but I would think someone was working on it. I guess I will crawl back under my rawk now but thanks for the additional insight.

      Cheers - L~R