in reply to HTML document modification

Couldn't you just use a simple substitution for such a minimal task? I mean, I am the first (or one of the first) to decry using a regexp on HTML, but this situation may not warrant more.

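my $newstuff = "<p>New HTML here!</p>\n";

open my $in,  '<', "infile.html"   or die $!;
open my $out, '>', "tempfile.html" or die $!;

while ( my $line = <$in> ) {
    # Only the line holding the closing body tag needs changing.
    next unless $line =~ m!<\s*/body\s*>!i;
    # Insert the new markup just before the closing tag.
    $line =~ s!(<\s*/body\s*>)!$newstuff$1!i;
}
continue {
    # The line has to be named here: a bare "print $out;" does not write $_ to $out.
    print {$out} $line;
}

close $out or die $!;
close $in  or die $!;
rename "tempfile.html", "infile.html" or die $!;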

...untested, but it seems about right...


Dave

Re: Re: HTML document modification
by rob_au (Abbot) on May 28, 2004 at 02:39 UTC
    The issue with this approach arises when the HTML document includes example HTML, including <body></body> tags, within <pre></pre> tags. The regular expression that BrowserUK provided appears somewhat more robust, although I suspect I will follow his suggestion and try reading the file backwards for the first </body> tag.
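A rough sketch of the "search from the end" idea, not BrowserUK's code: instead of truly reading the file backwards, it assumes the document is small enough to slurp into a string, and lets a greedy match find the last </body> tag. The helper name insert_before_last_body_close is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: insert $new before the LAST closing body tag in $html.
# Returns the modified string, or undef if no closing body tag was found.
sub insert_before_last_body_close {
    my ( $html, $new ) = @_;

    # The greedy .* (with /s, so it crosses newlines) swallows as much as it
    # can, so the tag matched after it is the final </body> in the string.
    my $changed = ( $html =~ s{\A(.*)(<\s*/body\s*>)}{$1$new$2}is );

    return $changed ? $html : undef;
}

# Example input with an unescaped <body></body> pair inside <pre>.
my $doc = "<body><pre><body></body></pre><p>Text</p></body></html>\n";
print insert_before_last_body_close( $doc, "<p>New HTML here!</p>\n" );
```

Because only the final match is touched, any earlier </body> inside example markup is left alone. For very large files, a module that reads from the end of the file would avoid holding everything in memory.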

     

    perl -le "print unpack'N', pack'B32', '00000000000000000000001011011110'"

      Uh, that would be invalid HTML, wouldn't it? Any examples within the document would have to be escaped.
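For what escaping would look like, here is a minimal sketch. The helper name escape_html is made up for illustration, and it handles only the three characters that matter inside element text, not attribute quoting.

```perl
use strict;
use warnings;

# Hypothetical helper: escape the characters that are special in HTML text.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # ampersand first, so the entities added below are not re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

print "<pre>", escape_html("<body></body>"), "</pre>\n";
# prints: <pre>&lt;body&gt;&lt;/body&gt;</pre>
```

With examples escaped this way, a literal </body> can only be the real closing tag, so the simple substitution is safe on valid documents.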
        Most browsers render invalid HTML correctly anyway... (Not that I recommend sending browsers invalid HTML, since it increases parse time, but it still works...)


        ----
        Zak - the office