Re: HTML document modification

Couldn't you just use a simple substitution for such a minimal task? I am among the first to decry using a regexp on HTML, but this situation may not warrant anything more.

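my $newstuff = "<p>New HTML here!</p>\n";
open my $in,  '<', "infile.html"   or die $!;
open my $out, '>', "tempfile.html" or die $!;
while ( my $line = <$in> ) {
    # Lines without a closing body tag pass through untouched.
    next unless $line =~ m!<\s*/body\s*>!i;
    # Insert the new markup just before the closing body tag.
    $line =~ s!(<\s*/body\s*>)!$newstuff$1!i;
} continue {
    # Runs for every line, even after 'next'. The line must be named
    # explicitly: a bare 'print $out;' prints $out itself to STDOUT.
    print $out $line;
}
close $out or die $!;
close $in  or die $!;
# Replace the original file with the edited copy.
rename "tempfile.html", "infile.html" or die $!;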

...untested, but it seems about right...

Dave


Re: Re: HTML document modification by rob_au (Abbot) on May 28, 2004 at 02:39 UTC
The problem with this approach arises when the HTML document includes example HTML, such as `<body></body>` tags, inside `<pre></pre>` tags. The regular expression which BrowserUK has provided appears to be somewhat more robust, although I suspect that I will follow his suggestion and read the file backwards for the first `</body>` tag.
`perl -le "print unpack'N', pack'B32', '00000000000000000000001011011110'"`
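For what it's worth, here is a minimal sketch of the "last `</body>` wins" idea. It assumes the file is small enough to slurp into a string, so it does not actually read the file backwards (File::ReadBackwards on CPAN would be the tool for that). The sub name insert_before_last_body_close is made up for illustration and is not anyone's code from this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: insert $new immediately before the LAST closing
# body tag in $html, so earlier example tags (e.g. inside <pre>) are ignored.
sub insert_before_last_body_close {
    my ( $html, $new ) = @_;
    my $pos = -1;

    # Walk over every match and remember where the final one starts.
    while ( $html =~ m!<\s*/body\s*>!gi ) {
        $pos = $-[0];
    }

    return $html if $pos < 0;    # no closing tag found: return unchanged

    # Four-argument substr splices $new in at $pos without removing anything.
    substr( $html, $pos, 0, $new );
    return $html;
}

my $doc = "<body><pre><body></body></pre><p>text</p></body>\n";
print insert_before_last_body_close( $doc, "<p>New HTML here!</p>\n" );
```

This only sidesteps example tags that appear before the real closing tag. Anything that looks like `</body>` after it (in a trailing comment, say) would still fool it.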
Re: Re: Re: HTML document modification by perrin (Chancellor) on May 28, 2004 at 04:00 UTC
Uh, that would be invalid HTML, wouldn't it? Any examples within the document would have to be escaped.
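To illustrate what "escaped" means here, a small sketch follows. The sub name escape_html is hypothetical, and it only handles the three characters that matter in text content. Once example markup has been through something like this, a regexp looking for `</body>` will no longer match it.

```perl
use strict;
use warnings;

# Hypothetical helper: escape text so it can appear as example markup
# inside an HTML document.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # ampersand first, or the entities below get re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

print escape_html("<body></body>"), "\n";    # prints &lt;body&gt;&lt;/body&gt;
```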
Re: Re: Re: Re: HTML document modification by zakzebrowski (Curate) on May 28, 2004 at 13:51 UTC
Most browsers render invalid HTML correctly anyway... (Not that I recommend sending browsers invalid HTML, since it increases parse time, but it still works...)
---- Zak - the office