Search and replace in html

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Search and replace in html by tall_man (Parson) on May 08, 2003 at 23:48 UTC
Parsing HTML with regular expressions is not recommended. You should look at HTML::Parser. Also take a look at the FAQ node "How do I remove HTML from a string?".
Re: Search and replace in html by Limbic~Region (Chancellor) on May 09, 2003 at 00:06 UTC
Anonymous Monk, There is a plethora of modules on CPAN that could help you; I would suggest looking at the following search. Roll-your-own solutions with unknown data sources are likely to fail. With that said, let's assume your HTML is perfectly formatted and you want everything between the start and end HTML tags, including other HTML tags.

```perl
#!/usr/bin/perl -w
use strict;

open (INPUT,"file") or die "Unable to open input : $!";
open (OUTPUT,">output") or die "Unable to open output : $!";
select OUTPUT;   # print now goes to OUTPUT by default
$\ = "\n";       # append a newline to every print

my $foundstart;
while (<INPUT>) {
    chomp;
    # skip everything before the opening tag
    next unless ($foundstart || /<html *>/i);
    if (/<html *>/i && ! $foundstart) {
        # keep only what follows the opening tag on this line
        $_ =~ s/^.*?<html *>(.*)$/$1/i;
        $foundstart++;
        next unless($_);
    }
    if ($_ =~ m|</html *>|i) {
        # keep only what precedes the closing tag, then stop
        $_ =~ s|^(.*?)</html *>.*$|$1|i;
        print if($_);
        last;
    }
    print;
}
close INPUT;
close OUTPUT;
```

Cheers - L~R
Re: Re: Search and replace in html by Anonymous Monk on May 09, 2003 at 01:16 UTC
Thanks guys, it has started to get me on my way. I have now found that the files contain multiple html references. Would it simply be a matter of changing the if to a while?
Re: Re: Re: Search and replace in html by Limbic~Region (Chancellor) on May 09, 2003 at 12:56 UTC
Anonymous Monk, No - you should not try to roll your own unless you are 100% sure of your data. That is what I was trying to point out. Follow tall_man's advice or find a module you like using the search I provided. Cheers - L~R
Re: Search and replace in html by kilinrax (Deacon) on May 08, 2003 at 23:32 UTC
Sounds like you might want to set the input record separator (`$/`) to undef; then you'll pull the file in as one huge chunk rather than line-by-line. Try either of the following lines:

```perl
undef $/;
local $/;
```

The first line will set `$/` to undef for the rest of the program, the second only for the enclosing scope (arguably better). You might want to try reading perlvar to get a better idea of what `$/` is, and maybe pick a better value to set it to.
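For what it's worth, here is a minimal sketch of the difference between the default line-by-line read and a slurp with `local $/`. It reads from an in-memory string instead of a real file, and the sample text and variable names are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up sample input standing in for a file's contents.
my $text = "line one\nline two\nline three\n";

# Default $/ ("\n"): each read returns a single line.
open my $by_line, '<', \$text or die "Cannot open in-memory handle: $!";
my @lines = <$by_line>;
close $by_line;

# local $/ inside a block: $/ is undef here, so one read returns everything.
my $whole;
{
    local $/;
    open my $slurp, '<', \$text or die "Cannot open in-memory handle: $!";
    $whole = <$slurp>;
    close $slurp;
}

# Outside the block $/ has its previous value again.
printf "%d lines read one at a time, %d characters slurped in one read\n",
    scalar @lines, length $whole;
```

Because the change is wrapped in a bare block, any code that reads line-by-line afterwards is unaffected, which is the reason `local $/` is usually preferred over a plain `undef $/`.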
Re: Search and replace in html by LameNerd (Hermit) on May 08, 2003 at 23:31 UTC
Do you want something like this?

```perl
#!/usr/bin/perl -w
use strict;

while(<DATA>) {
    # drop any line that contains an opening or closing HTML tag
    next if /<HTML.*?>/gi;
    next if /<\/HTML.*?>/gi;
    print;
}
__DATA__
<HTML>
<HEAD><TITLE>Homepage</TITLE></HEAD>
<BODY>
<a href='blah.html'> man blah.pl</a><BR>
<a href='blah.html'> man blablablah.sh </a><BR>
<a href='blah.html'> man blablablablah.sh </a><BR>
</BODY>
</HTML>
```

update ... or maybe ...

```perl
#!/usr/bin/perl -w
use strict;

while(<DATA>) {
    # remove only the tags themselves and keep the rest of the line
    s/<HTML.*?>//gi;
    s/<\/HTML.*?>//gi;
    print;
}
__DATA__
<HTML><HEAD><TITLE>Homepage</TITLE></HEAD>
<BODY>
<a href='blah.html'> man blah.pl</a><BR>
<a href='blah.html'> man blablablah.sh </a><BR>
<a href='blah.html'> man blablablablah.sh </a><BR>
</BODY>
</HTML>
```
Re: Re: Search and replace in html by Limbic~Region (Chancellor) on May 09, 2003 at 00:22 UTC
LameNerd, Try your code with: `__DATA__ asdfasdf asdfasdf asdfasdf<htMl >asdfasdf blah </htmlasdf> foo bar </html >asdfasdf asdfasdf` I am not saying that Anonymous Monk should even be attempting to do this as a roll-your-own solution (go CPAN) - just thought I would point out a weakness or two. Cheers - L~R
Re: Re: Re: Search and replace in html by LameNerd (Hermit) on May 09, 2003 at 03:48 UTC
The output is ... `asdfasdf asdfasdf asdfasdfasdfasdf blah foo bar asdfasdf asdfasdf` What's wrong with that? It got rid of the html tags. I think that is all Anonymous Monk wanted to accomplish. That is also why I asked in my original post, "Do you want something like this?"
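To make the point of disagreement concrete, the sketch below runs the two substitutions from the second script over L~R's sample data. Two assumptions: the original line breaks in the data are unknown, so it is treated as one string, and the quantifier is taken to be `.*?`, which is what the posted output implies.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# L~R's sample data as a single string (the line breaks are an assumption).
my $data = "asdfasdf asdfasdf asdfasdf<htMl >asdfasdf blah "
         . "</htmlasdf> foo bar </html >asdfasdf asdfasdf";

# The two substitutions from the second script, applied to a copy.
(my $out = $data) =~ s/<HTML.*?>//gi;
$out =~ s/<\/HTML.*?>//gi;

# Anything that merely starts with "<html" or "</html" is removed,
# so </htmlasdf> disappears along with the real tags.
print "$out\n";
```

Whether stripping `</htmlasdf>` counts as a bug depends on what the input is supposed to look like, which is the argument for using a real parser when the data is not under your control.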
Re: Re: Re: Re: Search and replace in html by Limbic~Region (Chancellor) on May 09, 2003 at 12:49 UTC
Re: Re: Re: Re: Re: Search and replace in html by LameNerd (Hermit) on May 09, 2003 at 22:50 UTC

