in reply to Search and replace everything except html tags
    foreach (@open) { $_ =~ s/\n//ig;

You don't need the '$_ =~' here. Substitution operates on $_ by default. Also, this is an occasion where tr/// would be a better choice of operator.

    if ( "$_" eq "<(.*)>" ) {

You need a regex here instead of stringwise equality, but even then it won't do what you expect. Also, don't get into the bad habit of quoting scalars. If you want $_, just say $_, not "$_".
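To make those two points concrete, here is a small sketch. The helper names (strip_newlines, greedy_tag, first_tag) and the sample data are made up for illustration. I am also assuming that "won't do what you expect" refers to the greedy .* running to the last '>' on the line, which is only one plausible reading.

```perl
use strict;
use warnings;

# Hypothetical helper: delete newlines from a copy of each line.
sub strip_newlines {
    my @lines = @_;
    tr/\n//d for @lines;    # tr/// works on $_ by default, just like s///
    return @lines;
}

# Greedy capture: .* runs to the LAST '>' in the string.
sub greedy_tag {
    my ($s) = @_;
    return $s =~ /<(.*)>/ ? $1 : undef;
}

# Negated character class: the capture stops at the FIRST '>'.
sub first_tag {
    my ($s) = @_;
    return $s =~ /<([^>]*)>/ ? $1 : undef;
}

my @open = ( "<b>bold</b>\n", "plain text\n" );    # sample data (assumed)
for ( strip_newlines(@open) ) {
    my $greedy = greedy_tag($_);
    my $first  = first_tag($_);
    print "line:   $_\n";
    print "greedy: ", ( defined $greedy ? $greedy : "(no match)" ), "\n";
    print "first:  ", ( defined $first  ? $first  : "(no match)" ), "\n";
}
```

Note that the regex is applied with a plain match against the string, not with eq, and that $_ is never wrapped in quotes.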
Here's one way to do this with regular expressions. Regexes, however, invariably fail on "real world" HTML. The preferred method is to use HTML::Parser or its derivatives.
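    #!/usr/bin/perl -w
    use strict;

    open HTML, "temp.html" or die "Can't open file: $!\n";
    {
        local $/;        # slurp mode: read the whole file at once
        $_ = <HTML>;
    }
    close HTML;

    tr/\n / /s;          # turn newlines into spaces and squeeze repeated spaces

    # Each pass matches a run of text followed by an optional tag.
    while ( /([^<>]*)(<[^>]*>)?/g ) {
        print "TEXT: $1\n" if length $1;     # skip empty text runs
        print "HTML: $2\n" if defined $2;
    }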
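For comparison, here is a rough sketch of the same text/tag split done with HTML::Parser, using its version 3 handler interface. The split_html helper and the sample markup are my own illustrations, not part of the original question, and the example parses a string instead of temp.html so that it stands alone. Comments and declarations are ignored here because no handler is registered for them.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical helper: return a list of [kind, text] pairs for a string of HTML.
sub split_html {
    my ($html) = @_;
    my @chunks;

    # Start and end tags are both recorded as "HTML" with their source text.
    my $tag_handler = sub { push @chunks, [ HTML => $_[0] ] };

    my $p = HTML::Parser->new(
        api_version   => 3,
        unbroken_text => 1,    # deliver each text run as a single event
        start_h       => [ $tag_handler, "text" ],
        end_h         => [ $tag_handler, "text" ],
        text_h        => [ sub { push @chunks, [ TEXT => $_[0] ] }, "text" ],
    );
    $p->parse($html);
    $p->eof;                   # flush anything the parser is still holding
    return @chunks;
}

for my $chunk ( split_html('<p>Hello <b>world</b></p>') ) {
    my ( $kind, $text ) = @$chunk;
    print "$kind: $text\n";
}
```

To read from a file instead, $p->parse_file("temp.html") can replace the parse and eof calls. The search and replace would then go inside the text handler, which leaves the tags untouched.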