UPDATE: I have to agree with the rest of them. For safety reasons (so you don't demolish the test file), you may want to open $file but save to $file2, just in case the unexpected happens.
I agree, and will update my node to do so.
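Something along these lines would do it. This is only a rough sketch: convert_to_copy is a hypothetical helper name, the line-by-line loop is my assumption about how the file gets processed, and $codes_regex is expected to be an alternation string like the one built further down.

```perl
use strict;
use warnings;

# Hypothetical helper: read $file, apply the substitution, write to $file2.
# $file is only ever opened for reading, so the original is never modified.
sub convert_to_copy {
    my ($file, $file2, $codes_regex) = @_;

    open my $in,  '<', $file  or die "Can't read $file: $!";
    open my $out, '>', $file2 or die "Can't write $file2: $!";

    while (my $line = <$in>) {
        # Lowercase every literal code found in the line.
        $line =~ s/($codes_regex)/lc $1/gie;
        print {$out} $line;
    }

    close $in;
    close $out or die "Can't close $file2: $!";
    return;
}
```

Once the output in $file2 looks right, it can be renamed over $file by hand.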
How exactly isn't this going to treat HTML correctly? ... It's not interpreting the file as HTML at all.
That's all I meant: it won't look for HTML tags; it will look for literal text, including whatever it finds in comments, scripts, etc.
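To make that concrete, here is a small sketch. The input text and the two codes are made up for illustration; the point is only that the substitution has no idea what is markup and what is not.

```perl
use strict;
use warnings;

# Illustrative codes, longest first.
my @codes = ("<a href", "<a ");
my $codes_regex = join "|", map { quotemeta $_ } @codes;

# Made-up input: a look-alike inside an HTML comment, then a real tag.
my $text = qq{<!-- old: <A HREF="x.html"> -->\n<A HREF="y.html">y</A>\n};

# Copy $text into $out, then run the substitution on the copy.
(my $out = $text) =~ s/($codes_regex)/lc $1/gie;
print $out;
```

Both occurrences of "<A HREF" get lowercased, the one inside the comment as well as the real tag, because the match is on literal text only.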
My question to you was, what exactly is line 5 doing with the joining, mapping and sorting? You're playing with length, which I thought only stored the length in characters of the item you're using it with.
Sorting greatest length first ensures that the match works when one code is a prefix of another, e.g. "<A " and "<A HREF". Without the sort, you get results like:
$ perl
use warnings;
use strict;
my @codes = ("<a ", "<a href");
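# Escape each code with quotemeta, then join them into one alternation.
my $codes_regex = join "|", map quotemeta $_,
# Uncomment the sort to try the longer codes first:
# sort { length $b <=> length $a }
@codes;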
my $text = "testing a link: <A HREF=\"fooble.html\">boofle</a>";
print "in: $text\n";
$text =~ s/($codes_regex)/lc $1/gie;
print "out: $text\n";
__END__
output with the sort uncommented:
in: testing a link: <A HREF="fooble.html">boofle</a>
out: testing a link: <a href="fooble.html">boofle</a>
and without:
in: testing a link: <A HREF="fooble.html">boofle</a>
out: testing a link: <a HREF="fooble.html">boofle</a>
This is because Perl regexes try the |'d alternatives left to right and take the first one that matches at a given position, even if it makes a shorter match.
The map is just to apply the quotemeta; the join is to put | between tags.
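If it helps, here is the same pipeline unrolled into separate steps. This is just a sketch: the intermediate names @sorted and @quoted are mine, and "<img" is an extra made-up code added so that no two codes have the same length.

```perl
use strict;
use warnings;

my @codes = ("<a ", "<a href", "<img");   # "<img" is a made-up extra entry

# 1. sort: longest string first, so "<a href" is tried before "<a ".
my @sorted = sort { length($b) <=> length($a) } @codes;

# 2. map: quotemeta backslash-escapes every non-word character,
#    so each code is matched as literal text.
my @quoted = map { quotemeta $_ } @sorted;

# 3. join: glue the escaped codes together with "|" (regex alternation).
my $codes_regex = join "|", @quoted;

print "$codes_regex\n";   # prints: \<a\ href|\<img|\<a\ 
```

The one-liner in the example does the same thing; it reads right to left (sort, then map, then join).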