Hi Tricky. How about we start over. :)
Your program starts out decent enough, although the comments
are hard to read. Let's replace them with POD:
#!/usr/bin/perl
use strict;
use warnings;
=head1 DESCRIPTION
tricky_html_filter.pl - filters the stuff i want
- changes background-color to #fff
- <insert other filter description here>
=head2 USAGE
perl tricky_html_filter.pl foo.html
=cut
Now, let's get the file we are going to open from the
user instead of just hard coding it. Since we are still
testing we will use a default file so that we don't have
to specify it every time we run the script:
my $file = shift || 'E:/path/to/dummy1.html';
open INFILE, '<', $file or die "$!: can't open $file\n";
This is much better for a number of reasons:
- we can specify the file on the command line or use a
default
- we are using the 3 argument form of open (for now,
think of this as just a good habit to get into)
- we are reporting why we couldn't read the file (via
the variable $!) if there is an error
Now that we have the file opened and ready to read, we can
do so - but hold it right there. In your code, you slurp
the entire file into an array and then loop through the
array. I see that you put that for loop into a subroutine,
that's a worthy try, but you are also using global
variables, which is not good. Not only am i not going to
use any of my own subroutines, i am not going to store the
file in an array and loop across it. I am going to simply
use a while loop:
while (<INFILE>) {
s/(background-color:#)(?:[0-9a-f]{6}|[0-9a-f]{3})/$1ffffff/i;
print;
}
And that's it! No. Really. That's it. But i suppose i
should continue with the explanation. ;)
while (<INFILE>) {
print;
}
This is the same thing as saying:
while ($_ = <INFILE>) {
print $_;
}
which is a more concise way of saying:
while (my $line = <INFILE>) {
print $line;
}
These snippets all grab lines from INFILE one at a time,
store each line in a variable (either Perl's built-in $_
or our $line), and print that variable to standard out.
All we need to do is modify that variable if we want to
make a filter.
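To make the "modify that variable" step concrete, here is a minimal sketch. It assumes an in-memory filehandle in place of INFILE so it runs without any file on disk, and the upper-casing filter and the names $html, $fh and $out are made up purely for illustration:

```perl
use strict;
use warnings;

# Stand-in for the real HTML file (hypothetical sample data).
my $html = "<p>one</p>\n<p>two</p>\n";

# Opening a reference to a scalar gives an in-memory filehandle.
open my $fh, '<', \$html or die "can't open in-memory file: $!";

my $out = '';
while (my $line = <$fh>) {
    $line =~ tr/a-z/A-Z/;   # the filter: change the line before it is printed
    print $line;
    $out .= $line;          # kept only so the result can be inspected afterwards
}
close $fh;
```

Swap the tr/// line for any other change to $line and you have a different filter; the loop itself stays the same.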
Which finally brings us to the regex. There are many ways
to match what you want, mine is just one, and it looks very
familiar:
s/(background-color:#)(?:[0-9a-f]{6}|[0-9a-f]{3})/$1ffffff/i;
The first thing i do is try to match the literal string
background-color:#. Don't be fooled, you do not
have to escape the # or : characters, but
do notice that i put that string in parentheses:
(background-color:#). This causes the string
matched to be copied to the built-in variable $1.
Next is 6 or 3 'hexadecimal looking' characters:
(?:[0-9a-f]{6}|[0-9a-f]{3})
It looks mostly the same as yours, but what's with the
?: thingy? This allows you to use parens without
capturing the match. We need the parens for the 'or'
token, but we don't need to catch the match into $2
because we are going to discard this color. This is the
end of the match.
The substitute is simple:
$1ffffff
This takes what we captured in $1 and appends
the literal string ffffff to it. But note that
while i did specify the i modifier, i did not
specify the g modifier, because there is only one
<body> tag in an HTML page, and that
<body> tag can only have one background-color CSS
attribute.
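If you want to watch the substitution work, here is a small sketch that runs it over a few made-up CSS strings (the @samples data and the @results array are hypothetical, only there for the demonstration). The replacement is written as ${1}ffffff, which means the same as $1ffffff but makes it easier to see where the variable name ends:

```perl
use strict;
use warnings;

# Hypothetical sample input: a 6-digit color, a 3-digit upper-case
# color, and a line the pattern should leave alone.
my @samples = (
    'body { background-color:#336699; }',
    'body { background-color:#ABC; }',
    'body { color:#336699; }',
);

my @results;
for my $css (@samples) {
    # Work on a copy so the original sample stays untouched.
    (my $copy = $css) =~ s/(background-color:#)(?:[0-9a-f]{6}|[0-9a-f]{3})/${1}ffffff/i;
    push @results, $copy;
    print "$copy\n";
}
```

The first two lines come out with #ffffff and the third is printed unchanged. The 6-digit alternative is listed first on purpose: if the 3-digit one came first, it would match only half of a 6-digit color and leave the other three digits behind.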
Here is the complete script again, for ... completeness:
#!/usr/bin/perl
use strict;
use warnings;
=head1 DESCRIPTION
tricky_html_filter.pl - filters the stuff i want
- changes background-color to #fff
- <insert other filter description here>
=cut
my $file = shift || 'foo.html';
open INFILE, '<', $file or die "$!: can't open $file\n";
while (<INFILE>) {
s/(background-color:#)(?:[0-9a-f]{6}|[0-9a-f]{3})/$1ffffff/i;
print;
}
It works as specified, but is it useful? Didn't you want
to change the file you are reading from? If i recall
correctly, yes you did. And you are probably saying right
about now "that's why i stored the file contents into an
array - because i have to close the file and re-open it
for writing." Well ... no you don't. :)
perl -pi -e"s/(background-color:#)(?:[0-9a-f]{6}|[0-9a-f]{3})/$1ffffff/i" /path/to/dummy1.html
# be sure to replace " with ' if you run this on *NIX,
# otherwise the shell expands $1 before perl ever sees it
Note that this is pretty much what tachyon said (and also
what davorg said, by the way). There is a lot going on
behind the scenes in that small amount of code. It does in
one line what you tried to do in about 40. You can read
more about the -i and -p switches at perlrun.
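For the curious, here is a rough sketch of what those two switches set up for you, written out as an ordinary script. It is an approximation rather than the exact code perl generates: -p wraps your code in a while (<>) loop with a print, and -i corresponds to setting the special variable $^I. The throwaway temp file, the '.bak' backup extension, and the names $tmp_name and $edited are assumptions made so the sketch can run on its own:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a throwaway file to edit (hypothetical sample content).
my ($tmp_fh, $tmp_name) = tempfile();
print {$tmp_fh} qq{<body style="background-color:#336699">\n};
close $tmp_fh or die "can't close $tmp_name: $!";

{
    local $^I   = '.bak';       # like -i.bak: edit in place, keep a backup copy
    local @ARGV = ($tmp_name);  # the files that <> will read
    while (<>) {                # roughly the loop that -p wraps around your code
        s/(background-color:#)(?:[0-9a-f]{6}|[0-9a-f]{3})/${1}ffffff/i;
        print;                  # lands in the file, not on the screen
    }
}

# Read the file back to see the result, then clean up.
open my $check, '<', $tmp_name or die "can't open $tmp_name: $!";
my $edited = do { local $/; <$check> };
close $check;
unlink $tmp_name, "$tmp_name.bak";
print $edited;
```

The backup extension is used here only to be on the safe side; the one-liner above uses -i with no extension, which skips the backup.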
Hope this helps, :)
jeffa
L-LL-L--L-LL-L--L-LL-L--
-R--R-RR-R--R-RR-R--R-RR
B--B--B--B--B--B--B--B--
H---H---H---H---H---H---
(the triplet paradiddle with high-hat)