Tricky has asked for the wisdom of the Perl Monks concerning the following question:

Esteemed Monks,

I'm having a conundrum with a regexp for an HTML background-color attribute. The IDE I'm using highlights the '#' and subsequent characters as comments. I've tried escaping the hash symbol, to no avail.

Cheers in advance, folks.
Here's the HTML tag I want to change: <body style="background-color: #FAF519">
...and the script...
#!/usr/bin/perl
# change background color.plx
# Program will read in an html file, change background color and print to source file.
# 1. No need for file variable yet: open (INFILE, "<".$htmlFile) or die("Can't read source file!\n");
# /background-color:\s*#([0-9a-f]{6});?/ig
# /background-color:\s*#([0-9a-f]{6}|[0-9a-f]{3});?/ig

use warnings;
use diagnostics;
use strict;

# Declare and initialise variables.
my @htmlLines;

# Open HTML test file - forward slashes do not need to be escaped.
open INFILE, "E:/Documents and Settings/Richard Lamb/My Documents/HTML/dummy1.html"
  or die "Sod! Can't open this file.\n";

# Assign to an array/list variable.
@htmlLines = <INFILE>;
close (INFILE);

sub changeBackColour
{
  foreach my $line (@htmlLines)
  {
    # case insensitivity and global search for pattern
    $line =~ s/background-color:\s*\#([0-9a-f]{6}|[0-9a-f]{3});?/background-color:\s*\#FFFFFF;?/ig;
  }
}
changeBackColour();

sub printHTML
{
  for my $i (0..@htmlLines-1)
  {
    print $htmlLines[$i];
  }
}
printHTML(); # prints the reformatted HTML file in DOS window

Replies are listed 'Best First'.
Re: Escaping '#' in a regexp
by CombatSquirrel (Hermit) on Aug 30, 2003 at 10:38 UTC
    Except for RegExes with an /x modifier, you don't need to escape octothorpes (#). Your substitution has one problem, though: remember that a pattern can only be replaced by a string, so the asterisk, question mark, etc. in the replacement are taken literally and end up in the output as they are. Try this instead:
    $line =~ s/background-color:\s*#(?:[0-9a-f]{6}|[0-9a-f]{3});?/background-color: #FFFFFF;/ig;
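    To see the difference, here is a small sketch that runs the replacement as first posted and the corrected one over the tag from the question. The sub names are made up for this illustration; the two substitutions are the ones quoted in this thread.

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration; neither name appears in the original script.

# Replacement as first posted: the right-hand side is an ordinary string,
# so "*", ";" and "?" are inserted as-is and "\s" decays to a plain "s".
sub replace_as_posted {
    my ($line) = @_;
    no warnings;    # silence "Unrecognized escape \s passed through"
    $line =~ s/background-color:\s*\#([0-9a-f]{6}|[0-9a-f]{3});?/background-color:\s*\#FFFFFF;?/ig;
    return $line;
}

# Corrected replacement: write out exactly the text that should appear.
sub replace_fixed {
    my ($line) = @_;
    $line =~ s/background-color:\s*#(?:[0-9a-f]{6}|[0-9a-f]{3});?/background-color: #FFFFFF;/ig;
    return $line;
}

my $tag = '<body style="background-color: #FAF519">';
print replace_as_posted($tag), "\n";
print replace_fixed($tag), "\n";
```

    The first line printed carries the stray metacharacters into the HTML; the second contains a clean background-color: #FFFFFF; declaration.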
    Hope this helped.
    CombatSquirrel.
    Entropy is the tendency of everything going to hell.
Re: Escaping '#' in a regexp
by jeffa (Bishop) on Aug 30, 2003 at 18:23 UTC
    Hi Tricky. How about we start over. :)

    Your program starts out decent enough, although the comments are hard to read. Let's replace them with POD:

    #!/usr/bin/perl
    use strict;
    use warnings;

    =head1 DESCRIPTION

    tricky_html_filter.pl - filters the stuff i want

      - changes background-color to #fff
      - <insert other filter description here>

    =head2 USAGE

      perl tricky_html_filter.pl foo.html

    =cut

    Now, let's get the file we are going to open from the user instead of just hard coding it. Since we are still testing we will use a default file so that we don't have to specify it every time we run the script:
    my $file = shift || 'E:/path/to/dummy1.html';
    open INFILE, '<', $file or die "$!: can't open $file\n";
    This is much better for a number of reasons:
    1. we can specify the file on the command line or use a default
    2. we are using the 3 argument form of open (for now, think of this as just a good habit to get into)
    3. we are reporting why we couldn't read the file (via the variable $!) if there is an error
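    The three points above can be sketched as one small sub. This is only an illustration: the name open_for_reading is hypothetical, and it uses a lexical filehandle ($fh) instead of the bareword INFILE, which is a swapped-in habit and not something taken from the post.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): opens $file for
# reading with the three-argument form and reports the OS error via $!.
sub open_for_reading {
    my ($file) = @_;
    open my $fh, '<', $file
        or die "$!: can't open $file\n";
    return $fh;
}
```

    It would be called as my $in = open_for_reading(shift || 'foo.html'); so that the command-line argument wins and the default is only a fallback.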
    Now that we have the file opened and ready to read, we can do so - but hold it right there. In your code, you slurp the entire file into an array and then loop through the array. I see that you put that for loop into a subroutine, that's a worthy try, but you are also using global variables, which is not good. Not only am i not going to use any of my own subroutines, i am not going to store the file in an array and loop across it. I am going to simply use a while loop:
    while (<INFILE>) {
        s/(background-color:#)(?:[0-9a-f]{6}|[0-9a-f]{3})/$1ffffff/i;
        print;
    }
    And that's it! No. Really. That's it. But i suppose i should continue with the explanation. ;)
    while (<INFILE>) { print; }
    This is the same thing as saying:
    while ($_ = <INFILE>) { print $_; }
    which is a more concise way of saying:
    while (my $line = <INFILE>) { print $line; }
    These snippets all grab, one line at a time, lines from INFILE - store them, one line at a time, into a variable (either Perl's built-in $_ or our $line) - and print that variable, one line at a time to standard out. All we need to do is modify that variable if we want to make a filter.
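    A minimal sketch of that read, modify, print loop as a reusable filter. The name filter_lines and the in-memory "files" are assumptions made so the example runs without touching the disk; with a real file you would pass the opened handle and STDOUT instead.

```perl
use strict;
use warnings;

# Hypothetical helper: copy lines from $in to $out, passing each one
# through $edit, a code ref that takes a line and returns the new line.
sub filter_lines {
    my ($in, $out, $edit) = @_;
    while (my $line = <$in>) {
        print {$out} $edit->($line);
    }
    return;
}

# Demo on in-memory strings instead of files on disk.
my $source = "first line\nsecond line\n";
my $result = '';
open my $in,  '<', \$source or die "$!: can't read from string\n";
open my $out, '>', \$result or die "$!: can't write to string\n";

filter_lines($in, $out, sub { my ($l) = @_; $l =~ s/line/LINE/; return $l });

close $out;
print $result;
```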

    Which finally brings us to the regex. There are many ways to match what you want, mine is just one, and it looks very familiar:

    s/(background-color:#)(?:[0-9a-f]{6}|[0-9a-f]{3})/$1ffffff/i;
    The first thing i do is try to match the literal string background-color:#. Don't be fooled, you do not have to escape the # or : characters, but do notice that i put that string in parentheses: (background-color:#). This causes the string matched to be copied to the built-in variable $1.

    Next is 6 or 3 'hexadecimal looking' characters: (?:[0-9a-f]{6}|[0-9a-f]{3}) It looks mostly the same as yours, but what's with the ?: thingy? This allows you to use parens without capturing the match. We need the parens for the 'or' token, but we don't need to catch the match into $2 because we are going to discard this color. This is the end of the match.

    The substitute is simple:

    $1ffffff
    This takes what we captured in $1 and appends the literal string ffffff to it. But note that while i did specify the i modifier, i did not specify the g modifier, because there is only one <body> tag in an HTML page, and that <body> tag can only have one background-color CSS attribute.
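    One thing to watch: the tag in the question has a space after the colon (background-color: #FAF519), which the pattern as written does not allow for. The sketch below compares the substitution as posted with a variant that adds \s* inside the capture. Both sub names are hypothetical, and the \s* is an assumed tweak, not part of the original reply.

```perl
use strict;
use warnings;

# Hypothetical helper: the substitution exactly as posted.
sub whiten_as_posted {
    my ($line) = @_;
    $line =~ s/(background-color:#)(?:[0-9a-f]{6}|[0-9a-f]{3})/$1ffffff/i;
    return $line;
}

# Hypothetical helper: same idea, but \s* lets optional whitespace after
# the colon be captured into $1 and carried over unchanged.
sub whiten_tolerant {
    my ($line) = @_;
    $line =~ s/(background-color:\s*#)(?:[0-9a-f]{6}|[0-9a-f]{3})/$1ffffff/i;
    return $line;
}

my $tight  = '<body style="background-color:#abc">';
my $spaced = '<body style="background-color: #FAF519">';

print whiten_as_posted($tight),  "\n";   # colour replaced
print whiten_as_posted($spaced), "\n";   # left untouched: the space blocks the match
print whiten_tolerant($spaced),  "\n";   # colour replaced, space preserved
```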

    Here is the complete script again, for ... completeness:

    #!/usr/bin/perl
    use strict;
    use warnings;

    =head1 DESCRIPTION

    tricky_html_filter.pl - filters the stuff i want

      - changes background-color to #fff
      - <insert other filter description here>

    =cut

    my $file = shift || 'foo.html';
    open INFILE, '<', $file or die "$!: can't open $file\n";

    while (<INFILE>) {
        s/(background-color:#)(?:[0-9a-f]{6}|[0-9a-f]{3})/$1ffffff/i;
        print;
    }

    It works as specified, but is it useful? Didn't you want to change the file you are reading from? If i recall correctly, yes you did. And you are probably saying right about now "that's why i stored the file contents into an array - because i have to close the file and re-open it for writing." Well ... no you don't. :)
    perl -pi -e"s/(background-color:#)(?:[0-9a-f]{6}|[0-9a-f]{3})/$1ffffff/i" /path/to/dummy1.html
    # be sure and replace " with ' if you run this on *NIX
    Note that this is pretty much what tachyon said (and also what davorg said, by the way). There is a lot going on behind the scenes in that small amount of code. It does in one line what you have tried to do in about 40. You can read more about the -i and -p switches at perlrun.
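    For the curious, a rough sketch of the same in-place edit written as a sub inside a script. It assumes the special variable $^I (the in-script counterpart of -i) together with a <> loop. The sub name and the .bak suffix are choices made for this illustration, and the \s* after the colon is an addition to the one-liner above.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrites $path in place, keeping a copy at "$path.bak".
sub whiten_background_in_place {
    my ($path) = @_;
    # $^I switches on in-place editing for the <> loop, as -i.bak would;
    # @ARGV tells <> which file to read.
    local ($^I, @ARGV) = ('.bak', $path);
    local $_;
    while (<>) {
        s/(background-color:\s*#)(?:[0-9a-f]{6}|[0-9a-f]{3})/$1ffffff/i;
        print;    # goes to the rewritten file, not the terminal
    }
    return;
}
```

    Calling whiten_background_in_place('/path/to/dummy1.html') would leave the edited page at that path and the untouched original next to it with a .bak suffix.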

    Hope this helps, :)

    jeffa

    L-LL-L--L-LL-L--L-LL-L--
    -R--R-RR-R--R-RR-R--R-RR
    B--B--B--B--B--B--B--B--
    H---H---H---H---H---H---
    (the triplet paradiddle with high-hat)
    
Re: Escaping '#' in a regexp
by tachyon (Chancellor) on Aug 30, 2003 at 12:37 UTC

    Lends itself to a one liner inplace edit (writes orig to orig.bak)

    perl -pi.bak -e 's/background-color:\s*#(?:[0-9a-f]{6}|[0-9a-f]{3})\s*;?/background-color: #FFFFFF;/i' <INFILE>

    cheers

    tachyon

    s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

Re: Escaping '#' in a regexp
by hossman (Prior) on Aug 30, 2003 at 20:09 UTC
    I'm having a conundrum with a regexp for an HTML background-color attribute. The IDE I'm using highlights the '#' and subsequent characters as comments. I've tried escaping the hash symbol, to no avail.

    While I'm sure you appreciate all of the helpful coding/commenting tips provided by our fellow monks, I noticed most of them don't really seem to have paid very much attention to your specific question.

    In general, you don't need to escape a "#" character you use in a regexp, unless you really want to -- I tend to use "\#" so that my emacs syntax highlighting makes more sense.

    If however your IDE's syntax highlighting isn't that smart, you can always try writing the # symbol as hex: "\x23", so that your IDE never even notices that it's a comment...

    box:~> perl -le 'print "yes" if shift =~ "a\x23b";' aaa#bcd
    yes
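    The same check as a short script, as a sketch comparing the three spellings mentioned in this thread (plain #, escaped \#, and hex \x23). The hash name and its keys are made up for the example.

```perl
use strict;
use warnings;

# Three spellings of the same pattern; each matches a literal "#" between
# "a" and "b". None of them uses the /x modifier.
my %spelling = (
    plain   => qr/a#b/,
    escaped => qr/a\#b/,
    hex     => qr/a\x23b/,
);

for my $name (sort keys %spelling) {
    my $verdict = 'aaa#bcd' =~ $spelling{$name} ? 'yes' : 'no';
    print "$name: $verdict\n";
}
```

    Since all three match the same text, the choice comes down to whichever spelling the editor's highlighter copes with best.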