why_bird has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

I'm parsing a configuration file, so before doing the processing I want to remove comments, which I assume are in the same style as Perl's (i.e. one-line comments starting with '#', which can occur after other text that must be left intact).

Here is my code. It works fine unless it comes across '[]' in a comment in my config file (it works fine if the '[]' is outside a comment), when it throws this:
Unmatched [ in regex; marked by <-- HERE in m/ ...some text... [<-- HERE] ...some more text..

while (chomp(my $temp=<INPUT>)){
    print Dumper $temp;
    if ($temp =~ /#/){
        $temp =~ s/#$'//;
    }
    if ($temp =~ /^\s*$/){
        next;
    }
    print "after regex:\n";
    print Dumper $temp;
    print "end\n";
}

I'm a bit new with regexes so chances are I've missed something obvious---any ideas?

Those are my principles. If you don't like them I have others.
-- Groucho Marx

Replies are listed 'Best First'.
Re: Regex error when [] occurs in file..
by moritz (Cardinal) on Mar 03, 2008 at 15:38 UTC
    if ($temp =~ /#/){ $temp =~ s/#$'//; }

    That's what's causing the problem: $' can contain arbitrary data, but you try to treat it as a regex.

    The "good" solution is to use this regex instead: $temp =~ s/#.*$//;
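To see the failure and the fix side by side, here is a minimal sketch. The sample line is made up, and the eval wrapper is only there so the script survives the regex compilation error and can report it.

```perl
use strict;
use warnings;

# Made-up config line with [] inside the comment
my $line = 'name = value # see [] below';

# Original approach: $' (the text after the last successful match)
# is interpolated into the pattern and compiled as a regex.
my $broken = $line;
my $ok = eval {
    if ($broken =~ /#/) {
        $broken =~ s/#$'//;    # pattern becomes "# see [] below"
    }
    1;
};
print "original approach died: $@" unless $ok;

# Suggested fix: match the comment directly, no interpolation involved
(my $fixed = $line) =~ s/#.*$//;
print "fixed: '$fixed'\n";
```

The second substitution never interpolates anything from the data, so brackets in the comment are just characters matched by the dot.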

    In general you can also quote interpolated variables, then they are treated as text, not as regexes:

    my $variable = '[a-z]';
    m/\Q$variable\E/   # matches literal [a-z], not a character class.

    If you're not inside a regex, quotemeta does the same job.
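A small sketch of the difference, assuming a made-up test string: the same variable is interpolated once as a regex and once quoted with \Q...\E, and quotemeta shows the escaped form that gets built.

```perl
use strict;
use warnings;

my $variable = '[a-z]';
my $text     = 'class [a-z] here';    # made-up sample text

# Unquoted: [a-z] is a character class, so it matches one lowercase letter.
my ($as_regex) = $text =~ /($variable)/;

# Quoted: the brackets and the dash are matched literally.
my ($as_text) = $text =~ /(\Q$variable\E)/;

# quotemeta produces the same escaped string outside a regex.
my $escaped = quotemeta $variable;

print "as regex: $as_regex\n";
print "as text:  $as_text\n";
print "escaped:  $escaped\n";
```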

      If you happen to have the '#' character inside a (single- or double-) quoted string in your config file (I don't know whether the spec for your config file even allows this), then the s/#.*$// regex will cause you trouble: it deletes everything from the '#' onward, including the rest of the string. That is probably not what you want.

      It is not easy to take care of this: not even Regexp::Common gets it right.
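For the simple cases, something along these lines might do. It is only a sketch under strong assumptions: quoted strings never contain escaped quotes, never span lines, and every quote is balanced. The helper name strip_comment is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: removes a trailing # comment unless the # sits
# inside a single- or double-quoted string.
sub strip_comment {
    my ($line) = @_;
    # From the start of the line, consume whole quoted strings or
    # characters that are neither '#' nor a quote; if a '#' follows,
    # drop it and everything after it.
    $line =~ s/^((?:"[^"]*"|'[^']*'|[^#"'])*)#.*$/$1/;
    return $line;
}

print strip_comment(q{path = "a#b" # real comment}), "\n";
print strip_comment(q{key = 1 # note}), "\n";
```

A line with an unbalanced quote before the '#' (an apostrophe in ordinary text, say) does not match at all and is returned unchanged, which is one of the reasons this is harder than it looks.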


      "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

Re: Regex error when [] occurs in file..
by Joost (Canon) on Mar 03, 2008 at 15:42 UTC
    $temp =~ s/#$'//;

    I'm not even sure what you want that to do, but it constructs a regex out of # followed by whatever followed the previous match. Any regex special characters in that generated string will be interpreted as regex directives.

    You probably want something like:

    while (chomp(my $temp=<INPUT>)){
        print Dumper $temp;
        $temp =~ s/#.*//; # remove comments
        if ($temp =~ /^\s*$/){
            next;
        }
        print "after regex:\n";
        print Dumper $temp;
        print "end\n";
    }

      Thanks both---I didn't realise that $' was treated like that. Are all 'special variables' ($&, $N (N is integer)) expanded in that way too? So if you did something like:
      $temp =~ m/(\[0-9\])blah$1/;
      would you match
      $temp = "[0-9]blah6";
      rather than
      $temp = "[0-9]blah[0-9]";
      V. interesting---I assumed that special characters inside the rest of the data would be ignored..
      <--edit to make the last sentence make more sense!-->
      Those are my principles. If you don't like them I have others.
      -- Groucho Marx
        $temp =~ m/(\[0-9\])blah$1/;

        I think you meant

        $temp =~ m/(\[0-9\])blah\1/;

        in which case any special characters in the content of the backreference \1 would not be treated special. IOW, "[0-9]blah[0-9]" would match, but not "[0-9]blah6":

        #!/usr/bin/perl
        use strict;
        use warnings;

        for my $temp ("[0-9]blah[0-9]", "[0-9]blah6") {
            printf "%-15s ", $temp;
            if ($temp =~ /(\[0-9\])blah\1/) {
                print "matched\n";
            } else {
                print "didn't match\n";
            }
        }


        [0-9]blah[0-9]  matched
        [0-9]blah6      didn't match

        while, if you replace \1 with $1 in the above regex, it prints

        Use of uninitialized value in concatenation (.) or string at ./671663.pl line 8.
        [0-9]blah[0-9]  matched
        [0-9]blah6      matched

        This is because $1 is interpolated when the regex is built, before the match is attempted, and it isn't defined here; thus the regex effectively becomes /(\[0-9\])blah/...

        Update: added demo code.

Re: Regex error when [] occurs in file..
by ysth (Canon) on Mar 03, 2008 at 21:27 UTC
    while (chomp(my $temp=<INPUT>)){
    Don't do that. Do your chomp inside the loop. Otherwise, you'll get a warning (you do have warnings enabled, don't you?) when you reach the end and <INPUT> returns undef, since chomp expects a string, not undef.
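A sketch of the loop with the chomp moved inside. The in-memory filehandle and the sample config text are made up so the snippet runs on its own; with a real file you would keep reading from your own handle. Note that chomp returns the number of characters it removed, so as far as I can tell the original condition would also end the loop early on a final line that has no trailing newline.

```perl
use strict;
use warnings;

# Made-up sample data; the last line deliberately has no trailing newline.
my $config = "alpha = 1 # first\n\n# only a comment\nbeta = [2]";
open my $in, '<', \$config or die "cannot open in-memory file: $!";

my @kept;
while (my $temp = <$in>) {       # Perl adds an implicit defined() test here
    chomp $temp;                 # safe: $temp is always a string at this point
    $temp =~ s/#.*$//;           # remove comments
    next if $temp =~ /^\s*$/;    # skip lines that are now blank
    push @kept, $temp;
}
close $in;

print "$_\n" for @kept;
```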