in reply to Regex to ignore comment

So, if I understand correctly:

Assuming you don't have to pay attention to quoting (i.e. pattern declarations that contain quoted #'s), I think the easiest way to accomplish that would be to simply remove any possible comments before parsing a line, i.e.:

while (my $line = <DATA>) {
    chomp $line;
    $line =~ s/\s*#.*$//;   # remove a trailing comment, if any
    # ...
}

Here are a few more tips while I'm at it, too.

Re^2: Regex to ignore comment
by crusty_collins (Friar) on Oct 20, 2015 at 19:32 UTC
    Thanks for the comments AppleFritter

    I don't want to do the regex $line =~ s/\s*#.*$//; because a # might be in the pattern, such as

    pattern = the number is #8 # number

    where #8 is in the pattern

    and # number is a comment

    I was hoping that I could do a lookbehind and catch it that way.

    But Corion's way of doing this is really the same thing.

    $line =~ s!#.*$!!; # strip off comments
    my( $key, $value ) = split /=/, $line;

      I see. But doesn't that make the format itself ambiguous? Put another way, how can you tell the following two apart, programmatically?

      pattern = the number is #8       # number
      
      pattern = 255.257.0.0            # invalid, and BTW, this comment contains a # character
      

      To a human (or pony) reading this, it's obvious that the comment starts on the second # in the first line, and on the first # in the second line. But how would a program tell the difference?

      This is what I meant by quoting, BTW. If your format required you to write e.g.

      pattern = "the number is #8"     # number
      

      to avoid this ambiguity, you'd have to deal with quoting, but at least you'd be able to rely on the first unquoted # character on a line to actually indicate a comment.
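
      If the format did require quoting, stripping the comment could look roughly like the sketch below. It assumes values are wrapped in double quotes and that a backslash escapes the next character inside them; neither rule comes from the original format, and strip_comment is a made-up name used only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: drop everything from the first '#' that is NOT
# inside a double-quoted string. Assumes double quotes and backslash
# escapes, which the original config format does not actually specify.
sub strip_comment {
    my ($line) = @_;
    my $kept = '';
    # Walk the line token by token: a quoted string, a bare '#',
    # or a run of ordinary text (a lone unbalanced quote counts as text).
    while ($line =~ /\G(?:("(?:[^"\\]|\\.)*")|(#)|([^"#]+|"))/gc) {
        last if defined $2;               # first unquoted '#': comment starts here
        $kept .= defined $1 ? $1 : $3;    # keep quoted string or plain text
    }
    $kept =~ s/\s+$//;                    # trim whitespace left before the comment
    return $kept;
}

print strip_comment('pattern = "the number is #8"     # number'), "\n";
# prints: pattern = "the number is #8"
```

      An unquoted line such as the 255.257.0.0 example is cut at its first #, since nothing on it is protected by quotes.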

      Well, if # can be both part of your pattern and the start of a comment to be removed, then you need to specify how to distinguish between the two cases.

      With an input string such as:

      pattern = the number is #8 # number
      this would remove everything after (and including) the last # of your string:
      s/#[^#]+$//;
      but this assumes that you always have a trailing comment in your input.

      But we have no way to know whether it will work with your other input lines (i.e. if there is always a trailing comment in your lines).
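
      To make that caveat concrete, here is a small sketch that runs the same substitution on a line with a trailing comment and on one without. The helper name strip_last_hash and the second input line are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the substitution shown above.
sub strip_last_hash {
    my ($line) = @_;
    $line =~ s/#[^#]+$//;   # remove from the last '#' to the end of the line
    return $line;
}

for my $line (
    'pattern = the number is #8 # number',   # has a trailing comment
    'pattern = the number is #8',            # no comment at all
) {
    printf "[%s]\n", strip_last_hash($line);
}
# prints:
# [pattern = the number is #8 ]
# [pattern = the number is ]
```

      The second line loses its #8, which is part of the pattern, so this approach is only safe if every line is guaranteed to end in a comment. Note also that the first result keeps the space that preceded the comment.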

      How do you know the difference between:

      pattern = the number is #8 # number

      and

      pattern = the number is 9 #8 , changed 2015-10-20

      Maybe a comment really starts with "# " (hash and then a blank)?
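
      If that guess is right, a sketch of the rule could look like the following. It assumes a comment starts at the first # that is followed by whitespace (or by the end of the line); that is only a guess at the format, and strip_hash_blank is a made-up name.

```perl
use strict;
use warnings;

# Hypothetical helper: treat '#' as a comment marker only when it is
# followed by whitespace or ends the line. This rule is an assumption
# about the config format, not something the format is known to define.
sub strip_hash_blank {
    my ($line) = @_;
    $line =~ s/\s*#(?:\s.*)?$//;   # also drops whitespace before the marker
    return $line;
}

print strip_hash_blank('pattern = the number is #8 # number'), "\n";
# prints: pattern = the number is #8

print strip_hash_blank('pattern = the number is 9 #8 , changed 2015-10-20'), "\n";
# prints: pattern = the number is 9 #8 , changed 2015-10-20
```

      Under this rule the second line is left untouched, i.e. "#8 , changed 2015-10-20" counts as part of the pattern. Whether that is the intended reading is exactly what the format would have to spell out.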