PerlMonks  

Stripping Comments

by hangon (Deacon)
on Jun 09, 2007 at 08:43 UTC ([id://620166])

hangon has asked for the wisdom of the Perl Monks concerning the following question:

I need to strip comments from a configuration file. The comments start with # and go to the end of the line, unless the initial # is escaped. This is what I have:

    while (my $line = <FH>){
        $line =~ s/[^\\]#.*//;
        $line =~ s/^#.*//;
        $line =~ s/\\#/#/g;
        print "$line";
    }

The second regex handles lines that begin with #. The third just removes the escape character. This works, but is there a better way, or can it be simplified to use fewer regexes? Thanks.

sample data:

    \#Not a comment
    #Comment
    \#Not a comment #But this is a comment
    arbitrary text #Comment
    arbitrary text \#Not a comment
    arbitrary text \#Not a comment #But this is a comment
    arbitrary text #Comment \#This is part of the comment

desired output:

    #Not a comment

    #Not a comment
    arbitrary text
    arbitrary text #Not a comment
    arbitrary text #Not a comment
    arbitrary text

UPDATE: Got what I needed, and a little education too. Thanks for the help everyone.

Replies are listed 'Best First'.
Re: Stripping Comments
by varian (Chaplain) on Jun 09, 2007 at 09:18 UTC
    Try this:
    #!/usr/bin/perl
    use strict;
    use warnings;

    while (my $line=<DATA>) {
        $line=~s/(?<!\\)#.*//;
        print $line;
    }
    __DATA__
    \#Not a comment
    #Comment
    \#Not a comment #But this is a comment
    arbitrary text #Comment
    arbitrary text \#Not a comment
    arbitrary text \#Not a comment #But this is a comment
    arbitrary text #Comment \#This is part of the comment
    Prints:
    \#Not a comment

    \#Not a comment
    arbitrary text
    arbitrary text \#Not a comment
    arbitrary text \#Not a comment
    arbitrary text
    The '(?<!' construct is a negative lookbehind: it looks backward to ensure that the escape character does not immediately precede the comment character.
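The output above still contains the escape characters, so the lookbehind is only half of the job. A minimal sketch of the two steps combined, assuming the question's third regex is kept for unescaping (the name strip_comment is made up for illustration and does not appear in the thread):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# strip_comment is a hypothetical helper name, not something from the thread.
sub strip_comment {
    my ($line) = @_;
    $line =~ s/(?<!\\)#.*//;   # remove the first unescaped '#' and the rest of the line
    $line =~ s/\\#/#/g;        # turn each remaining '\#' into a plain '#'
    return $line;
}

# A few of the sample lines from the question.
my @samples = (
    '\#Not a comment',
    '#Comment',
    'arbitrary text \#Not a comment #But this is a comment',
    'arbitrary text #Comment \#This is part of the comment',
);

print strip_comment($_), "\n" for @samples;
```

Because `.` does not match a newline, a trailing newline on the input line survives. Whitespace in front of a removed comment also survives, so lines such as the last sample end in a trailing space.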
Re: Stripping Comments (one)
by tye (Sage) on Jun 10, 2007 at 02:28 UTC
    no warnings 'uninitialized';
    s<\\(#)|\s*#[^\n]*$><$1>g

    Note that I meet your examples as you entered them and remove the spaces before the "#" comment, unlike your code or most of the other code in this thread. I use [^\n] instead of just . since it is more explicit, especially since in this case it is important to stop at a newline.

    - tye        
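For anyone who wants to try tye's single substitution on a named variable rather than on $_, here is a sketch wrapped in a sub. The wrapper name strip_comment_one_pass is invented for illustration; the regex itself is the one posted above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# strip_comment_one_pass is a hypothetical wrapper name, not from the thread.
sub strip_comment_one_pass {
    my ($line) = @_;
    # When the comment branch matches, $1 is undefined, so that warning
    # is switched off for this block only.
    no warnings 'uninitialized';
    # Branch 1: '\#' becomes '#'.
    # Branch 2: optional whitespace, '#', and the rest of the line are removed.
    $line =~ s<\\(#)|\s*#[^\n]*$><$1>g;
    return $line;
}

print strip_comment_one_pass("arbitrary text \\#Not a comment #But this is a comment\n");
print strip_comment_one_pass("arbitrary text #Comment \\#This is part of the comment\n");
```

The /g flag is what lets one pass handle both jobs: the regex engine walks the line left to right, unescaping each `\#` it meets until it reaches an unescaped `#`, at which point the second branch consumes everything up to the end of the line.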

Re: Stripping Comments
by moritz (Cardinal) on Jun 09, 2007 at 09:10 UTC
      The problem with the first RE is that it also removes the character before the hash mark. In your examples that character is always whitespace, so you didn't notice. You could replace the first regex with
      s/([^\\])#.*/$1/;
      Better yet, replace the first and second RE with the RE that varian provides. You'll need the third RE nonetheless.
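The character-eating behaviour is easy to see on a line where the comment follows a non-space character. The input below is a made-up example, not part of the question's sample data:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $input = 'key=value#comment';   # hypothetical config line, for illustration only

# The question's first regex: [^\\] consumes the 'e' along with the comment.
(my $original = $input) =~ s/[^\\]#.*//;

# The capturing version: the character before '#' is put back via $1.
(my $captured = $input) =~ s/([^\\])#.*/$1/;

print "original: '$original'\n";   # original: 'key=valu'
print "captured: '$captured'\n";   # captured: 'key=value'
```

The lookbehind in varian's reply avoids the problem altogether, since a lookbehind checks the preceding character without making it part of the match.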
