PerlMonks  

Support for hash comments on a line

by c (Hermit)
on Nov 01, 2001 at 19:54 UTC (id://122573)

c has asked for the wisdom of the Perl Monks concerning the following question:

i am working on supporting '#' style comments within a text file being read by my script. now, getting the script to skip over lines beginning with a '#' is easy enough

next if ($line =~ /^\#/);

however, it's when the comment is midstream that i start scratching my head.

hostname # this is a comment for hostname

so i tried a regex substitution

$line =~ s/(.*)\#.*/$1/g;

but that certainly did not work. i also tried something slightly less greedy with s/([\w\d]+)\#.*/$1/g but that was just as fruitless. i wasn't able to find any examples of this in the monastery when searching for 'removing comments' or 'midline comments'. if someone has a more productive search string, or some simple code to do the trick, i'd appreciate the post.

humbly -c

Replies are listed 'Best First'.
Re: Support for hash comments on a line
by Fastolfe (Vicar) on Nov 01, 2001 at 19:56 UTC
    This will break if you use a # in your "real" data, but I generally would use something like this:
    while (<>) {
        chomp;
        s/\s*#.*//;        # Strip off whitespace and trailing comments
        next if /^\s*$/;   # Skip blank lines
        &process_line_of_input($_);
    }
      When i run your code on my file:

      hostname3 # this is a comment for hostname3
      I get:

      Use of uninitialized value in substitution (s///) at ./comment line 11, <FH> line 1.

      my full code is:

      #!/usr/bin/perl -w
      use strict;

      open(FH, "comments.txt") or die "cant open file";

      while (my $line = <FH>) {
          chomp $line;
          next if $line =~ /^$/;
          $line = s/\s*\#.*//;
          print "|$line|\n";
      }
      close(FH);

      humbly -c

        You want to use =~ instead of = when doing regexp substitutions on a variable. You're basically doing this:
        $line = ($_ =~ s/\s*\#.*//);
        Since $_ is undefined here, you get that warning.
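A minimal sketch of the corrected loop, assuming the input lines come from an in-memory list instead of comments.txt. The helper name strip_comment and the sample lines are hypothetical; the only substantive change from the script above is the =~ binding.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the line with any trailing "# ..." comment removed.
sub strip_comment {
    my ($line) = @_;
    $line =~ s/\s*\#.*//;    # =~ binds the substitution to $line instead of $_
    return $line;
}

# Stand-in for the lines that would be read from comments.txt.
my @lines = (
    "hostname3 # this is a comment for hostname3\n",
    "# whole-line comment\n",
    "\n",
    "hostname4\n",
);

for my $line (@lines) {
    chomp $line;
    $line = strip_comment($line);
    next if $line =~ /^\s*$/;    # skip lines that are empty once the comment is gone
    print "|$line|\n";
}
```

Checking for blank lines after the substitution, not before it, also drops lines that held nothing but a comment.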
Re: Support for hash comments on a line
by jlongino (Parson) on Nov 01, 2001 at 20:14 UTC
    Another way to do it is to take advantage of prematch ($`):
    use strict;

    while (<DATA>) {
        chomp;
        $_ = $` if /#/;
        print "$_\n" if $_;
    }

    __DATA__
    ### Hello
    99:88:77
    100:11# This is a comment
    abc:def ### Comments also
    999

    --Jim

    Update: shortened conditional after chomp;

      Just a thought, using the prematch and postmatch at all will slow down all regular expressions as the engine will have to save them for every regex in your program.

      -Lee

      "To be civilized is to deny one's nature."
        After reading chapter 5 of japhy's book, you'll be able to rewrite prematch as:
        substr($string, 0, $-[0])
        For example:
        #!/usr/bin/perl -wT
        use strict;

        while (<DATA>) {
            chomp;
            $_ = substr($_, 0, $-[0]) if /#/;
            print "$_\n" if $_;
        }

        __DATA__
        ### Hello
        99:88:77
        100:11# This is a comment
        abc:def ### Comments also
        999

        -Blake

Re: Support for hash comments on a line
by buckaduck (Chaplain) on Nov 01, 2001 at 23:14 UTC
    How about a non-regex solution?
    ($line) = split /#/, $line;
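A rough sketch of how the split approach behaves on edge cases, assuming '#' never appears in real data. The helper name before_hash is hypothetical. The limit of 2 and the trailing-whitespace trim are additions to the one-liner above, not part of it.

```perl
use strict;
use warnings;

# Hypothetical helper built on split: keep only the text before the first '#'.
sub before_hash {
    my ($line) = @_;
    # A limit of 2 stops splitting after the first '#'.
    my ($data) = split /#/, $line, 2;
    $data = '' unless defined $data;    # split returns an empty list for an empty string
    $data =~ s/\s+$//;                  # drop the whitespace left in front of the comment
    return $data;
}

print "[", before_hash("hostname # a comment"), "]\n";    # prints "[hostname]"
print "[", before_hash("# whole-line comment"), "]\n";    # prints "[]"
print "[", before_hash(""), "]\n";                        # prints "[]"
```

Without the defined check, an empty input line would leave the variable undefined and trigger warnings later on.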

    buckaduck

Re: Support for hash comments on a line
by sevensven (Pilgrim) on Nov 01, 2001 at 23:18 UTC

    Your second code was close, but you've forgotten that a regular expression like .* is a greedy expression, it will match everything it can, and indeed a .* can match everything :-)

    In your second example ( $line =~ s/(.*)\#.*/$1/g;) you should change (.*) to (.*?) and it will work as you wanted.

    Adding the ? makes the previous .* match the smallest possible pattern and leave the rest of the input to the rest of the regexp.
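To make the difference concrete, here is a small sketch comparing the two patterns on a line that contains two '#' characters. The sub names and the sample line are made up for illustration, and the /g flag is dropped because a single substitution is enough.

```perl
use strict;
use warnings;

# Greedy: (.*) grabs as much as it can, so the match backs off only to the LAST '#'.
sub strip_greedy {
    my ($line) = @_;
    $line =~ s/(.*)\#.*/$1/;
    return $line;
}

# Non-greedy: (.*?) grabs as little as it can, so the match stops at the FIRST '#'.
sub strip_lazy {
    my ($line) = @_;
    $line =~ s/(.*?)\#.*/$1/;
    return $line;
}

my $input = "hostname # first comment # second comment";
print "greedy: [", strip_greedy($input), "]\n";    # prints "greedy: [hostname # first comment ]"
print "lazy:   [", strip_lazy($input),   "]\n";    # prints "lazy:   [hostname ]"
```

With only one '#' on the line both versions give the same result, so the difference shows up only when the comment itself contains a hash mark.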

    This is explained in greater detail in perlre (Perl Regular Expressions).

    HTH, going back to building perl with thread support.

Re: Support for hash comments on a line
by Fletch (Bishop) on Nov 01, 2001 at 20:48 UTC

    I usually just do something like:

    while( <FOO> ) { s/\s*#.*$//; next if /^\s*$/; ... }

    Anything fancier than that you might want to look into Parse::RecDescent and build a smarter parser, or maybe use AppConfig. Or maybe go to an XML based format and let XML::Parser worry about all of the parsing and comments and what not.

Re: Support for hash comments on a line
by mr_mischief (Monsignor) on Nov 02, 2001 at 03:25 UTC
    This will get rid of pretty much all trailing comments, except those which contain quote characters. This strikes a balance with not trying to strip hash marks that are in quotes as data.
    while( <> ) { s/\s+#[^'"]+\z//; }
    So, if you can guarantee that no hash marks are data except in quotes and that there are no quotes in your comments, this should be a simple way to do it that makes reasonable accommodations for using the hash mark in data. If you have any other ways to wrap data in quote-like characters, just add them to the negated character class and keep them out of the comments.

    Update: Fixed a typo. 2002/05/02
Re: Support for hash comments on a line
by FoxtrotUniform (Prior) on Nov 01, 2001 at 22:40 UTC

    Note: untested code follows

    If you know that #s won't appear in your data, you can write:

    $line =~ s/^([^#]*)#.*$/$1/;

    If #s can appear in quoted strings, life gets a little more complex:

    $line =~ s/^
        (               # grab this stuff in $1
          (
            [^#"]*      # prefix of non-#s, non-"s
            (\"         # start of string
             [^\"]*     # content of string
             \")?       # end of string
            [^#"]*      # suffix
          )*            # grab many prefix-string-suffixes
        )
        \#              # start of comment
        .* $
        /$1/x;

    (At this point, you may be better off using one of the Text modules, and if the input's really hairy, Parse::RecDescent.)

    Update: Er, that second regex is s/.../$1/x;, not s/.../x;. Doh!
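For comparison, a more compact sketch of the same quote-aware idea, written as a match instead of a substitution. It is not the regex above: the sub name strip_unquoted_comment is hypothetical, only double quotes are recognised, backslash escapes inside strings are not handled, and the input is assumed to be chomped already.

```perl
use strict;
use warnings;

# Hypothetical helper: strip a trailing comment unless the '#' sits inside "...".
sub strip_unquoted_comment {
    my ($line) = @_;
    # $1 is a run of complete double-quoted strings and single characters
    # that are neither '#' nor '"'; an optional comment may follow it.
    if ($line =~ /^((?:"[^"]*"|[^#"])*)(?:\#.*)?$/) {
        my $data = $1;
        $data =~ s/\s+$//;    # trim whitespace left in front of the comment
        return $data;
    }
    # No match means an unterminated quote; leave such lines untouched.
    return $line;
}

print strip_unquoted_comment('name "a # b" # comment'), "\n";    # prints: name "a # b"
print strip_unquoted_comment('plain # comment'), "\n";           # prints: plain
```

Returning the line unchanged on an unbalanced quote is a deliberate choice here, so that malformed input is never silently truncated.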

    --
    :wq

Re: Support for hash comments on a line (Why use a regex at all?)
by demerphq (Chancellor) on Nov 02, 2001 at 18:20 UTC
    Not real sure why everyone posted regex solutions here, use substr and index. Much faster.
    while (<DATA>) {
        if ( ( my $p = index( $_, "#" ) ) > -1 ) {
            substr( $_, $p, -1, "" )
        }
        next if /^\s*$/;
        print;
    }
    __DATA__
    #comment
    this is test#comment
    #comment
    this is test #comment
    this is test #comment
    this is test #comment
    A regex in any form (split, s/// or m//) is overkill for this task. (Assuming of course that # can't appear in the real data)

    Yves / DeMerphq
    --
    Have you registered your Name Space?

      Unless you're dealing with a large number of lines here, the performance penalty of going with a regex is, in my opinion, inconsequential compared with the added readability of code that uses it. No one skimming the code above is going to have the slightest idea what it does without studying it.

      Though don't get me wrong, if your requirements are such that you're going to be doing this sort of processing on a lot of data, and performance is a factor, this is one of many optimizations that can be made to squeeze speed out of the algorithm.
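Anyone who wants to measure the difference on their own data could start from a sketch like this one, which uses the core Benchmark module. The sub names and sample lines are hypothetical, and no timings are claimed here since they depend on the Perl version and the input.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Regex version: remove everything from the first '#' onward.
sub strip_with_regex {
    my ($line) = @_;
    $line =~ s/#.*//;
    return $line;
}

# index/substr version: same result without the regex engine.
sub strip_with_index {
    my ($line) = @_;
    my $p = index($line, '#');
    $line = substr($line, 0, $p) if $p > -1;
    return $line;
}

my @sample = ("this is test #comment", "#comment", "no comment here");

# Both helpers should agree before their timings are compared.
for my $line (@sample) {
    die "mismatch on '$line'" unless strip_with_regex($line) eq strip_with_index($line);
}

# A negative count asks Benchmark to run each sub for at least that many CPU seconds.
cmpthese(-1, {
    regex => sub { strip_with_regex($_) for @sample },
    index => sub { strip_with_index($_) for @sample },
});
```

Both helpers assume chomped input and, like the code above, that '#' never appears in real data.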
