Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks, I'm a bit baffled by a disappearing string. Maybe someone can offer some insight. I have a multi-line string that MAY have embedded full-line comments at the start of the string. I want to remove the comments. Here is a sample:

my $string = "--This is a comment\n--This Comment continues here\nNOT a comment";

My desired output is: $string = "NOT a comment"

But when I run my code, I get $string = "--This Comment continues here"

#!/usr/bin/perl -w
#use strict;

my $count = 1;
my $string = "--This is a comment\n--This Comment continues here\nNOT a comment";

while( $string =~ m/^--.*\n(.*)/ ) {
    #Debug Output
    print "ORIGINAL $count: $string\n****\n";
    $string = $1;
    #Debug Output
    print "\n+++\nMODIFIED $count: $string\n";
    $count++;
}
print "DONE $count: $string\n";
exit;

__OUTPUT__
C:\temp>perl strip.pl
ORIGINAL 1: --This is a comment
--This Comment continues here
NOT a comment
****

+++
MODIFIED 1: --This Comment continues here
DONE 2: --This Comment continues here

Replies are listed 'Best First'.
Re: Removing nested comments with regexes
by Thelonius (Priest) on May 18, 2003 at 21:03 UTC
    The problem is that the dot "." does not match "\n". So after the first match, $1 = "--This Comment continues here" without the "\n" and following line. If you use the /s modifier, dot will match "\n", but then you have to be careful of matching too much, that is, the "^--.*\n" would then match many lines. There are several solutions to this, such as /^--.*?\n(.*)/s or /^--[^\n]*\n(.*)/s, but it might be simpler to do just this:
    while ($string =~ s/^--.*\n$//m) {
        print "string = $string\nDeleted $1\ncount=$count\n";
        $count++;
    }
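    The point about dot and "\n" can be checked with a small sketch. This is not part of the original reply; it reuses the sample string from the question, and the variable names $without_s, $with_s and $rest are made up for illustration. It compares the original pattern with the /^--[^\n]*\n(.*)/s alternative mentioned above, then uses that alternative in a loop.

```perl
use strict;
use warnings;

my $string = "--This is a comment\n--This Comment continues here\nNOT a comment";

# Without /s, "." stops at the newline, so $1 holds only the second line.
my ($without_s) = $string =~ m/^--.*\n(.*)/;

# With /s, (.*) runs to the end of the string; [^\n]* keeps the
# comment part of the pattern confined to the first line.
my ($with_s) = $string =~ m/^--[^\n]*\n(.*)/s;

# Loop form: peel off one leading comment line per iteration.
my $rest = $string;
while ( $rest =~ m/^--[^\n]*\n(.*)/s ) {
    $rest = $1;
}

print "without /s: $without_s\n";
print "with /s:    $with_s\n";
print "stripped:   $rest\n";
```

    Note that this loop only strips comment lines that end in a newline, so a final comment line without a trailing "\n" would be left in place.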

      However, your code will stop working if you consider

      my $string = "--This is a comment\nNot a comment\n" . "--This Comment continues here\nNOT a comment";

      Comments do not necessarily occur at the beginning of the (rest of the) string. (At least that's how I understood the question.) I propose to simply do

      my $string = "...";
      $string =~ s/(^|\n)--[^\n]+//g;
      print "No comments: $string\n";

      If you're a perfectionist, you will notice that this leaves a leading newline if the string starts with a comment. You can fix that by using

      my $string = "...";
      $string =~ s/\n--[^\n]+//g;
      $string =~ s/^--[^\n]+\n//;
      print "No comments: $string\n";
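      As a possible alternative to the two-step version, the same cleanup might be done in one pass by letting ^ match at every line start with /m. This is only a sketch, not something proposed in the thread: the helper name strip_comment_lines is hypothetical, and it uses [^\n]* rather than [^\n]+ so that a bare "--" line is also removed.

```perl
use strict;
use warnings;

# Hypothetical helper, not from the thread.
sub strip_comment_lines {
    my ($text) = @_;
    # /m lets ^ match at the start of every line;
    # \n? also removes the comment line's own newline when there is one.
    $text =~ s/^--[^\n]*\n?//mg;
    return $text;
}

my $string = "--This is a comment\nNot a comment\n"
           . "--This Comment continues here\nNOT a comment";

print "No comments: ", strip_comment_lines($string), "\n";
```

      Because each comment line is removed together with its newline, no leading newline is left behind when the string starts with a comment. A comment on the very last line still leaves the newline of the line before it.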
      Thanks for the substitution idea!

      I removed the "$" from the regex since it wouldn't match the data correctly. Also, your example no longer has a capture group, so $1 is never set. Here is the fixed code:

      my $count = 1;
      my $string = "--This is a comment\n--This Comment continues here\nNOT a comment;\n";
      print "Original: $string\n";
      while ($string =~ s/^--.*\n//m) {
          print "string = $string\ncount=$count\n";
          $count++;
      }
      print "Final: $string\n";
      exit;
Re: Removing nested comments with regexes
by graff (Chancellor) on May 19, 2003 at 01:54 UTC
    Since the problem involves whole lines, you could split the multi-line string at line breaks, and use grep to ignore the lines you don't want:
    @wanted_lines = grep !/^--/, split( /\n/, $string );
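    To get back a single string rather than a list of lines, the split/grep approach above could be completed roughly as follows. This is a minimal sketch using the sample string from the question; the variable name $cleaned is an assumption added for illustration.

```perl
use strict;
use warnings;

my $string = "--This is a comment\n--This Comment continues here\nNOT a comment";

# Keep only the lines that do not start with "--".
my @wanted_lines = grep { !/^--/ } split /\n/, $string;

# Rejoin the surviving lines into one string.
my $cleaned = join "\n", @wanted_lines;

print "$cleaned\n";
```

    One thing to keep in mind: split drops trailing empty fields, so a trailing newline in the original string is not preserved by the join.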