Elijah has asked for the wisdom of the Perl Monks concerning the following question:

Can anyone tell me why this pattern is never matched when parsing a file with commented strings in it?
my $word = '#';
my $next = "1.0";
while (my $from = $t->search(-regexp, "\\b$word\\b", $next, "end")) {
    my $word_len = length $' + 1;
    print $word;
    print $word_len;
    $next = "$from + $word_len chars";
    $t->tagAdd("red", $from, $next);
    $t->tagAdd("bold", $from, $next);
}
It works fine for every other word I have searched for, but not for the # character.

Replies are listed 'Best First'.
Re: regexp pattern match help?
by Roger (Parson) on Dec 03, 2003 at 00:47 UTC
    It's because you used \b to match a word boundary. # is not a word character (\w), so when it is surrounded by other non-word characters such as spaces, there is no word boundary on either side of it and \b#\b cannot match.
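A minimal sketch of the point above, using a plain string instead of the Tk text widget from the question. The variable names `$with_b`, `$without_b`, and `$word_b` are made up for illustration:

```perl
use strict;
use warnings;

my $str  = "This is a line # with comment";
my $word = '#';

# \b needs a word character (\w) on exactly one side of the position.
# '#' is not \w and neither are the spaces around it, so \b fails here.
my $with_b = ($str =~ /\b\Q$word\E\b/) ? 1 : 0;

# Without the \b anchors the literal '#' is found.
my $without_b = ($str =~ /\Q$word\E/) ? 1 : 0;

# For comparison, \b behaves as expected around an ordinary word.
my $word_b = ($str =~ /\bline\b/) ? 1 : 0;

print "with \\b: $with_b, without \\b: $without_b, \\bline\\b: $word_b\n";
```

Here `$with_b` is 0 while the other two are 1, which matches the behaviour described in the question: ordinary words are found, the lone # is not.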

    Update: The following code is flawed, see Abigail-II's comment below.

    If you want to match '#' in your regex, you could do this instead -
    my $str = "This is a line # with comment";
    my $word = '#';
    while ($str =~ /[^\B#]($word)[^\B#]/g) {
        print "$1\n";
    }
    __OUTPUT__
    #
    Notice the [^\B#] idiom. What it means is that I want a character set of \B (non-word boundary) and #, and then take the complement of that set. So the result will be a word boundary that does not match on the # character.

    Update: Thanks to Abigail-II for the detailed analysis of [^\B#].

    Ok, below is one way I think would fix the problem -
    my $str = "This is a line # with comment Boss.";
    my $word = "#";

    # define custom \b
    my $b = qr/(?:(?=\S)(?<!\S)|(?!\S)(?<=\S))/;

    # and match on non-space characters
    while ($str =~ /$b(\S+)$b/g) {
        print "$1\n";
    }

    # or ignore the boundaries completely and match on non-space characters
    while ($str =~ /(\S+)/g) {
        print "$1\n";
    }
      \B and \b are zero-width assertions, and therefore, they don't make any sense inside a character class. Hence, [^\B#] doesn't do what you think:
      $ perl -Dr -ce '/[^\B#]/'
      Compiling REx `[^\B#]'
      size 12 Got 100 bytes for offset annotations.
      first at 1
         1: ANYOF[\0-"$-AC-\377{unicode_all}](12)
        12: END(0)
      stclass `ANYOF[\0-"$-AC-\377{unicode_all}]' minlen 1
      Offsets: [12]
              1[6] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 7[0]
      Omitting $` $& $' support.

      EXECUTING...

      -e syntax OK
      Freeing REx: `"[^\\B#]"'
      $
      Which means that [^\B#] matches any character that is neither a B nor a #.
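The same conclusion can be checked without the regex debugger. This is a small sketch, not part of the original reply. It assumes a Perl where \B inside a bracketed class is passed through as a literal B with a warning, which is what the debug output above shows. The names `$class` and `%hit` are illustrative only:

```perl
use strict;
use warnings;
no warnings 'regexp';   # silence the "unrecognized escape" warning for \B in a class

# Anchored so that exactly one character is tested against the class.
my $class = qr/^[^\B#]$/;

# 1 if the single character matches the class, 0 otherwise.
my %hit = map { $_ => ($_ =~ $class ? 1 : 0) } ('B', '#', 'A', ' ');

printf "%-3s => %d\n", "'$_'", $hit{$_} for ('B', '#', 'A', ' ');
```

'B' and '#' are rejected, while 'A' and a space are accepted, so the class has nothing to do with word boundaries.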

      Abigail

        Ok cool, good to know. I think I have a decent way of accomplishing this but need a good way to set the length of string to be colored to the whole commented string. Here is what I have so far:
        my $word = '#';
        my $next = "1.0";
        while (my $from = $t->search(-regexp, "\\B$word\\B", $next, "end")) {
            my @comment = split(/#/, $_);
            #print "\$comment[0] equals ",$comment[0],"\n";
            #print "\$comment[1] equals ",$comment[1],"\n";
            if ($comment[1]) {
                my $word_len = length $comment[1] + length $word;
            }else{
                my $word_len = length $comment[0] + length $word;
            }
            print $word;
            print $word_len;
            $next = "$from + $word_len chars";
            $t->tagAdd("orange", $from, $next);
            $t->tagAdd("bold", $from, $next);
        }
        I decided to use split to accomplish what I wanted, but for some reason the length ends up being only 5 on each string, so only the first 5 characters of each commented string get colored. How can I color the whole comment once the comment character is found? Oh, and am I searching for the comment symbol the best way by using \B?
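One possible way to get the length of the whole comment, sketched in plain Perl without the Tk widget. It assumes that a comment runs from the first # to the end of its line and ignores # inside quoted strings. The sample `@lines` and the `@spans` structure are invented for the example. Note that `length` is a named unary operator that binds more loosely than `+`, so `length $a + length $b` is parsed as `length($a + length($b))`; the explicit parentheses below avoid that.

```perl
use strict;
use warnings;

# Hypothetical sample input; in the real program this would be the widget text.
my @lines = (
    'my $x = 1; # set x',
    '# whole-line comment',
    'no comment here',
);

my @spans;   # each entry: [line number, start column, length in chars]
for my $i (0 .. $#lines) {
    my $col = index($lines[$i], '#');
    next if $col < 0;                        # no comment on this line
    my $len = length($lines[$i]) - $col;     # from '#' to end of line
    push @spans, [ $i + 1, $col, $len ];
}

# Print the spans as Tk-style "line.column" start indices plus a length.
print "$_->[0].$_->[1] + $_->[2] chars\n" for @spans;
```

In the Tk loop itself, the end of the tagged range could probably be written as "$from lineend" instead of computing a character count, since text widget indices accept the lineend modifier.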