regexp pattern match help?

Elijah has asked for the wisdom of the Perl Monks concerning the following question:

Re: regexp pattern match help? by Roger (Parson) on Dec 03, 2003 at 00:47 UTC
It's because you used \b to match a word boundary. # is not a word character, so \b cannot match between # and another non-word character such as a space.

Update: The following code is flawed, see Abigail-II's comment below.

If you want to match '#' in your regex, you could do this instead -

`my $str = "This is a line # with comment";
my $word = '#';
while ($str =~ /[^\B#]($word)[^\B#]/g) {
    print "$1\n";
}
__OUTPUT__
#`

Notice the `[^\B#]` idiom. What I meant by it is a character set of \B (non-word boundary) and #, with the complement of that set taken, so that the result would be a word boundary that does not match on the # character.

Update: Thanks to Abigail-II for the detailed analysis of `[^\B#]`. Ok, below is one way I think would fix the problem -

`my $str = "This is a line # with comment Boss.";
my $word = "#";

# define custom \b
my $b = qr/(?:(?=\S)(?<!\S)|(?!\S)(?<=\S))/;

# and match on non-space characters
while ($str =~ /$b(\S+)$b/g) {
    print "$1\n";
}

# or ignore the boundaries completely and match on non-space characters
while ($str =~ /(\S+)/g) {
    print "$1\n";
}`
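As a rough illustration of the boundary behaviour discussed above, the sketch below compares `\b`, `\B`, and a whitespace-based boundary on two sample strings. The variable names (`$spaced`, `$glued`, `$ws_edge`) and the second sample string are made up for this example; only the first string comes from the post.

```perl
use strict;
use warnings;

my $spaced = 'This is a line # with comment';   # '#' between two spaces
my $glued  = 'foo#bar';                         # hypothetical: '#' between letters

# \b needs a word character on exactly one side of the position, so it
# fails when '#' sits between spaces and succeeds between letters.
my $b_spaced = $spaced =~ /\b#\b/ ? 1 : 0;
my $b_glued  = $glued  =~ /\b#\b/ ? 1 : 0;

# \B is the opposite assertion, so it succeeds for the spaced case.
my $nb_spaced = $spaced =~ /\B#\B/ ? 1 : 0;

# A whitespace-delimited "boundary" built from lookarounds.
my $ws_edge = qr/(?:(?<!\S)(?=\S)|(?<=\S)(?!\S))/;
my @tokens  = $spaced =~ /$ws_edge(\S+)$ws_edge/g;

print "\\b#\\b on spaced: $b_spaced, on glued: $b_glued\n";
print "\\B#\\B on spaced: $nb_spaced\n";
print "tokens: @tokens\n";
```

With this approach '#' shows up as an ordinary whitespace-delimited token, which is what the lookaround version of the boundary is meant to achieve.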
Re: regexp pattern match help? by Abigail-II (Bishop) on Dec 03, 2003 at 01:20 UTC
`\B` and `\b` are zero-width assertions, and therefore, they don't make any sense inside a character class. Hence, `[^\B#]` doesn't do what you think:

$ perl -Dr -ce '/[^\B#]/'
Compiling REx `[^\B#]'
size 12 Got 100 bytes for offset annotations.
first at 1
   1: ANYOF[\0-"$-AC-\377{unicode_all}](12)
  12: END(0)
stclass `ANYOF[\0-"$-AC-\377{unicode_all}]' minlen 1
Offsets: [12]
        1[6] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 7[0]
Omitting $` $& $' support.

EXECUTING...

-e syntax OK
Freeing REx: `"[^\\B#]"'
$

Which means that `[^\B#]` matches any character that is neither a B nor a #.

Abigail
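The same point can be checked without the `-Dr` debugging switch. This is a minimal sketch, assuming a Perl that passes the unrecognized `\B` escape through as a literal `B` inside a character class (it only warns about it); the names `$class` and `%matched` are invented for the example.

```perl
use strict;
use warnings;
no warnings 'regexp';   # silence the "Unrecognized escape" warning for \B in [...]

# Inside brackets, \B is not an assertion; the class simply excludes 'B' and '#'.
my $class = qr/[^\B#]/;

my %matched;
for my $ch ('B', '#', 'b', 'x', ' ') {
    $matched{$ch} = ($ch =~ $class) ? 1 : 0;
}

for my $ch (sort keys %matched) {
    print "'$ch' => $matched{$ch}\n";
}
```

Only `B` and `#` are rejected; every other character, including a space or a lowercase `b`, matches the class.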
Re: Re: regexp pattern match help? by Elijah (Hermit) on Dec 03, 2003 at 06:52 UTC
Ok cool, good to know. I think I have a decent way of accomplishing this, but I need a good way to set the length of the string to be colored to the whole commented string. Here is what I have so far:

`my $word = '#';
my $next = "1.0";
while (my $from = $t->search(-regexp, "\\B$word\\B", $next, "end")) {
    my @comment = split(/#/, $_);
    #print "\$comment[0] equals ",$comment[0],"\n";
    #print "\$comment[1] equals ",$comment[1],"\n";
    if ($comment[1]) {
        my $word_len = length $comment[1] + length $word;
    }else{
        my $word_len = length $comment[0] + length $word;
    }
    print $word;
    print $word_len;
    $next = "$from + $word_len chars";
    $t->tagAdd("orange", $from, $next);
    $t->tagAdd("bold", $from, $next);
}`

I decided to use split to accomplish what I wanted, but for some reason the length only ends up being 5 on each string, so only the first 5 characters of each commented string get colored. How can I color the whole comment once the comment character is found? Oh, and am I searching for the comment symbol the best way by using \B?
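Two details in the snippet above can be tried outside Tk. The sketch below works on a plain string rather than a text widget, and the sample line and variable names (`$line`, `$comment_len`, `$text`, `$mark`) are hypothetical. It shows one way to measure a comment from the marker to the end of the line, and how `length $x + length $y` parses, since `+` binds more tightly than a named unary operator such as `length`.

```perl
use strict;
use warnings;

my $line = 'my $x = 1; # set x';   # hypothetical line of editor text
my $word = '#';                    # comment marker, as in the post

# Declared outside any if/else block so the value is still visible afterwards.
my $comment_len = 0;
if ($line =~ /(\Q$word\E.*)/) {
    $comment_len = length($1);     # from the marker to the end of the line
}

# Precedence: "length $x + length $y" parses as length($x + length($y)).
my ($text, $mark) = (' set x', '#');
my $unparenthesised = do { no warnings 'numeric'; length $text + length $mark };
my $parenthesised   = length($text) + length($mark);

print "comment length: $comment_len\n";
print "without parens: $unparenthesised, with parens: $parenthesised\n";
```

In a Tk text widget, an index such as `"$from lineend"` may be a simpler way to reach the end of the line than counting characters; that is not exercised here because the sketch avoids Tk.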
Re: Re: Re: regexp pattern match help? by ysth (Canon) on Dec 03, 2003 at 07:35 UTC