Re: Defining Characters in Word Boundary?

/\b/ is equivalent to /(?<=\w)(?!\w)|(?<!\w)(?=\w)/. Feel free to replace \w with a character class.
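For example, here is `\b` rewritten with `[a-zA-Z]` in place of `\w`. This is a minimal sketch that assumes LaTeX command names are ASCII letters only. The `$letter_boundary` variable and the `boundary_offsets` helper are illustrative names, not from this thread. The two boundaries differ on a string like `alpha2`, because a digit is a word character but not a letter:

```perl
use strict;
use warnings;

# \b with \w replaced by [a-zA-Z]:
# "letter behind, no letter ahead" or "no letter behind, letter ahead".
my $letter_boundary = qr/(?<=[a-zA-Z])(?![a-zA-Z])|(?<![a-zA-Z])(?=[a-zA-Z])/;

# Hypothetical helper: offsets at which a zero-width pattern matches in $str.
sub boundary_offsets {
    my ($str, $rx) = @_;
    my @offsets;
    while ($str =~ /$rx/g) {
        push @offsets, pos($str);   # zero-width match, so pos() is the match offset
    }
    return @offsets;
}

my $str    = 'alpha2';
my @word   = boundary_offsets($str, qr/\b/);
my @letter = boundary_offsets($str, $letter_boundary);

print "\\b:              @word\n";     # 0 6
print "letter boundary: @letter\n";   # 0 5
```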

It would be tedious to have to write \\$keyword([^a-zA-Z]) and then have to substitute back $1 (because I do not want it eaten).

Don't eat it if you don't want to add it back. Here is the equivalent without eating:

\\$keyword(?=[^a-zA-Z])

But you surely meant

\\$keyword(?![a-zA-Z])
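A small sketch comparing the three approaches on one sample string. It assumes `$keyword` is `alpha`, and the replacement text `ALPHA` is a made-up placeholder. The capturing version has to put `$1` back. The lookahead versions leave the next character alone. Only the negative lookahead also replaces a keyword sitting at the very end of the string:

```perl
use strict;
use warnings;

my $keyword = 'alpha';                      # hypothetical keyword
my $input   = '\alpha+\alphabet=\alpha';    # single quotes: \a stays a literal backslash + a

# 1. Capture the following character, then put it back with $1.
(my $captured = $input) =~ s/\\$keyword([^a-zA-Z])/ALPHA$1/g;

# 2. Positive lookahead: a next character must exist and be a non-letter.
(my $positive = $input) =~ s/\\$keyword(?=[^a-zA-Z])/ALPHA/g;

# 3. Negative lookahead: the next character, if there is one, must not be a letter.
(my $negative = $input) =~ s/\\$keyword(?![a-zA-Z])/ALPHA/g;

print "$captured\n";   # ALPHA+\alphabet=\alpha
print "$positive\n";   # ALPHA+\alphabet=\alpha
print "$negative\n";   # ALPHA+\alphabet=ALPHA
```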

In general, it's easier to extract the keyword, then check if it's the one you want.

\\([a-zA-Z]+)
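One way to apply "extract, then check" to a substitution is a sketch like the following. It assumes a hash of wanted keywords. The `%replacement` mapping and its values are hypothetical. Because `[a-zA-Z]+` is greedy, `\alphabet` is extracted as `alphabet` and is left untouched:

```perl
use strict;
use warnings;

# Hypothetical mapping from wanted command names to their replacements.
my %replacement = (alpha => 'A', beta => 'B');

my $latex = '\alpha+\alphabet=\beta\gamma';

# Extract each whole command name, then decide what to do with it.
# Names not in the mapping are put back unchanged, backslash included.
(my $out = $latex) =~ s{\\([a-zA-Z]+)}{
    exists $replacement{$1} ? $replacement{$1} : "\\$1"
}ge;

print "$out\n";   # A+\alphabet=B\gamma
```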

Re^2: Defining Characters in Word Boundary? by Jim (Curate) on Jan 20, 2011 at 01:30 UTC
In general, it's easier to extract the keyword, then check if it's the one you want.

I agree wholeheartedly. Since the LaTeX name constraint is exact and well understood (the characters 'a' through 'z' and 'A' through 'Z'), you simply need to match just those characters. Explicitly matching the right-hand boundary isn't necessary, because a greedy `[a-zA-Z]+` always consumes the entire name.
Re^2: Defining Characters in Word Boundary? by iaw4 (Monk) on Jan 20, 2011 at 14:00 UTC
Thanks. This is what I needed to learn. I did not know the extended regex sequences in the Camel book (i.e., the (?...) constructs, chapter 5, table 5.6). Is there a meaningful difference between (?![a-z]) and (?=[^a-z])? Is the former recommended? /iaw
Re^3: Defining Characters in Word Boundary? by ikegami (Patriarch) on Jan 20, 2011 at 16:32 UTC
Compare

'ab' =~ /a(?!a)/
'a' =~ /a(?!a)/

and

'ab' =~ /a(?=[^a])/
'a' =~ /a(?=[^a])/
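Running the four comparisons makes the difference visible. Only the last one fails, because `[^a]` requires some character to be present after the `a`:

```perl
use strict;
use warnings;

# Each pair: the string and the pattern from the comparison above.
my @cases = (
    [ 'ab', qr/a(?!a)/    ],
    [ 'a',  qr/a(?!a)/    ],
    [ 'ab', qr/a(?=[^a])/ ],
    [ 'a',  qr/a(?=[^a])/ ],
);

my @results;
for my $case (@cases) {
    my ($str, $rx) = @$case;
    my $matched = $str =~ $rx ? 1 : 0;
    push @results, $matched;
    printf "%-4s =~ %-20s %s\n", "'$str'", $rx, $matched ? 'matches' : 'does not match';
}
```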
Re^3: Defining Characters in Word Boundary? by Jim (Curate) on Jan 20, 2011 at 17:17 UTC
Is there a meaningful difference between `(?![a-z])` and `(?=[^a-z])`? Is the former recommended?

Yes, they're different patterns that succeed at different positions. `(?![a-z])` asserts "not followed by any of the characters 'a' through 'z'", which is also true when there is no following character at all, i.e. at the end of the string. `(?=[^a-z])` asserts "followed by a single character that is not any of the characters 'a' through 'z'", so it fails at the end of the string. The former is a negative assertion; the latter is a positive assertion. In your case, `(?![a-z])` is what you would want to use.

[PerlMonks posting tip: Enclose Perl code in `<code></code>` tags, even code within paragraphs.]

UPDATE: Removed color.
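Another way to read `(?![a-z])` is as `(?=[^a-z]|\z)`: either the next character is a non-letter, or there is no next character. Here is a small check of that reading on a few made-up strings. `\z` is used rather than `$` because `$` can also match just before a trailing newline:

```perl
use strict;
use warnings;

my $negative        = qr/x(?![a-z])/;
# The same condition spelled out with a positive lookahead plus end of string.
my $positive_or_end = qr/x(?=[^a-z]|\z)/;
# The positive lookahead alone, which needs a character to be present.
my $positive_only   = qr/x(?=[^a-z])/;

my @strings = ('xa', 'x1', 'x');
my (@neg, @poe, @pos);
for my $s (@strings) {
    push @neg, ($s =~ $negative        ? 1 : 0);
    push @poe, ($s =~ $positive_or_end ? 1 : 0);
    push @pos, ($s =~ $positive_only   ? 1 : 0);
}

print "(?![a-z]):      @neg\n";   # 0 1 1
print "(?=[^a-z]|\\z): @poe\n";   # 0 1 1
print "(?=[^a-z]):     @pos\n";   # 0 1 0
```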
Re^4: Defining Characters in Word Boundary? by AnomalousMonk (Archbishop) on Jan 20, 2011 at 23:28 UTC
In your case, `(?![a-z])` is what you would want to use.

One behavioral difference between these regexes, and in the OP's case the reason iaw4 would (probably) want to use `(?![a-z])`, is that it can match at the end of a string and thus emulates the behavior of the `\b` assertion. (Note: `\b` can also match at the start of a string.)

>perl -wMstrict -le "my $str = 'abcd'; for my $rx (qr{(?=[^a-z])}, qr{(?![a-z])}, qr{\b}) { my @offsets; push @offsets, $-[1] while $str =~ m{ ($rx) }xmsg; if (@offsets) { print qq{$rx matches '$str' at offset(s) @offsets}; } else { print qq{$rx does not match '$str'}; } } "
(?-xism:(?=[^a-z])) does not match 'abcd'
(?-xism:(?![a-z])) matches 'abcd' at offset(s) 4
(?-xism:\b) matches 'abcd' at offset(s) 0 4
Re^5: Defining Characters in Word Boundary? by Jim (Curate) on Jan 21, 2011 at 01:06 UTC
Re^6: Defining Characters in Word Boundary? by ikegami (Patriarch) on Jan 21, 2011 at 01:24 UTC
Re^4: Defining Characters in Word Boundary? by Anonymous Monk on Jan 20, 2011 at 18:08 UTC
Please avoid casually using colors here on PerlMonks; they don't play well with themes. See Customizing PerlMonks CSS and Help for Display Settings.
Re^5: Defining Characters in Word Boundary? by Jim (Curate) on Jan 20, 2011 at 19:40 UTC
Re^6: Defining Characters in Word Boundary? by Anonymous Monk on Jan 21, 2011 at 05:40 UTC