\w matches alphanumerics and underscore. \b is effectively the same as using lookbehinds and lookaheads like this:
(?:(?<=\w)(?=\W|\z)|(?:(?<=\W)|(?<=\A))(?=\w))
Update: Hmm, or even nicer, as merlyn posted in Re: Why do zero width assertions care about lookahead/behind? (code examples also updated):
(?:(?<!\w)(?=\w)|(?<=\w)(?!\w))
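If you want to convince yourself that the lookaround form really behaves like \b, here is a minimal sketch that collects the offsets where each pattern matches and compares them. The helper name boundary_positions and the sample string are made up for illustration; they are not from the thread.

```perl
use strict;
use warnings;

# Return every offset in $text at which the zero-width pattern $re matches.
sub boundary_positions {
    my ($re, $text) = @_;
    my @hits;
    for my $i (0 .. length $text) {
        pos($text) = $i;                       # start the next match at offset $i
        push @hits, $i if $text =~ /\G$re/gc;  # \G anchors the match to that offset
    }
    return @hits;
}

my $sample     = 'foo-bar, baz/qux 42';        # hypothetical test string
my @builtin    = boundary_positions(qr/\b/, $sample);
my @lookaround = boundary_positions(qr/(?:(?<!\w)(?=\w)|(?<=\w)(?!\w))/, $sample);

print "builtin:    @builtin\n";
print "lookaround: @lookaround\n";
print "@builtin" eq "@lookaround" ? "same positions\n" : "positions differ\n";
```

This only checks one ASCII string, so treat it as a sanity check rather than a proof.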
So to make a specialized version of \b that views "-" and "/" as "word characters" (sort of), you might use something like this:
(?:(?<![\w/-])(?=[\w/-])|(?<=[\w/-])(?![\w/-]))
So maybe something like this will suit you?
my $w = '\w/-';
my $b = "(?:(?<![$w])(?=[$w])|(?<=[$w])(?![$w]))";
my @words = ($rec =~ /${b}[$w]+${b}/g);
I've tested this a little but not a lot, and it seems all right. You'll want to verify it yourself before you go using it for anything important :-)
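For what it's worth, a quick check along these lines is roughly what I mean by testing it. The record text is invented for the example, and I've used $boundary rather than $b here only because a lexical $b can get in the way of sort blocks; the pattern is the same one as above.

```perl
use strict;
use warnings;

my $w        = '\w/-';                                       # characters treated as "word" characters
my $boundary = "(?:(?<![$w])(?=[$w])|(?<=[$w])(?![$w]))";    # custom stand-in for \b

# Hypothetical record, chosen to include "-" and "/" inside words.
my $rec = 'send the TCP/IP notes to well-known hosts, and/or file_2';

my @words = ($rec =~ /${boundary}[$w]+${boundary}/g);        # custom boundaries
my @plain = ($rec =~ /\b\w+\b/g);                            # ordinary \b, for comparison

# With the custom boundary, TCP/IP, well-known and and/or stay whole;
# with plain \b they are split at the "/" and "-".
print "custom: $_\n" for @words;
print "plain:  $_\n" for @plain;
```

Note that a lone "-" or "/" surrounded by spaces would also count as a "word" with this pattern, so check that against your real data.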
-- Mike
--
XML::Simpler does not require XML::Parser or a SAX parser.
It does require File::Slurp.
-- grantm, perldoc XML::Simpler
In reply to Re: Re-define Word Boundary?
by thelenm
in thread Re-define Word Boundary?
by JimJ