in reply to Re^2: Regex word boundries
in thread Regex word boundries
One thing I did notice in your code (and after running it) was that the word count seems to be just 1.
Sorry,
my $word_count = () = split(' ', $file);
should be
my $word_count = split(' ', $file);
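For anyone wondering why the first version gives 1: when split is assigned to a list, it uses an implicit LIMIT of one more than the number of variables in that list, so an empty list means a LIMIT of 1. Here is a small sketch with a made-up sample string (the variable names are mine, not from the original code). Note that on perls before 5.11, split in scalar context also overwrote @_.

```perl
use strict;
use warnings;

# Hypothetical sample text standing in for the file contents.
my $file = "the quick  brown fox\n";

# Empty list on the left: split sees zero target variables, applies an
# implicit LIMIT of 1, and returns the text as a single field.
my $limited = () = split(' ', $file);

# Scalar context: split returns the number of fields it found.
my $scalar = split(' ', $file);

# Alternative that avoids split entirely: count runs of non-whitespace.
my $via_match = () = $file =~ /\S+/g;

print "limited:   $limited\n";    # 1
print "scalar:    $scalar\n";     # 4
print "via match: $via_match\n";  # 4
```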
I now have another problem in that some terms are still not picked up
\b matches between \w\W, \W\w, ^\w and \w\z. As such, the second \b won't match in 'h(2)O(2) water' =~ /\b\Qh(2)O(2)\E\b/. (The final ) is a \W, and so is the following space.) Perhaps this will do the trick:
/(?:\W|^)\Q$term\E(?:(?=\W)|\z)/
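A quick way to see the difference, as a sketch only. The term and test strings are just the example from this post plus one made-up string where the term is embedded in a longer word.

```perl
use strict;
use warnings;

my $term = 'h(2)O(2)';

# \b after the final ")" needs a \w on one side, but ")" and " " are both \W.
my $with_b = 'h(2)O(2) water' =~ /\b\Q$term\E\b/ ? 1 : 0;

# Boundary expressed as "non-word character or edge of string" instead.
my $re = qr/(?:\W|^)\Q$term\E(?:(?=\W)|\z)/;
my $with_edges = 'h(2)O(2) water' =~ $re ? 1 : 0;

# The term embedded in a longer word should still be rejected.
my $embedded = 'xh(2)O(2)y' =~ $re ? 1 : 0;

print "with \\b:   $with_b\n";      # 0
print "with edges: $with_edges\n";  # 1
print "embedded:   $embedded\n";    # 0
```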
I think the following would be faster, but because it consumes the trailing \W, it would count a term repeated back to back (separated by a single character) as one:
/(?:\W|^)\Q$term\E(?:\W|\z)/
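To illustrate the counting difference, here is a sketch with a made-up string containing the term twice, separated by a single space. It only shows the counts; I have not benchmarked either pattern.

```perl
use strict;
use warnings;

my $term = 'h(2)O(2)';
my $text = 'h(2)O(2) h(2)O(2)';

# Lookahead version: the separating space is not consumed, so it is still
# available to satisfy (?:\W|^) for the second occurrence.
my $count_lookahead = () = $text =~ /(?:\W|^)\Q$term\E(?:(?=\W)|\z)/g;

# Consuming version: the first match swallows the space, so the second
# occurrence has nothing left to match its leading (?:\W|^).
my $count_consuming = () = $text =~ /(?:\W|^)\Q$term\E(?:\W|\z)/g;

print "lookahead: $count_lookahead\n";  # 2
print "consuming: $count_consuming\n";  # 1
```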
If you want the match to be case-insensitive, one solution is to use the i modifier on your match.
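For example (a sketch; the mixed-case test string is made up):

```perl
use strict;
use warnings;

my $term = 'h(2)O(2)';
my $text = 'Drink H(2)O(2) WATER';

# Without /i the lowercase "h" in the term cannot match the uppercase "H".
my $found_cs = $text =~ /(?:\W|^)\Q$term\E(?:(?=\W)|\z)/  ? 1 : 0;

# With /i the letters in the term match regardless of case.
my $found_ci = $text =~ /(?:\W|^)\Q$term\E(?:(?=\W)|\z)/i ? 1 : 0;

print "case-sensitive:   $found_cs\n";  # 0
print "case-insensitive: $found_ci\n";  # 1
```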
Re^4: Regex word boundries
by MonkPaul (Friar) on Oct 29, 2007 at 15:24 UTC
by ikegami (Patriarch) on Oct 29, 2007 at 15:55 UTC