in reply to Re^2: removing stop words
in thread removing stop words
In fact, the list given above would condense much further, since "about" and "above" share the prefix "abo", "after" and "afterwards" share the prefix "after", and so on.
If you evaluate the following Emacs Lisp expression in Emacs:
(regexp-opt '("a" "about" "above" "across" "after" "afterwards"))
You get:
"a\\(?:bo\\(?:ut\\|ve\\)\\|cross\\|fter\\(?:wards\\)?\\)?"
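...which should be a much more efficient search expression than a flat alternation of the six words.
The doubled backslashes are only Emacs Lisp string escaping, and Emacs regexps write grouping and alternation as \( \) and \|, so in Perl the same optimised regexp looks like this: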
"a(?:bo(?:ut|ve)|cross|fter(?:wards)?)?"
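As a quick sanity check, here is a minimal sketch that anchors the pattern and confirms it accepts the six words and rejects a few near-misses. The variable names and the near-miss list are mine, purely for illustration:

```perl
use strict;
use warnings;

# The optimised pattern from above, anchored so it must match the whole word.
my $re = qr/\Aa(?:bo(?:ut|ve)|cross|fter(?:wards)?)?\z/;

my @stop  = qw(a about above across after afterwards);
my @other = qw(ab abo afterward an cross aa);   # hypothetical near-misses

# Every stop word should match; none of the near-misses should.
my @matched = grep { $_ =~ $re } @stop;
my @leaked  = grep { $_ =~ $re } @other;

printf "%d of %d stop words matched, %d false positives\n",
    scalar @matched, scalar @stop, scalar @leaked;
```

Without the \A and \z anchors (or \b word boundaries) the pattern would also match the leading "a" inside longer words, so some anchoring is needed when using it to strip stop words.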
Has anyone written an equivalent module in Perl to optimise word-list searches in regexps the way regexp-opt does in Emacs Lisp?
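To make the question concrete, here is a minimal sketch of the kind of optimisation I mean, assuming a hand-rolled trie rather than any existing module. The names build_trie and trie_to_regex are made up for illustration, and the sketch only factors common prefixes, not common suffixes:

```perl
use strict;
use warnings;

# Build a trie of nested hashes from a word list.
# The empty-string key marks "a word ends here".
sub build_trie {
    my @words = @_;
    my %trie;
    for my $word (@words) {
        my $node = \%trie;
        $node = $node->{$_} ||= {} for split //, $word;
        $node->{''} = 1;
    }
    return \%trie;
}

# Turn a trie node into a regex fragment (as a string).
sub trie_to_regex {
    my ($node) = @_;
    my $is_end = exists $node->{''};

    # One alternative per child character, in sorted order for stable output.
    my @alts;
    for my $char (sort grep { $_ ne '' } keys %$node) {
        push @alts, quotemeta($char) . trie_to_regex($node->{$char});
    }
    return '' unless @alts;

    my $body = join '|', @alts;
    return "(?:$body)?" if $is_end;      # a word may stop here, so the rest is optional
    return @alts > 1 ? "(?:$body)" : $body;
}

my $pattern = trie_to_regex(build_trie(qw(a about above across after afterwards)));
print "$pattern\n";
```

For the six words above this prints a(?:bo(?:ut|ve)|cross|fter(?:wards)?)?, the same pattern Emacs produced. Whether it is actually faster than a plain alternation in Perl is something I have not benchmarked.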
Re^4: removing stop words
by fishbot_v2 (Chaplain) on May 29, 2005 at 15:34 UTC |