in reply to variables in regex character classes
Consider the following code (I could give an example in Cyrillic Windows-1251, but I don't know whether it is compatible with the Belarusian variant):

```perl
my $letters = '[a-zA-Z]'; # try - maybe the character range will work for you.
                          # It works in Cyr-1251, but does not in KOI8-R
# Compile each pattern once. The /i goes on qr// itself: modifiers on an
# outer s///i do not apply to an interpolated qr// object.
my @wordpatterns = map { qr/(?<!$letters)(\Q$_\E$letters*)/i } qw(tak het hen toj);

my $count_words = 0;
while (my $next_line = <$FH>) {    # $FH: input filehandle, opened elsewhere
    foreach my $pattern (@wordpatterns) {
        # s///g returns the number of substitutions made
        $count_words += ($next_line =~ s/$pattern/>$1</g);
    }
}
```

The code is untested, but it should do the Right Thing in a much better way than yours. It reuses the precompiled regexes, which may make processing of large amounts of data noticeably faster.
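For a quick check without an input file, here is a self-contained sketch of the same approach. The two sample lines are made up and stand in for `<$FH>`. It also shows why `/i` belongs on `qr//` rather than on the outer `s///`: modifiers on the outer substitution do not carry into an interpolated `qr//` object.

```perl
use strict;
use warnings;

my $letters = '[a-zA-Z]';

# Build each regex once with qr//; /i is placed on qr// itself
my @wordpatterns = map { qr/(?<!$letters)(\Q$_\E$letters*)/i } qw(tak het hen toj);

# Hypothetical sample lines standing in for <$FH>
my @lines = ("Tak is here, but not in attak.\n", "Hen and hence and Toj.\n");

my $count_words = 0;
for my $line (@lines) {              # $line aliases the array element
    for my $pattern (@wordpatterns) {
        # s///g returns the number of substitutions it made
        $count_words += ($line =~ s/$pattern/>$1</g);
    }
}
print @lines;
print "replaced: $count_words\n";    # replaced: 4
```

"attak" is left alone because the lookbehind `(?<!$letters)` rejects a match preceded by a letter, while "hence" is marked whole because of the trailing `$letters*`.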
One more note: the /i switch won't work for Cyrillic encodings without a carefully set locale. The behaviour of word boundaries (\b) is also wrong if the locale is wrong, so I removed them from my regex too.
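An alternative to depending on the locale, not what the code above does: decode the input into Perl character strings, after which `/i` and `\b` follow Unicode rules. A minimal sketch using the core Encode module, assuming the data is Windows-1251 (`cp1251`); the sample bytes are hypothetical and spell "Так так".

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Raw Windows-1251 bytes for "Так так" (hypothetical input)
my $bytes = "\xD2\xE0\xEA \xF2\xE0\xEA";

# Decode to a character string: /i and \b now use Unicode semantics
my $text = decode('cp1251', $bytes);

# Search word "так", written with \x{} escapes to keep the source ASCII-only
my $word = "\x{0442}\x{0430}\x{043A}";
my $re   = qr/\b\Q$word\E\b/i;

my $count = () = $text =~ /$re/g;    # count all matches
print "matches: $count\n";           # matches: 2

# Encode back to Windows-1251 bytes before writing output
my $out = encode('cp1251', $text);
```

When reading from a file, the same decoding can be done with an I/O layer, e.g. `open my $FH, '<:encoding(cp1251)', $filename` (with `$filename` a placeholder).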
Re^2: variables in regex character classes
by amir_e_a (Hermit) on Jul 23, 2006 at 14:37 UTC
by Ieronim (Friar) on Jul 23, 2006 at 17:32 UTC