in reply to Split a sentence into words

my @vocabulary = qw(abd abcd abc a bc); my $sentence = 'abdaabc'; my $pattern = join '|', @vocabulary; my @words = $sentence =~ /($pattern)/g;

note that @vocabulary has to be sorted in such a way that "longer" words come earlier; i.e. if word x is a prefix of word y, word y must come earlier in the list.

Upd Does not actually work; i.e. it works only for some vocabularies; say (abcd, abc, de) will not split 'abcde' right. Things get complicated and computer-sciencey. See bart and ikegami's replies below.

Replies are listed 'Best First'.
Re^2: Split a sentence into words
by bart (Canon) on May 30, 2009 at 12:27 UTC
    note that @vocabulary has to be sorted in such a way that "longer" words come earlier
    If you depend on a module like Regex::PreSuf, not only will it have the same effect, i.e. matching the longest match possible, but likely, it'll match faster, at least for longer lists, and pre 5.10 perl.