in reply to Splitting compound (concatenated) words )
If all the words are correctly spelt and are in your dictionary, then this appears to make a good attempt at many inputs:
    #! perl -slw
    use strict;

    # Load the dictionary, one word per line.
    my @w = do{ local @ARGV = 'words.txt'; <> };
    chomp @w;

    my $s = 'couldsomeonerecommendaworkingperlmoduletosplitconcatenatedwords';

    # Keep only the words that occur somewhere in the input.
    my @subset = grep{ $s =~ /$_/ } 'a', 'perl', @w;

    # Alternation with the longest words first, repeated as 11 optional captures.
    my $re1 = join '|', sort{ length( $b ) <=> length( $a ) }@subset;
    my $re2 = "($re1)?" x 11;

    print for grep defined(), $s =~ /^$re2$/;

    __END__
    C:\test>junk
    could
    someone
    recommend
    a
    working
    perl
    module
    to
    split
    concatenated
    words
But note: I had to add 'a' & 'perl' which don't appear in my dictionary; and I cheated by hardwiring the number of words (11) to look for.
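One way to avoid hardwiring the word count might be to drop the repeated-capture regex and search for a split directly. The following is only a sketch of that idea, not the code above: the sub name `split_words`, the inline word list standing in for words.txt, and the "prefer the fewest words" tie-break are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for words.txt, so the sketch runs on its own.
my %dict = map { $_ => 1 } qw(
    a could some one someone recommend working
    perl module to split concatenated words
);

# Return the split of $s that uses the fewest dictionary words,
# or an empty list if $s cannot be fully covered by dictionary words.
sub split_words {
    my( $s, $dict ) = @_;
    my $n = length $s;
    my @count = ( 0 );  # $count[$i]: fewest words covering the first $i chars
    my @prev;           # $prev[$i]: start offset of the last word in that cover

    for my $end ( 1 .. $n ) {
        for my $start ( 0 .. $end - 1 ) {
            next unless defined $count[ $start ];
            next unless $dict->{ substr( $s, $start, $end - $start ) };
            if( !defined $count[ $end ] or $count[ $start ] + 1 < $count[ $end ] ) {
                $count[ $end ] = $count[ $start ] + 1;
                $prev[ $end ]  = $start;
            }
        }
    }
    return unless defined $count[ $n ];

    # Walk the back-pointers from the end of the string to recover the words.
    my @words;
    for( my $end = $n; $end > 0; $end = $prev[ $end ] ) {
        unshift @words, substr( $s, $prev[ $end ], $end - $prev[ $end ] );
    }
    return @words;
}

my $s = 'couldsomeonerecommendaworkingperlmoduletosplitconcatenatedwords';
print join( ' ', split_words( $s, \%dict ) ), "\n";
```

With this small word list the sketch prints the eleven expected words without being told how many to look for. It still depends entirely on every word being in the dictionary.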
If I change that hardwired count to a larger number (say 100), then the results are less good:
    C:\test>junk
    could
    someone
    recommend
    aw
    or
    king
    perl
    module
    to
    split
    concatenated
    words
However, if I remove the iffy non-word 'aw' from my rather permissive dictionary, it once again produces the right output. That just goes to show how sensitive the results are to a good dictionary and correct spelling.
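To make that sensitivity concrete, here is a small sketch that lists every possible split of a fragment against a tiny dictionary, with and without 'aw'. The sub name `all_splits` and the five-word dictionary are hypothetical, chosen only for this illustration. The recursion is exponential in the worst case, so it is only meant for short inputs.

```perl
use strict;
use warnings;

# Return every way of splitting $s into words found in %$dict,
# each as a space-joined string.
sub all_splits {
    my( $s, $dict ) = @_;
    return ( '' ) if $s eq '';      # one way to split nothing: no words
    my @out;
    for my $len ( 1 .. length( $s ) ) {
        my $head = substr( $s, 0, $len );
        next unless $dict->{ $head };
        # Combine this leading word with every split of the remainder.
        for my $rest ( all_splits( substr( $s, $len ), $dict ) ) {
            push @out, $rest eq '' ? $head : "$head $rest";
        }
    }
    return @out;
}

# Hypothetical miniature dictionary, including the doubtful 'aw'.
my %dict = map { $_ => 1 } qw( a aw or king working );

print "with 'aw':\n";
print "  $_\n" for all_splits( 'aworking', \%dict );

delete $dict{aw};
print "without 'aw':\n";
print "  $_\n" for all_splits( 'aworking', \%dict );
```

With 'aw' present, both "a working" and "aw or king" are valid splits, so something has to choose between them. Once 'aw' is removed, only "a working" remains.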