These are the inputs to and outputs from my code somewhere above, for the 3 examples you've supplied so far:
{
    my %morphs = (
        t => { d => 'd' },
    );
    my @lex = qw[ cowboy cow boy cat do dog ];
    my $input = 'cowboycaddog';
    print "\n$input\n------------";
    deGlue{ print join '-', @_ } @lex, %morphs, $input;
}
{
    my %morphs = (
        aH => { o => 'dh', as => 't' },
        as => { aH => '' },
    );
    my @lex = qw[ krishnaH dhaavati naH dhaa namaH te ];
    my $input = 'Krishnodhaavatinamaste';
    print "\n$input\n----------";
    deGlue{ print join '-', @_ } @lex, %morphs, $input;
}
{
    my %morphs = (
        A => { a => 'a', A => 'a', a => 'A', A => 'A', '' => 'A', 'A' => '' },
        s => { H => '' },
    );
    my @lex = qw[ ziva Shiva azvas zivA Azvas ];
    my $input = 'zivAzvaH';
    print "\n$input\n-------------";
    deGlue{ print join '-', @_ } @lex, %morphs, $input;
}
__END__
c:\test>675520
cowboycaddog
------------
cowboy-cad-dog
cow-boy-cad-dog
Krishnodhaavatinamaste
----------
Krishno-dhaavati-namas-te
zivAzvaH
-------------
ziv-AzvaH
ziv-AzvaH-AzvaH     ## I'm investigating this anomaly.
The main point of that code is that it automatically constructs, from the supplied lexicon and morpheme rules, the regexes used to parse the data.
It is incomplete and still leaves work to be done, but perhaps it is a starting point? The more examples it is tried with, the better the code generation can be tailored.
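For anyone who wants to see the general idea in miniature, here is a rough sketch of generating regexes from a lexicon plus rules. It is not deGlue itself. The names surface_forms and splits are made up for illustration, and it assumes each %morphs entry reads as: word ending => { altered ending => text that must follow }, with '' meaning end of input. That reading is only inferred from the examples above.

```perl
use strict;
use warnings;

# Hypothetical helper: turn each lexicon word into one or more regexes,
# one for the plain word and one per rule that can alter its ending.
sub surface_forms {
    my ($lex, $morphs) = @_;
    my @forms;
    for my $word (@$lex) {
        push @forms, quotemeta $word;               # the word unchanged
        for my $end (sort keys %$morphs) {
            next unless $word =~ /\Q$end\E\z/;      # rule applies to this ending only
            my $stem = substr $word, 0, length($word) - length($end);
            for my $new (sort keys %{ $morphs->{$end} }) {
                my $next = $morphs->{$end}{$new};
                # Assumption: the altered ending is valid only before $next,
                # or at the end of the input when $next is ''.
                my $ctx = length $next ? '(?=' . quotemeta($next) . ')' : '\z';
                push @forms, quotemeta($stem . $new) . $ctx;
            }
        }
    }
    return map { qr/\G($_)/ } @forms;               # anchor each form at pos()
}

# Hypothetical helper: list every way to consume $input from offset $pos.
sub splits {
    my ($input, $forms, $pos) = @_;
    return ([]) if $pos == length $input;           # consumed everything: one empty tail
    my @out;
    for my $re (@$forms) {
        pos($input) = $pos;
        next unless $input =~ /$re/g;
        my $piece = $1;
        next unless length $piece;                  # guard against zero-length matches
        push @out, map { [ $piece, @$_ ] }
            splits($input, $forms, $pos + length $piece);
    }
    return @out;
}

my @lex    = qw[ cowboy cow boy cat do dog ];
my %morphs = ( t => { d => 'd' } );
my @forms  = surface_forms(\@lex, \%morphs);
print join('-', @$_), "\n" for splits('cowboycaddog', \@forms, 0);
```

With the first example's lexicon and rule this prints cowboy-cad-dog and then cow-boy-cad-dog. It reports the surface pieces as found in the input, does no case folding, and tries every form at every position, so it is only meant to show the shape of the approach.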
In reply to Re^2: unglue words joined together by juncture rules
by BrowserUk
in thread unglue words joined together by juncture rules
by pc2