pc2 has asked for the wisdom of the Perl Monks concerning the following question:

salutations,

we would like your suggestions for the following problem.

suppose we have a string, like "cowboycaddog".

what we want is a way to separate the words of "cowboycaddog", observing (hypothetically) that, as a euphony rule of English, t is changed to d before d (t|d -> dd, cat + dog = caddog), considering a given list of isolated words, like:

cowboy cow boy cat do dog
that were possibly used to form the string.

t|d -> dd would be a juncture rule that was used to form the string.

thus, cowboycaddog and the above lexicon would output:
cowboy-cat-dog cow-boy-cat-dog
it would never output "cow-boy-cad-dog", because the word "cad" is not in the lexicon. this is just an example; there could be other juncture rules. for instance, with the rule i|o -> ito, the string "territory" and the morpheme list (terri, ory) would output terri-ory.
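A minimal sketch of how rules written in the "left|right -> joined" notation above could be turned into data that a program can work with. The helper name `parse_rule` and the hash keys `left`, `right` and `joined` are assumptions made for illustration; they do not come from any module.

```perl
use strict;
use warnings;

# Hypothetical helper: parse "left|right -> joined" into a hash reference.
# An empty left or right side (as in "s| => H") is allowed.
sub parse_rule {
    my ($text) = @_;
    my ($left, $right, $joined) =
        $text =~ /\A\s*([^|\s]*)\|([^|\s]*)\s*(?:->|=>)\s*(\S+)\s*\z/
        or die "cannot parse rule '$text'";
    return { left => $left, right => $right, joined => $joined };
}

my @rules = map { parse_rule($_) } 't|d -> dd', 'i|o -> ito';

# Prints "t + d becomes dd" and "i + o becomes ito".
printf "%s + %s becomes %s\n", @{$_}{qw(left right joined)} for @rules;
```

Keeping the two sides of the juncture separate is what later makes it possible to give the right-hand part back to the following word when a string is unglued.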

is this possible? any suggestion is welcome.

thanks in advance.

Replies are listed 'Best First'.
Re: unglue words joined together by juncture rules
by roboticus (Chancellor) on Mar 21, 2008 at 20:57 UTC
    pc2:

    The bioinformatics crowd seems to do a lot of work around this sort of problem ... matching up fragments of DNA and genes and other squishy bits. You might look around and see what sort of things they're working on.

    ...roboticus

Re: unglue words joined together by juncture rules
by Narveson (Chaplain) on Mar 21, 2008 at 21:56 UTC

    Save your juncture rules in a hash.

    my %juncture_of = (
        inm => 'imm',   # immature
        abt => 'abst',  # abstract
        adt => 'att',   # attempt
        # etc.
    );

    sub join_euphoniously {
        my $compound = join '', @_;
        while ( my ($raw, $joined) = each %juncture_of ) {
            $compound =~ s/$raw/$joined/;
        }
        return $compound;
    }

    sub unjoin {
        my $compound = shift;

        # maybe no juncture rules applied to this one
        my @possibilities = ( $compound );

        # analyze whether $string resulted from a juncture rule
        while (my ($raw, $joined) = each %juncture_of ) {
            for my $possibility (@possibilities) {
                if ( $possibility =~ s/$joined/$raw/ ) {
                    push @possibilities, $possibility
                }
            }
        }
        return \@possibilities;
    }

    Still to do: decompose each possibility into dictionary words.

      > Still to do: decompose each possibility into dictionary words.
      

      This just happens to be similar to something I've been fiddling with recently.

      #!/usr/bin/perl -w
      my $str = "cowboycatdog";
      chomp(my @list = sort { length($b) <=> length($a) } <DATA>);
      for my $word (@list){
          my $tmp = $str;
          next unless $tmp =~ s/$word//;
          my @results;
          push @results, $word;
          my @rem = grep ! /^$word$/, @list;
          for my $w (@rem){
              push @results, $w if $tmp =~ s/$w//;
          }
          next if length($tmp);
          push @out, \@results;
      }
      print "'$str' has the following anagrams:\n\n", map "@$_\n", @out;
      __DATA__
      cowboy
      cow
      boy
      cat
      do
      dog

      The output looks like this:

      'cowboycatdog' has the following anagrams:
      
      cowboy cat dog
      cow boy cat dog
      boy cow cat dog
      cat cowboy dog
      dog cowboy cat
      
Re: unglue words joined together by juncture rules
by pc2 (Beadle) on Mar 22, 2008 at 12:08 UTC

    thank you for the responses.

    more suggestions are welcome.

    let us explain the problem further. what we actually want to build is a word separator, based on a lexicon and a set of phonetic rules (of any kind, for example, I am -> I'm, vowel + vowel = long vowel (a + a = â; a + a = aya)).

    an example of a language that has these characteristics is Sanskrit. for example, "Krishnah" + "dhaavati" = "Krishnodhaavati", "namaH" + "te" = "namaste" (aH|dh -> odh; aH|t -> ast; as| -> aH (as at the end of the string turns to aH)), etc.

    for making things easier, the lexicons used to form the strings are generally quite small, and how we generate it is actually irrelevant for this problem. for example, krishnodhaavati would be analysed by means of the lexicon ('krishnaH', 'dhaavati', 'naH', 'dhaa') (the set of chunks of the string that exist in the dictionary), but how we generate this lexicon doesn't matter, we just want to join its words based on a set of phonetic rules (aH|dh -> odh, a|e -> aye, a|e -> ai, aH|t -> ast, aH|n -> on, for example) in order to obtain back the original string segmented in all its possibilities (namaH-te; krishnaH-dhaavati; namaH-namaH; etc.). of course, there will not always be only one possibility of segmentation, given a big quantity of juncture rules and a (relatively) big lexicon of possible morphemes.

      I am hoping that this is for a good cause, but the cynic in me is wondering if this is somehow related to making those google-spamming search websites. You know, they end up making ads for "Lowest prices for Cowboy Cat Dog at www.cowboy-cat-dog.com!" I'm glad that the free domain-tasting practice is falling out of favor with the registrars.

      --
      [ e d @ h a l l e y . c c ]

        funny answer.

        but we can assure you that it is definitely not what we want to do; we HATE when we receive those annoying spams and see those things in Google, it is like we are surrounded by bad people who only want to trick us and get away with this. it is just for a personal project of a morphological analyser.

        we are not mad at you, we perfectly understand your cynicism... it is hard these days to have good faith in humanity.

Re: unglue words joined together by juncture rules
by benizi (Hermit) on Mar 25, 2008 at 08:18 UTC

    The "right" way to do this is to use Finite State Transducers. They're used quite a bit in morphological analysis (deconstructing a word into its morphemes). I enjoyed Finite State Morphology, by Lauri Karttunen and Kenneth R. Beesley. A lot of the material you'll find will be very academic, and the field is a bit Finnish-heavy (Finnish has far richer morphology than English). But one of the attractive features of the technology is its run-time efficiency. There are a couple of widely used toolkits: the Xerox Finite State Toolkit, which comes with the book I mentioned above (it might have licensing issues), and the MIT FST Toolkit.

    Some relevant acronyms are WFST, FSA, FSM, and FST for weighted finite state transducers, finite state automata, finite state machines, and finite state transducers. I'm pretty sure Google has a toolkit that's relevant, but I can't seem to find it (I think it uses yet-another acronym for a class of machines that contains WFST's.)

    None of these is a Perl solution.

    Update: fixed Wikipedia link

    Update 2: The Google Research-related kit is OpenFST.

Re: unglue words joined together by juncture rules
by Gavin (Archbishop) on Mar 22, 2008 at 14:44 UTC
    Have you looked at the Prolog modules on CPAN?
Re: unglue words joined together by juncture rules
by mobiusinversion (Beadle) on Apr 01, 2008 at 05:06 UTC
    Here it is. First, what you need to know:

    You will be responsible for installing your rules into the ungluer via a dispatch table of anonymous subroutines.

    Please note: I consider that to be intermediate Perl programming. So if you are looking for something trivial, stop reading now. These functions should return an array of arrays. In their simplest form, these functions could take in 1 word and return 2. Keep this in mind: You will be installing inverted associations.

    So instead of installing rules of the form:
    t|d => dd
    you will be installing rules of the form:
    dd => t|d
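    As a rough illustration of what "inverted associations" means, the sketch below turns a table of forward rules into the surface => [left, right] shape that the `simple` pre-image function further down expects. The variable names `%forward` and `%inverse` are assumptions for this example only.

```perl
use strict;
use warnings;

# Forward rules as written in the thread: "left|right" => surface form.
my %forward = (
    't|d' => 'dd',
    'a|a' => 'A',
    'A|a' => 'A',
    's|'  => 'H',
);

# Inverted table: surface form => list of [left, right] pairs.
my %inverse;
for my $pair (sort keys %forward) {
    my ($left, $right) = split /\|/, $pair, -1;   # -1 keeps an empty right side
    push @{ $inverse{ $forward{$pair} } }, [ $left, $right ];
}

# Show each surface form with every pair that could have produced it.
for my $surface (sort keys %inverse) {
    print "$surface => ", join(', ', map { join '|', @$_ } @{ $inverse{$surface} }), "\n";
}
```

    Several forward rules can share one surface form (both a|a and A|a give A here), which is why each inverted entry holds a list of pairs rather than a single pair.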
    The solution is in a subroutine (below) called functional_unglue and is called like this:
    my @results = functional_unglue( @arguments );
    Here is a template you should use:
    use strict;
    use Data::Dumper;

    my @results = functional_unglue (
        target => TARGET_STRING,
        lexicon => {
            morpheme_1 => undef,
            morpheme_2 => undef,
            ...
            morpheme_n => undef,
        },
        pre_images => {
            pre_image_function_1 => sub { CODE },
            pre_image_function_2 => sub { CODE },
            ...
            pre_image_function_n => sub { CODE },
        }
    );
    @results = map{join('-',@$_)}@results;
    print Dumper([@results]);
    If you think about it, that is the only way a solution makes sense; You want your models to be extensible and you want to be able to change your minds later, and so it must be up to you to install new rules.

    Here is what you get in return: This solution promises to apply all of your rule sets, and recover all of the possible ungluings, in the fastest and most memory efficient way possible, without any knowledge of the rules themselves.

    Here it is:
    sub functional_unglue {
        my %x = @_;
        my $x = $x{target};
        my $l = $x{lexicon};
        my $f = $x{pre_images};
        my @q = ([$x]);
        my @r;
        while(@q){
            my $t = shift @q;
            my @w = @{$t};
            my $w = pop @w;
            for(keys %$f){
                if(my @y = $f->{$_}->($w,$l,$f)){
                    result: for my $i(0..$#y){
                        my $n = $#{$y[$i]} == 0 ? 0 : $#{$y[$i]} - 1;
                        for my $j(0..$n){
                            next result unless exists $l->{$y[$i][$j]}
                        }
                        push @q, [@w,@{$y[$i]}]
                    }
                }
            }
            if(exists $l->{$t->[$#{$t}]}){
                push @r, $t
            }
        }
        @r
    }
    So, for example, use it like this:
    use strict;
    use Data::Dumper;

    my @z = functional_unglue (
        target => 'cowboycaddog',
        lexicon => {
            cowboy => undef,
            cow => undef,
            boy => undef,
            cat => undef,
            dog => undef,
        },
        pre_images => {
            concat => sub {
                ### this turns one long word into two
                my($x,$l) = @_;
                my @x;
                for(keys %$l){
                    my $s = $_;
                    if($x =~ /^\Q$s\E/){
                        push @x, [$s,substr($x,length($s))];
                    }
                }
                @x
            },
            simple => sub {
                ### this turns "WORD1ddWord2" into [WORD1t, dWORD2]
                my $x = shift;
                my @x;
                my %x = (
                    dd => [ [qw(t d)], ],
                );
                ### it can handle arbitrary substitions,
                ### not just dd => t|d, but XY => A|B
                ### for any strings X,Y,A,B
                for(keys %x){
                    my $s = $_;
                    if($x =~ /\Q$s\E/){
                        for my $i(0..$#{$x{$s}}){
                            my @y = (
                                $`.$x{$s}->[$i][0],
                                $x{$s}->[$i][1].$'
                            );
                            push @x, length($y[1]) ? [@y] : [$y[0]]
                        }
                    }
                }
                @x
            },
        }
    );
    @z = map{join('-',@$_)}@z;
    print Dumper([@z]);
    Produces:
    $VAR1 = [ 'cowboy-cat-dog', 'cow-boy-cat-dog' ];
    Or you could try this:
    my @z = functional_unglue (
        target => 'zivAzvaH',
        lexicon => {
            ziva => 1,
            azvas => 1,
            zivA => 1,
            Azvas => 1,
        },
        pre_images => {
            simple => sub {
                my $x = shift;
                my @x;
                my %x = (
                    A => [[qw(a a)], [qw(A a)], [qw(a A)], ['A',''], ['','A'], [qw(A A)]],
                    H => [['s','']],
                );
                for(keys %x){
                    my $s = $_;
                    if($x =~ /\Q$s\E/){
                        for my $i(0..$#{$x{$s}}){
                            my @y = ($`.$x{$s}->[$i][0], $x{$s}->[$i][1].$');
                            push @x, length($y[1]) ? [@y] : [$y[0]]
                        }
                    }
                }
                @x
            },
        }
    );
    @z = map{join('-',@$_)}@z;
    print Dumper([@z]);
    Produces:
    $VAR1 = [ 'ziva-azvas', 'zivA-azvas', 'ziva-Azvas', 'zivA-Azvas' ];
    Now why this works and how this works is your job to figure out. (It sounds like you need to get some more advanced books on Perl, try Object Oriented Perl and Higher Order Perl)

    Try the consonant transposition problem on your own. If you get stuck after trying for a few days, post back, and I'll show you how.

    Finally, look how easy it is to handle unlimited numbers of rules etc:
    use strict;
    use Data::Dumper;

    my @z = functional_unglue (
        target => 'boycowboycaddogdodogcaddyeyescatdogboycaddyecatt',
        lexicon => {
            cowboy => undef,
            caddy => undef,
            boytoy => undef,
            co => undef,
            cow => undef,
            boy => undef,
            cat => undef,
            cats => undef,
            do => undef,
            dog => undef,
            dye => undef,
            eddie => undef,
            eyes => undef,
            kowtow => undef,
            toy => undef,
            tow => undef,
            tyco => undef,
            yes => undef,
            ye => undef,
        },
        pre_images => {
            concat => sub {
                my($x,$l) = @_;
                my @x;
                for(keys %$l){
                    my $s = $_;
                    if($x =~ /^\Q$s\E/){
                        push @x, [$s,substr($x,length($s))];
                    }
                }
                @x
            },
            simple => sub {
                my $x = shift;
                my @x;
                my %x = (
                    dd => [[qw(t d)]],
                    db => [['d','']],
                    td => [[qw(d t)]],
                    td => [[qw(d d)]],
                    tt => [['t','']],
                    ye => [[qw(eye eye)]],
                    caddye => [[qw(cats eyes)]],
                );
                for(keys %x){
                    my $s = $_;
                    if($x =~ /\Q$s\E/){
                        for my $i(0..$#{$x{$s}}){
                            my @y = (
                                $`.$x{$s}->[$i][0],
                                $x{$s}->[$i][1].$'
                            );
                            push @x, length($y[1]) ? [@y] : [$y[0]]
                        }
                    }
                }
                @x
            },
        }
    );
    @z = map{join('-',@$_)}@z;
    print Dumper([@z]);
    Produces:


    'boy-cowboy-cat-dog-do-dog-caddy-eyes-cat-dog-boy-cats-eyes-cat',
    'boy-cowboy-cat-dog-do-dog-caddy-eyes-cat-dog-boy-cat-dye-cat',
    'boy-cow-boy-cat-dog-do-dog-caddy-eyes-cat-dog-boy-cats-eyes-cat',
    'boy-cow-boy-cat-dog-do-dog-caddy-eyes-cat-dog-boy-cat-dye-cat',
    'boy-cowboy-cat-dog-do-dog-cats-eyes-yes-cat-dog-boy-cats-eyes-cat',
    'boy-cowboy-cat-dog-do-dog-cats-eyes-yes-cat-dog-boy-cat-dye-cat',
    'boy-cowboy-cat-dog-do-dog-cat-dye-yes-cat-dog-boy-cats-eyes-cat',
    'boy-cowboy-cat-dog-do-dog-cat-dye-yes-cat-dog-boy-cat-dye-cat',
    'boy-cow-boy-cat-dog-do-dog-cats-eyes-yes-cat-dog-boy-cats-eyes-cat',
    'boy-cow-boy-cat-dog-do-dog-cats-eyes-yes-cat-dog-boy-cat-dye-cat',
    'boy-cow-boy-cat-dog-do-dog-cat-dye-yes-cat-dog-boy-cats-eyes-cat',
    'boy-cow-boy-cat-dog-do-dog-cat-dye-yes-cat-dog-boy-cat-dye-cat'


    In under 1/100th of a second.

    Happy hunting!
Re: unglue words joined together by juncture rules
by pc2 (Beadle) on Mar 26, 2008 at 16:10 UTC

    thank you for the answers.

    unfortunately, we don't have resources for researching about finite state transducers (which we have already heard about).

    we are trying to find a solution. we have already posted a question similar to this one (http://www.perlmonks.org/?node_id=658691), and the user GrandFather has given a very interesting partial solution (which uses a recursive search to find all matching combinations of cowboycatdog, based on the lexicon cowboy, cow, boy, cat, at, do, dog):

    use strict;
    use warnings;

    my $target = "cowboycatdog";
    my @partsList = qw(cowboy cow boy cat at do dog);
    my %partsLu;

    ++$partsLu{$_} for @partsList;
    search ($target, {%partsLu}, []);

    sub search {
        my ($target, $partsLu, $used) = @_;

        unless (length $target) {
            print join ("-", @$used), "\n";
            return;
        }

        for my $part (keys %$partsLu) {
            next unless 0 == index $target, $part;
            my $remainder = substr $target, length $part;
            delete $partsLu->{$part} unless --$partsLu->{$part};
            search ($remainder, {%$partsLu}, [@$used, $part]);
        }
    }
    which prints:

    cow-boy-cat-dog

    cowboy-cat-dog

    the only limitation with this method regarding this problem is that it doesn't consider that the words may have been joined together by phonetic rules (for example, cowboycaddog).

    so, would it be a good idea trying to adapt the above code, or creating one that uses a similar method? any suggestions?

      I think you need only one more thing: a look-ahead assertion. Am I right in thinking that you want a match if the next letter after 'cad' is 'd', or if the next phrase after 'namaH' is 'te'?

      #!perl -l
      use strict;
      use warnings;

      my $target = "cowboycaddog";
      my @partsList = ('cowboy', 'cow', 'boy', 'cat', 'at', 'do', 'dog', 'cad(?=d)');
      my %partsLu;

      ++$partsLu{$_} for @partsList;
      search ($target, {%partsLu}, []);

      sub search {
          my ($target, $partsLu, $used) = @_;

          unless (length $target) {
              print join ("-", @$used), "\n";
              return;
          }

          for my $part (keys %$partsLu) {
              my $tmp = $target;
              my $re = qr/$part/;
              next unless $tmp=~s/^$re//;
              delete $partsLu->{$part} unless --$partsLu->{$part};
              search ($tmp, {%$partsLu}, [@$used, $part]);
          }
      }

      Update: Of course strings printed as result will include these assertions, but the problem how to remove (?=something) from output IMHO can be left as an exercise for the reader (always wanted to say that :) )
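      For readers who would rather not do the exercise, here is one possible way to drop the assertions before printing. It is only a sketch: the helper name `strip_lookahead` is made up for this example, and it assumes the look-ahead contains no nested parentheses.

```perl
use strict;
use warnings;

# Hypothetical helper: remove any "(?=...)" look-ahead from a lexicon pattern
# so that only the literal part is left for printing.
sub strip_lookahead {
    my ($part) = @_;
    (my $plain = $part) =~ s/\(\?=[^)]*\)//g;
    return $plain;
}

my @used = ('cowboy', 'cad(?=d)', 'dog');
print join('-', map { strip_lookahead($_) } @used), "\n";   # prints cowboy-cad-dog
```

      Note that this prints the surface form "cad". To print the lexicon spelling "cat" instead, one could keep a hash from each pattern to the word it was derived from and look the pattern up when printing.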

Re: unglue words joined together by juncture rules
by mobiusinversion (Beadle) on Mar 28, 2008 at 02:16 UTC
    Consider the word:
    'cowboycaddogcowcowboyboycatdodogcaddodocowboytoycaddye'
    the alphabet:
    cowboy caddy boytoy co cow boy cat cats do dog dye eddie eyes kowtow toy tow tyco
    and the juncture rules:
    t|d => dd cats|eyes => caddye
    There are 24 ungluings and I am posting them here. The algorithm I used is of a branch-and-bound genus, and employs a breadth first search on the decision space.

    Here are the details: Potential *ungluings* are added to a queue. Elements of the queue are rejected when no possible ungluings can be formed on their right hand side (i.e. their as-of-yet-not-unglued bit). The solution and runtime stats for this example are as follows:


    0.0215 seconds

    143 iterations

    largest queue memory usage: 15,612 bytes

    largest queue length: 24 items

        cow-boy-cat-dog-cow-cow-boy-boy-cat-do-dog-cat-do-do-cow-boy-toy-cats-eyes
        cowboy-cat-dog-cow-cow-boy-boy-cat-do-dog-cat-do-do-cow-boy-toy-cats-eyes
        cow-boy-cat-dog-cow-cowboy-boy-cat-do-dog-cat-do-do-cow-boy-toy-cats-eyes
        cow-boy-cat-dog-cow-cow-boy-boy-cat-do-dog-cat-do-do-cowboy-toy-cats-eyes
        cow-boy-cat-dog-cow-cow-boy-boy-cat-do-dog-cat-do-do-cow-boytoy-cats-eyes
        cowboy-cat-dog-cow-cowboy-boy-cat-do-dog-cat-do-do-cow-boy-toy-cats-eyes
        cowboy-cat-dog-cow-cow-boy-boy-cat-do-dog-cat-do-do-cowboy-toy-cats-eyes
        cowboy-cat-dog-cow-cow-boy-boy-cat-do-dog-cat-do-do-cow-boytoy-cats-eyes
        cow-boy-cat-dog-cow-cowboy-boy-cat-do-dog-cat-do-do-cowboy-toy-cats-eyes
        cow-boy-cat-dog-cow-cowboy-boy-cat-do-dog-cat-do-do-cow-boytoy-cats-eyes
        cow-boy-cat-dog-cow-cow-boy-boy-cat-do-dog-cat-do-do-cow-boy-toy-cat-dye
        cowboy-cat-dog-cow-cowboy-boy-cat-do-dog-cat-do-do-cowboy-toy-cats-eyes
        cowboy-cat-dog-cow-cowboy-boy-cat-do-dog-cat-do-do-cow-boytoy-cats-eyes
        cowboy-cat-dog-cow-cow-boy-boy-cat-do-dog-cat-do-do-cow-boy-toy-cat-dye
        cow-boy-cat-dog-cow-cowboy-boy-cat-do-dog-cat-do-do-cow-boy-toy-cat-dye
        cow-boy-cat-dog-cow-cow-boy-boy-cat-do-dog-cat-do-do-cowboy-toy-cat-dye
        cow-boy-cat-dog-cow-cow-boy-boy-cat-do-dog-cat-do-do-cow-boytoy-cat-dye
        cowboy-cat-dog-cow-cowboy-boy-cat-do-dog-cat-do-do-cow-boy-toy-cat-dye
        cowboy-cat-dog-cow-cow-boy-boy-cat-do-dog-cat-do-do-cowboy-toy-cat-dye
        cowboy-cat-dog-cow-cow-boy-boy-cat-do-dog-cat-do-do-cow-boytoy-cat-dye
        cow-boy-cat-dog-cow-cowboy-boy-cat-do-dog-cat-do-do-cowboy-toy-cat-dye
        cow-boy-cat-dog-cow-cowboy-boy-cat-do-dog-cat-do-do-cow-boytoy-cat-dye
        cowboy-cat-dog-cow-cowboy-boy-cat-do-dog-cat-do-do-cowboy-toy-cat-dye
        cowboy-cat-dog-cow-cowboy-boy-cat-do-dog-cat-do-do-cow-boytoy-cat-dye
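    The queue-based search described above can be sketched roughly as follows. This is not the poster's code: the function name `bfs_unglue` and the [left, right, joined] layout of the rules are assumptions chosen for the illustration, and rules are treated as optional, so plain concatenation is always tried as well.

```perl
use strict;
use warnings;

# Sketch of a breadth-first ungluer. Each queue item is
# [ words found so far, text still to be unglued ].
sub bfs_unglue {
    my ($target, $lexicon, $rules) = @_;
    my @queue = ( [ [], $target ] );
    my @results;

    while (@queue) {
        my ($words, $rest) = @{ shift @queue };
        if ( !length $rest ) {              # nothing left: a complete ungluing
            push @results, join '-', @$words;
            next;
        }
        for my $word (@$lexicon) {
            # plain concatenation
            push @queue, [ [ @$words, $word ], substr($rest, length($word)) ]
                if index($rest, $word) == 0;

            # juncture rules: [ left, right, joined ]
            for my $rule (@$rules) {
                my ($left, $right, $joined) = @$rule;
                next unless $word =~ /\Q$left\E\z/;
                my $surface = substr($word, 0, length($word) - length($left)) . $joined;
                next unless index($rest, $surface) == 0;
                # give the right-hand part of the rule back to the next word
                my $next = $right . substr($rest, length($surface));
                next unless length($next) < length($rest);   # guard against looping
                push @queue, [ [ @$words, $word ], $next ];
            }
        }
        # items for which no word and no rule fits simply drop out of the queue
    }
    return @results;
}

my @lexicon = qw(cowboy cow boy cat do dog);
my @found   = bfs_unglue('cowboycaddog', \@lexicon, [ [ 't', 'd', 'dd' ] ]);
print "$_\n" for @found;
```

    With this small lexicon the sketch prints cowboy-cat-dog followed by cow-boy-cat-dog. A whole-word rule such as cats|eyes => caddye can be passed as [ 'cats', 'eyes', 'caddye' ].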


    The code is relatively simple. Say the word and I'll post it. (In case you'd like to have the crack at it yourself).

    Could you share an actual example of your problem with us?

      salutations,

      thank you for the answer, but we are not sure we understood what you meant in the penultimate line of your answer.

      EDIT (Mar 31): in the expression between parenthesis ("In case you'd like to have the crack at it yourself").

        Penultimate, meaning second to last?
        Online Etymology Dictionary
        penultimate (adj.) 1677, from earlier penultima (n.) "the next to the last syllable of a word or verse," from fem. of L. adj. penultimus "next-to-last," from pæne "almost" + ultimus "final."
        Online Etymology Dictionary, - Douglas Harper

        So "say the word and I'll post it" was unclear???

        If that's what you are referring to, then I meant that if you wanted the code that solves your problem, I'd give it to you if you said so. I just didn't want to spoil the fun (the description of the code in the first few paragraphs is more than enough to solve it in under an hour).

        edit (april 01):
        regarding "to have a crack at it", that is both American and Australian slang meaning "to try something". You can see the urban dictionary definition. Other tagged definitions in the urban dictionary include:

            give it a go
            have a try
            jump right in
            have a go
            to try
Re: unglue words joined together by juncture rules
by BrowserUk (Patriarch) on Mar 27, 2008 at 14:25 UTC

    This needs adapting to handle the multi-character morphemes in your Sanskrit example, and a post-processing stage to convert morphed words to their lexicon spellings:

    #! perl -slw
    use strict;
    use Data::Dump qw[ pp ];

    sub deGlue (&\@\%@) {
        use re 'eval';
        my $codeRef = shift;
        my $callback = sub {
            my $s = $_;
            my @words = map{
                defined $-[ $_ ] && defined $+[ $_ ]
                    ? substr( $s, $-[ $_ ], $+[ $_ ] - $-[ $_ ] )
                    : ()
            } 1 .. $#-;
            $codeRef->( @words );
        };
        my @lex = @{ shift() };
        my $morphRef = shift;
        for ( @lex ) {
            my( $pre, $last ) = m[(.*)(.)];
            my $morph = $morphRef->{ $last } or next;
            $_ = "$pre(?:$last|$morph->[ 0 ](?=$morph->[1]))"
        }
        my $re = qr[
            ^
            (?:( ${ \ join( ')|(', @lex ) } ))+
            $
            (??{ $callback->() })
            (?!)
        ]x;
        m[$re] for @_;
        return;
    }

    my %morphs = ( t => [ 'd' , 'd' ], );
    my @lex = qw[cowboy cow boy cat do dog ];
    my $input = 'cowboycaddog';
    deGlue{ print join '-', @_ } @lex, %morphs, $input;

    Produces:

    c:\test>675520
    cowboy-cad-dog
    cow-boy-cad-dog

    A longer sample input with the related morphemes and lexicon clearly identified (I don't know Sanskrit :), would allow better testing.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: unglue words joined together by juncture rules
by pc2 (Beadle) on Mar 30, 2008 at 14:25 UTC

    salutations,

    we shall give an actual example of the problem we are trying to solve, based on Sanskrit (which is the best language we can think of for this particular problem, for the many euphony rules it has). consider the lexicon of wordforms (which could be in the form of a hash, with associated meaning) where letter "A" (long vowel) is different from letter "a" (short vowel):

    ziva  => Shiva (a name for god)
    azvas => horse
    zivA  => auspicious (f.)
    Azvas => equestrian
    also note that the words are in isolated forms, i. e. without any juncture rules.

    consider the following word: zivAzvaH

    and the phonetic rules, which occur between words and/or in the final of the sentence:
    a|a => A
    A|a => A
    a|A => A
    A|  => A
    |A  => A
    A|A => A
    s|  => H
    so, for example, ziva + azvas would give zivAzvaH. zivA + azvas would give zivAzvaH. zivA + Azvas would give zivAzvaH. thus, the possible segmentations of zivAzvaH would be:
    ziva-azvas #meaning: Shiva's horse
    zivA-azvas #meaning: auspicious' horse
    ziva-Azvas #meaning: Shiva's equestrian
    zivA-Azvas #meaning: auspicious' equestrian
    of course, we only want to separate the possible words; whether it makes sense or not in the language is another story.
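    To make the forward direction concrete, the sketch below joins each pair of words with the rules listed above and then applies the sentence-final rule. The helper names `glue` and `finish` and the [left, right, joined] rule layout are assumptions for illustration only, and the first rule that fits is the one applied.

```perl
use strict;
use warnings;

# Rules from the post as [ left, right, joined ]. An empty side means that
# nothing is taken from that word.
my @rules = (
    [ 'a', 'a', 'A' ], [ 'A', 'a', 'A' ], [ 'a', 'A', 'A' ], [ 'A', 'A', 'A' ],
    [ 'A', '',  'A' ], [ '',  'A', 'A' ],
);

# Hypothetical helper: join two words, applying the first rule that fits.
sub glue {
    my ($first, $second) = @_;
    for my $rule (@rules) {
        my ($left, $right, $joined) = @$rule;
        next unless $first =~ /\Q$left\E\z/ && $second =~ /\A\Q$right\E/;
        return substr($first, 0, length($first) - length($left))
             . $joined
             . substr($second, length($right));
    }
    return $first . $second;   # no rule applies: plain concatenation
}

# Sentence-final rule: s| => H
sub finish {
    my ($text) = @_;
    $text =~ s/s\z/H/;
    return $text;
}

for my $pair ( [qw(ziva azvas)], [qw(zivA azvas)], [qw(ziva Azvas)], [qw(zivA Azvas)] ) {
    print join(' + ', @$pair), ' = ', finish( glue(@$pair) ), "\n";
}
```

    All four pairs come out as zivAzvaH, which is why the analysis of that one string has to return four segmentations.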

    there is yet another example of something we want it to be able to do (NOTE: this second example may be left as something to work on later, maybe): consider a language (which is actually what we want to experiment with) with a word "abaca", and which has the following rules for joining words:

    a + a = A.
    last consonant of first word + first consonant of last word swap.
    exemplifying:
    abaca + abaca
    = abaCa + aBaca
    = abaBa + aCaba (consonants swap)
    ababAcaba (final form)
    we would like to analyse "ababAcaba" and get:
    abaca-abaca
    this second situation seems much more complicated, but it is not a priority; maybe we should first concentrate on the first one.
      By the way, if anyone else is interested in transliterated Sanskrit, this post is most likely regarding the Harvard-Kyoto standard, which can be found here

      These are the inputs to and outputs from my code somewhere above, for the 3 examples you've supplied so far:

      {
          my %morphs = ( t => { d => 'd' }, );
          my @lex = qw[ cowboy cow boy cat do dog ];
          my $input = 'cowboycaddog';
          print "\n$input\n------------";
          deGlue{ print join '-', @_ } @lex, %morphs, $input;
      }
      {
          my %morphs = (
              aH => { o => 'dh', as => 't' },
              as => { aH => '' },
          );
          my @lex = qw[ krishnaH dhaavati naH dhaa namaH te ];
          my $input = 'Krishnodhaavatinamaste';
          print "\n$input\n----------";
          deGlue{ print join '-', @_ } @lex, %morphs, $input;
      }
      {
          my %morphs = (
              A => { a => 'a', A => 'a', a => 'A', A => 'A', '' => 'A', 'A' => '' },
              s => { H => '' },
          );
          my @lex = qw[ ziva Shiva azvas zivA Azvas ];
          my $input = 'zivAzvaH';
          print "\n$input\n-------------";
          deGlue{ print join '-', @_ } @lex, %morphs, $input;
      }
      __END__
      c:\test>675520

      cowboycaddog
      ------------
      cowboy-cad-dog
      cow-boy-cad-dog

      Krishnodhaavatinamaste
      ----------
      Krishno-dhaavati-namas-te

      zivAzvaH
      -------------
      ziv-AzvaH
      ziv-AzvaH-AzvaH    ## I'M investigating this anomoly.

      The main point of that code is that it constructs regexes to parse the data from the supplied lexicon and morpheme rules automatically.
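      A simplified sketch of the same idea, generating one pattern per lexicon word from a rule table, is shown below. It is not the deGlue code above: the names `%pattern_for` and `segment` are assumptions for this example, it handles only single-letter rules of the shape final letter => [ replacement, following letter ], and it walks the string with ordinary recursion instead of a single backtracking regex. Because each pattern is keyed by its lexicon spelling, the output reports "cat" rather than "cad".

```perl
use strict;
use warnings;

# Rule table: final letter => [ replacement, letter that must follow ].
my %morphs  = ( t => [ 'd', 'd' ] );
my @lexicon = qw(cowboy cow boy cat do dog);

# Build one pattern per lexicon word, keyed by the lexicon spelling.
my %pattern_for;
for my $word (@lexicon) {
    my ($pre, $last) = $word =~ /\A(.*)(.)\z/s;
    if ( my $morph = $morphs{$last} ) {
        my ($replacement, $follow) = map { quotemeta } @$morph;
        # e.g. "cat" becomes ca(?:t|d(?=d))
        $pattern_for{$word} =
            quotemeta($pre) . '(?:' . quotemeta($last) . "|$replacement(?=$follow))";
    }
    else {
        $pattern_for{$word} = quotemeta $word;
    }
}

# Walk the input, recording the lexicon spelling of each match.
sub segment {
    my ($rest, @used) = @_;
    return join '-', @used unless length $rest;
    my @found;
    for my $word (@lexicon) {
        next unless $rest =~ /\A$pattern_for{$word}/;
        push @found, segment( substr($rest, $+[0]), @used, $word );
    }
    return @found;
}

print "$_\n" for segment('cowboycaddog');
```

      For this input the sketch prints cowboy-cat-dog and then cow-boy-cat-dog.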

      It is incomplete and still leaves work to be done, but perhaps it is a starting point? The more examples it is tried with, the better the code generation can be tailored.



        salutations,

        thank you for the answer. we will analyze your code, it seems to be a good starting point. of course, it will take some time for us because our Perl knowledge is not very advanced yet.
      Would you simply state the problem you are trying to solve, rather than progressively complex examples?

      Do that and I'll post the solution ;)

        salutations,

        thank you for the attention.

        the difficulty is that this problem seems hard to explain without examples. but the examples we gave (the Sanskrit one and the abacAcaba one) are actually what we want to make. so, we thought it would be easier to formulate with several examples (of course, we were wrong, because several complex examples make it difficult to give only one solution that solves everything, right?).

        trying to state the actual problem, what we want is to be able to take a string of words (any words) joined by whatever rules of combination we may want to create between words (vowel joining, additional euphonic phoneme, consonant swapping, assimilation...) and then separate this string into the possible combinations of words and rules that may have formed it. maybe, based on a lexicon of the isolated wordforms that may have formed the phrase.

        do you have a solution for it?