This needs adapting to handle the multi-character morphemes in your Sanskrit example, and it needs a post-processing stage to convert morphed words back to their lexicon spellings:

    #! perl -slw
    use strict;
    use Data::Dump qw[ pp ];

    sub deGlue (&\@\%@) {
        use re 'eval';
        my $codeRef = shift;

        ## Collect the text of every capture group that is currently set
        ## and hand the list to the caller's block.
        my $callback = sub {
            my $s = $_;
            my @words = map{
                defined $-[ $_ ] && defined $+[ $_ ]
                    ? substr( $s, $-[ $_ ], $+[ $_ ] - $-[ $_ ] )
                    : ()
            } 1 .. $#-;
            $codeRef->( @words );
        };

        my @lex = @{ shift() };
        my $morphRef = shift;

        ## For words whose last character has a morph entry, also accept
        ## the morphed character when it is followed by the trigger.
        for ( @lex ) {
            my( $pre, $last ) = m[(.*)(.)];
            my $morph = $morphRef->{ $last } or next;
            $_ = "$pre(?:$last|$morph->[ 0 ](?=$morph->[1]))"
        }

        ## One capture group per lexicon word; the trailing (?!) forces
        ## backtracking so that every possible split is reported.
        my $re = qr[
            ^
            (?:( ${ \ join( ')|(', @lex ) } ))+
            $
            (??{ $callback->() })
            (?!)
        ]x;

        m[$re] for @_;
        return;
    }

    my %morphs = (
        t => [ 'd' , 'd' ],
    );

    my @lex = qw[cowboy cow boy cat do dog ];
    my $input = 'cowboycaddog';

    deGlue{ print join '-', @_ } @lex, %morphs, $input;

Produces:

    c:\test>675520
    cowboy-cad-dog
    cow-boy-cad-dog
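
The post-processing stage could look something like the following. This is only a rough sketch: `restoreSpelling` is a made-up name, and it assumes the same `%morphs` layout as above (lexicon ending => [ morphed form, trigger ]). If a word is not in the lexicon, it swaps each morphed ending back and accepts the first candidate that is.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the code above): map a possibly
# morphed word back to its lexicon spelling.
sub restoreSpelling {
    my( $word, $lexRef, $morphRef ) = @_;
    my %inLex = map { $_ => 1 } @$lexRef;

    # Unmorphed words are returned unchanged.
    return $word if $inLex{ $word };

    # Try undoing each morph at the end of the word.
    for my $ending ( keys %$morphRef ) {
        my $morphed = $morphRef->{ $ending }[ 0 ];
        ( my $candidate = $word ) =~ s/\Q$morphed\E\z/$ending/ or next;
        return $candidate if $inLex{ $candidate };
    }

    # Nothing matched; leave the word as it was found.
    return $word;
}

my %morphs = ( t => [ 'd', 'd' ] );
my @lex    = qw( cowboy cow boy cat do dog );

print join( '-', map { restoreSpelling( $_, \@lex, \%morphs ) } qw( cowboy cad dog ) ), "\n";
# prints: cowboy-cat-dog
```

Note that if a morphed form happens to be a lexicon word in its own right, the spelling alone cannot tell you which one was meant; this sketch simply keeps the word as found.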

A longer sample input, with the related morphemes and lexicon clearly identified (I don't know Sanskrit :), would allow better testing.
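
In the meantime, the multi-character adaptation might be sketched as below. It is untested against real Sanskrit data. `buildAlternatives` is a hypothetical name, and it assumes the keys of `%morphs` may be endings of any length, with the longest matching ending winning. It would stand in for the `for ( @lex )` loop inside `deGlue`, which only ever looks at the last character.

```perl
use strict;
use warnings;

# Hypothetical replacement for the single-character loop in deGlue:
# build one regex alternative per lexicon word, allowing word endings
# of any length to be morphed.
sub buildAlternatives {
    my( $lexRef, $morphRef ) = @_;

    # Longest endings first, so 'abc' is preferred over 'c'.
    my @endings = sort { length( $b ) <=> length( $a ) } keys %$morphRef;

    my @alts;
    for my $word ( @$lexRef ) {
        # First (longest) ending that this word actually finishes with.
        my( $ending ) = grep { $word =~ /\Q$_\E\z/ } @endings;

        if( defined $ending ) {
            my( $morphed, $trigger ) = @{ $morphRef->{ $ending } };
            my $pre = substr( $word, 0, length( $word ) - length( $ending ) );
            # Accept the plain ending, or the morphed one before the trigger.
            push @alts, "$pre(?:$ending|$morphed(?=$trigger))";
        }
        else {
            push @alts, $word;
        }
    }
    return @alts;
}

my %morphs = ( t => [ 'd', 'd' ] );
print "$_\n" for buildAlternatives( [ qw( cowboy cat dog ) ], \%morphs );
```

As in the original loop, the pieces are interpolated unquoted, so the trigger can be a character class or any other pattern fragment; that also means lexicon words containing regex metacharacters would need escaping first.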


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

In reply to Re: unglue words joined together by juncture rules by BrowserUk
in thread unglue words joined together by juncture rules by pc2
