salutations,

we shall give a concrete example of the problem we are trying to solve, based on Sanskrit (the best language we can think of for this particular problem, because of its many euphony rules). consider a lexicon of word forms (which could be stored as a hash, with each form mapped to its meaning), in which the letter "A" (long vowel) is distinct from the letter "a" (short vowel):

ziva => Shiva (a name for god)
azvas => horse
zivA => auspicious (f.)
Azvas => equestrian

also note that the words are given in their isolated forms, i.e. before any juncture rules are applied.

consider the following word: zivAzvaH

and the phonetic rules, which apply between words and/or at the end of the sentence ("|" marks the word boundary):

a|a => A
A|a => A
a|A => A
A| => A
|A => A
A|A => A
s| => H

so, for example, ziva + azvas would give zivAzvaH, and so would zivA + azvas, ziva + Azvas and zivA + Azvas. thus, the possible segmentations of zivAzvaH would be:

ziva-azvas #meaning: Shiva's horse
zivA-azvas #meaning: auspicious' horse
ziva-Azvas #meaning: Shiva's equestrian
zivA-Azvas #meaning: auspicious' equestrian

of course, we only want to separate the possible words; whether it makes sense or not in the language is another story.
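
to make the first example concrete, here is a minimal sketch in Python (the same idea carries over to a Perl hash). it is only one possible approach, not a finished solution: the names LEXICON, JUNCTION, FINAL, join and segment are made up for illustration, the identity rules "A| => A" and "|A => A" are left implicit, and the search assumes every word has at least one letter before a fused vowel.

```python
# lexicon of isolated word forms (taken from the example above)
LEXICON = {
    "ziva": "Shiva (a name for god)",
    "azvas": "horse",
    "zivA": "auspicious (f.)",
    "Azvas": "equestrian",
}

# (last letter of left word, first letter of right word) -> fused letter
JUNCTION = {("a", "a"): "A", ("A", "a"): "A", ("a", "A"): "A", ("A", "A"): "A"}

# sentence-final rule: underlying letter -> surface letter
FINAL = {"s": "H"}


def join(words):
    """Apply the juncture rules forwards to a non-empty list of words."""
    out = words[0]
    for w in words[1:]:
        fused = JUNCTION.get((out[-1], w[0]))
        out = out[:-1] + fused + w[1:] if fused is not None else out + w
    if out[-1] in FINAL:
        out = out[:-1] + FINAL[out[-1]]
    return out


def _split(rest, lexicon):
    """Yield word lists that could lie behind `rest` (final rule ignored)."""
    if rest in lexicon:
        yield [rest]
    for k in range(1, len(rest)):
        # plain boundary: no letter was changed at this position
        if rest[:k] in lexicon:
            for more in _split(rest[k:], lexicon):
                yield [rest[:k]] + more
        # fused boundary: rest[k] stands for two underlying letters
        for (left, right), fused in JUNCTION.items():
            if rest[k] == fused and rest[:k] + left in lexicon:
                for more in _split(right + rest[k + 1:], lexicon):
                    yield [rest[:k] + left] + more


def segment(surface, lexicon=LEXICON):
    """Return every lexicon word sequence that joins back to `surface`."""
    candidates = {surface}
    for underlying, shown in FINAL.items():
        if surface.endswith(shown):
            candidates.add(surface[: -len(shown)] + underlying)
    found = set()
    for cand in candidates:
        for words in _split(cand, lexicon):
            if join(words) == surface:  # keep only consistent analyses
                found.add(tuple(words))
    return sorted(found)


for words in segment("zivAzvaH"):
    print("-".join(words), "#", " + ".join(LEXICON[w] for w in words))
```

the final check join(words) == surface is a deliberate design choice: the backwards search may over-generate, and re-applying the rules forwards filters out anything that does not really produce the input.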

there is a second example of something we want it to be able to do (NOTE: this second example may be left as something to work on later): consider a language (which is actually the one we want to experiment with) with a word "abaca", and which has the following rules for joining words:

a + a = A.
the last consonant of the first word and the first consonant of the last word swap.

exemplifying:
abaca + abaca
= abaCa + aBaca (the two consonants to be swapped, shown in capitals)
= abaBa + aCaca (consonants swapped)
= ababAcaca (final form, after a + a = A)

we would like to analyse "ababAcaca" and get:
abaca-abaca
this second situation seems much more complicated, but it is not a priority; maybe we should concentrate on the first one first.
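
for the second example, a rough sketch in Python that reads the two rules literally: swap the consonants first, then fuse a + a into A. the names VOWELS, swap_and_join and analyse are hypothetical, "a" and "A" are assumed to be the only vowels, every word is assumed to contain a consonant, and the analysis only tries two-word combinations by brute force. read this way, abaca + abaca comes out as ababAcaca, because the b of the second word becomes c and gives acaca.

```python
VOWELS = set("aA")  # assumption: everything else counts as a consonant


def swap_and_join(first, second):
    """Join two words: swap the boundary consonants, then fuse a + a into A."""
    # last consonant of the first word, first consonant of the second word
    i = max(k for k, ch in enumerate(first) if ch not in VOWELS)
    j = min(k for k, ch in enumerate(second) if ch not in VOWELS)
    first_sw = first[:i] + second[j] + first[i + 1:]
    second_sw = second[:j] + first[i] + second[j + 1:]
    if first_sw.endswith("a") and second_sw.startswith("a"):
        return first_sw[:-1] + "A" + second_sw[1:]
    return first_sw + second_sw


def analyse(surface, lexicon):
    """Generate-and-test: try every ordered pair of lexicon words."""
    return [
        (x, y)
        for x in lexicon
        for y in lexicon
        if swap_and_join(x, y) == surface
    ]


lexicon = ["abaca"]
surface = swap_and_join("abaca", "abaca")
for x, y in analyse(surface, lexicon):
    print(surface, "->", x + "-" + y)
```

trying every pair is quadratic in the size of the lexicon, so this is only meant to pin down the rules, not to scale.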

In reply to Re: unglue words joined together by juncture rules by pc2
in thread unglue words joined together by juncture rules by pc2
