in reply to Re^2: unglue words joined together by juncture rules
in thread unglue words joined together by juncture rules

salutations,

thank you for the attention.

the problem is that this seems difficult to explain without examples, but the examples we gave (the Sanskrit one and the abacAcaba one) are actually what we want to build. so we thought it would be easier to formulate the question with several examples. of course, we were wrong: several complex examples make it hard to give a single solution that solves everything, right?

trying to state the actual problem: what we want is to take a string of words (any words) joined by whatever rules of combination we may want to define between words (vowel joining, an additional euphonic phoneme, consonant swapping, assimilation...) and then separate this string into the possible combinations of words and rules that may have formed it, perhaps based on a lexicon of the isolated word forms that may have formed the phrase.

do you have a solution for it?

Re^4: unglue words joined together by juncture rules
by mobiusinversion (Beadle) on Apr 01, 2008 at 04:21 UTC
    i'll provide it shortly. please stay tuned.

    in the meantime, i recommend getting comfortable with Perl's regular expression variables and the qr// operator (very useful!). Here is an excerpt from the Perl 5.10 documentation.
    VARIABLES

    $_         Default variable for operators to use
    $`         Everything prior to matched string
    $&         Entire matched string
    $'         Everything after matched string
    $1,$2...   Hold the Xth captured expr
    $+         Last parenthesized pattern match
    $^N        Holds the most recently closed capture
    $^R        Holds the result of the last (?{...}) expr
    @-         Offsets of starts of groups.
    @+         Offsets of ends of groups.
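    To see a few of these variables in action, here is a minimal sketch. The helper name match_parts and the sample pattern are assumptions made up for this illustration; they are not from the documentation excerpt.

```perl
use strict;
use warnings;

# match_parts is a hypothetical helper, used only for this illustration.
# It runs one match and returns the match variables as a hash
# (or an empty list if the pattern does not match).
sub match_parts {
    my ($text, $re) = @_;
    return unless $text =~ $re;
    return (
        pre        => $`,      # everything before the match
        match      => $&,      # the matched text itself
        post       => $',      # everything after the match
        first      => $1,      # first capture group
        last_group => $+,      # last parenthesized group that matched
        start      => $-[0],   # offset where the whole match starts
        end        => $+[0],   # offset just past the end of the match
    );
}

# Sample pattern: an "r" and a "g" separated by whitespace.
my %m = match_parts('battlestar galactica', qr/(r)\s+(g)/);
print "$_ = '$m{$_}'\n" for sort keys %m;
```

    The variables are read immediately after the match because any later successful match overwrites them.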
    Here is an application you will need:
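    use strict;
    use Data::Dumper;

    # character classes: consonants, vowels, and their complements
    # (note that "y" counts as a vowel here)
    my $con  = qr/[b-df-hj-np-tv-xz]/;
    my $vow  = qr/[aeiouy]/;
    my $ncon = qr/[^b-df-hj-np-tv-xz]/;
    my $nvow = qr/[^aeiouy]/;

    my $x = 'battlestar galactica';
    my $y = 'silly ahab';

    ($x,$y) = map{swap($_)}($x,$y);
    print Dumper([$x,$y]);

    # swap the last consonant of one word with the first consonant
    # of the next word, leaving everything between them in place
    sub swap {
        my $x = shift;
        if($x =~ /(${con})(${ncon}*?\b)(${ncon}*?)(${con})/){
            $x = $`.$4.$2.$3.$1.$';
        }
        $x
    }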
    Produces:
    $VAR1 = [
              'battlestag ralactica',
              'silhy alab'
            ];
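    For the lexicon-based splitting described in the question, one possible direction is a recursive search that tries every juncture, either plain or produced by a rule, and keeps only the splits whose words are all in the lexicon. This is a rough sketch under stated assumptions, not a finished solution: the lexicon entries, the single rule (word-final "a" plus word-initial "i" surfacing as "e"), and the name unglue are all invented for illustration, and every rule is assumed to have a non-empty surface form.

```perl
use strict;
use warnings;

# Toy lexicon of isolated word forms (illustrative entries only).
my %lexicon = map { $_ => 1 } qw(na ca iti ce ti);

# Toy juncture rules: [surface, end of left word, start of right word].
# ['e', 'a', 'i'] reads: final "a" + initial "i" surface as "e".
# Assumption: the surface form is never empty.
my @rules = ( [ 'e', 'a', 'i' ] );

# Return every way to split $s into lexicon words, as array refs.
# $head is the underlying start of the next word, restored by the
# rule that was undone at the previous juncture.
sub unglue {
    my ($s, $head) = @_;
    $head = '' unless defined $head;
    return ([]) if $s eq '' && $head eq '';

    my @analyses;
    for my $i (0 .. length($s)) {
        my $chunk = substr($s, 0, $i);
        my $rest  = substr($s, $i);

        # Plain juncture: the word ends here unchanged.
        my $word = $head . $chunk;
        if ($lexicon{$word}) {
            push @analyses, [ $word, @$_ ] for unglue($rest, '');
        }

        # Rule juncture: the chunk ends in a rule's surface form,
        # so undo the rule on both sides of the juncture.
        for my $rule (@rules) {
            my ($surface, $left, $right) = @$rule;
            next unless $chunk =~ /\Q$surface\E\z/;
            my $stem = substr($chunk, 0, length($chunk) - length($surface));
            my $underlying = $head . $stem . $left;
            next unless $lexicon{$underlying};
            push @analyses, [ $underlying, @$_ ] for unglue($rest, $right);
        }
    }
    return @analyses;
}

print join(' + ', @$_), "\n" for unglue('naceti');
```

    With this toy data, 'naceti' yields two analyses: "na + ce + ti" (plain junctures only) and "na + ca + iti" (the a + i rule undone). Each rule application consumes at least one character of the input, which is what keeps the recursion finite; the search is exhaustive, so long inputs with many rules would need memoization.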