comment on

Here it is. First, what you need to know:

You will be responsible for installing your rules into the ungluer via a dispatch table of anonymous subroutines.

Please note: I consider that to be intermediate Perl programming. So if you are looking for something trivial, stop reading now. These functions should return an array of arrays. In their simplest form, these functions could take in 1 word and return 2. Keep this in mind: You will be installing inverted associations.

So instead of installing rules of the form:

t|d => dd
[download]

you will be installing rules of the form:

dd => t|d
[download]

The solution is in a subroutine (below) called functional_unglue and is called like this:

my @results = functional_unglue( @arguments );
[download]

Here is a template you should use:

use strict;
use Data::Dumper;

my @results = functional_unglue (
    target     => TARGET_STRING,
    lexicon    => {
                    morpheme_1 => undef,
                    morpheme_2 => undef,
                    ...
                    morpheme_n => undef,
                  },
    pre_images => {
        pre_image_function_1 => sub { CODE },
        pre_image_function_2 => sub { CODE },
        ...
        pre_image_function_n => sub { CODE },
    }
);

@results = map{join('-',@$_)}@results;

print Dumper([@results]);
[download]

If you think about it, that is the only way a solution makes sense; You want your models to be extensible and you want to be able to change your minds later, and so it must be up to you to install new rules.

Here is what you get in return: This solution promises to apply all of your rule sets, and recover all of the possible ungluings, in the fastest and most memory efficient way possible, without any knowledge of the rules themselves.

Here it is:

sub functional_unglue {
    my %x = @_;
    my $x = $x{target};     
    my $l = $x{lexicon};    
    my $f = $x{pre_images}; 
    my @q = ([$x]);
    my @r;
    while(@q){
        my $t = shift @q;
        my @w = @{$t};
        my $w = pop @w;
        for(keys %$f){
            if(my @y = $f->{$_}->($w,$l,$f)){
                result:
                for my $i(0..$#y){        
                    my $n = $#{$y[$i]} == 0 ? 0 : $#{$y[$i]} - 1;
                    for my $j(0..$n){
                        next result unless exists $l->{$y[$i][$j]}
                    }
                    push @q, [@w,@{$y[$i]}]
                }
            }
        }
        if(exists $l->{$t->[$#{$t}]}){
            push @r, $t
        } 
    } @r
}
[download]

So, for example, use it like this:

use strict;
use Data::Dumper;

my @z = functional_unglue (
    target     => 'cowboycaddog',
    lexicon    => {
                    cowboy => undef,
                    cow    => undef,
                    boy    => undef,
                    cat    => undef,
                    dog    => undef,
                  },
    pre_images => {
        concat => sub {   
            ### this turns one long word into two
            my($x,$l) = @_;
            my @x;
            for(keys %$l){
                my $s = $_;
                if($x =~ /^\Q$s\E/){
                    push @x, [$s,substr($x,length($s))];
                }
            } @x
        },
        simple => sub {  
            ### this turns "WORD1ddWord2" into [WORD1t, dWORD2]
            my $x = shift;
            my @x;
            my %x = (
                dd => [
                    [qw(t d)],
                ],
            );
            ### it can handle arbitrary substitions, 
            ### not just dd => t|d, but XY => A|B 
            ### for any strings X,Y,A,B 
            for(keys %x){
                my $s = $_;
                if($x =~ /\Q$s\E/){
                    for my $i(0..$#{$x{$s}}){
                        my @y = (
                            $`.$x{$s}->[$i][0],
                            $x{$s}->[$i][1].$'
                        );   
                        push @x, length($y[1]) ? [@y] : [$y[0]]
                    }
                }
            } @x
        },
    }
);
@z = map{join('-',@$_)}@z;
print Dumper([@z]);
[download]

Produces:

$VAR1 = [
          'cowboy-cat-dog',
          'cow-boy-cat-dog'
        ];
[download]

Or you could try this:

my @z = functional_unglue (
    target     => 'zivAzvaH',
    lexicon    => {
        ziva  => 1, azvas => 1,
        zivA  => 1, Azvas => 1,
    },
    pre_images => {
        simple => sub {
            my $x = shift;
            my @x;
            my %x = (
                A => [[qw(a a)], [qw(A a)], [qw(a A)],
                      ['A',''],  ['','A'],  [qw(A A)]],
                H => [['s','']],
            );
            for(keys %x){
               my $s = $_;
               if($x =~ /\Q$s\E/){
                  for my $i(0..$#{$x{$s}}){
                     my @y = ($`.$x{$s}->[$i][0], $x{$s}->[$i][1].$');
                        push @x, length($y[1]) ? [@y] : [$y[0]]
                    }
                }
            } @x
        },
    }
);
@z = map{join('-',@$_)}@z;
print Dumper([@z]);
[download]

Produces:

$VAR1 = [
          'ziva-azvas',
          'zivA-azvas',
          'ziva-Azvas',
          'zivA-Azvas'
        ];
[download]

Now why this works and how this works is your job to figure out. (It sounds like you need to get some more advanced books on Perl, try Object Oriented Perl and Higher Order Perl)

Try the consonant transposition problem on your own. If you get stuck after trying for a few days, post back, and I'll show you how.

Finally, look how easy it is to handle unlimited numbers of rules etc:

use strict;
use Data::Dumper;

my @z = functional_unglue (
    target     => 
        'boycowboycaddogdodogcaddyeyescatdogboycaddyecatt',
    lexicon    => {
        cowboy => undef,
        caddy  => undef,
        boytoy => undef,
        co     => undef,
        cow    => undef,
        boy    => undef,
        cat    => undef,
        cats   => undef,
        do     => undef,
        dog    => undef,
        dye    => undef,
        eddie  => undef,
        eyes   => undef,
        kowtow => undef,
        toy    => undef,
        tow    => undef,
        tyco   => undef,
        yes    => undef,
        ye     => undef,
    },
    pre_images => {
        concat => sub {
            my($x,$l) = @_;
            my @x;
            for(keys %$l){
                my $s = $_;
                if($x =~ /^\Q$s\E/){
                    push @x, [$s,substr($x,length($s))];
                }
            } @x
        },
        simple => sub {
            my $x = shift;
            my @x;
            my %x = (
                dd     => [[qw(t d)]],
                db     => [['d','']],
                td     => [[qw(d t)]],
                td     => [[qw(d d)]],
                tt     => [['t','']],
                ye     => [[qw(eye  eye)]],
                caddye => [[qw(cats eyes)]],
                
            );
            for(keys %x){
                my $s = $_;
                if($x =~ /\Q$s\E/){
                    for my $i(0..$#{$x{$s}}){
                        my @y = (
                            $`.$x{$s}->[$i][0],
                            $x{$s}->[$i][1].$'
                        );
                        push @x, length($y[1]) ? [@y] : [$y[0]]
                    }
                }
            } @x
        },
    }
);
@z = map{join('-',@$_)}@z;
print Dumper([@z]);
[download]

Produces:

'boy-cowboy-cat-dog-do-dog-caddy-eyes-cat-dog-boy-cats-eyes-cat',
'boy-cowboy-cat-dog-do-dog-caddy-eyes-cat-dog-boy-cat-dye-cat',
'boy-cow-boy-cat-dog-do-dog-caddy-eyes-cat-dog-boy-cats-eyes-cat',
'boy-cow-boy-cat-dog-do-dog-caddy-eyes-cat-dog-boy-cat-dye-cat',
'boy-cowboy-cat-dog-do-dog-cats-eyes-yes-cat-dog-boy-cats-eyes-cat',
'boy-cowboy-cat-dog-do-dog-cats-eyes-yes-cat-dog-boy-cat-dye-cat',
'boy-cowboy-cat-dog-do-dog-cat-dye-yes-cat-dog-boy-cats-eyes-cat',
'boy-cowboy-cat-dog-do-dog-cat-dye-yes-cat-dog-boy-cat-dye-cat',
'boy-cow-boy-cat-dog-do-dog-cats-eyes-yes-cat-dog-boy-cats-eyes-cat',
'boy-cow-boy-cat-dog-do-dog-cats-eyes-yes-cat-dog-boy-cat-dye-cat',
'boy-cow-boy-cat-dog-do-dog-cat-dye-yes-cat-dog-boy-cats-eyes-cat',
'boy-cow-boy-cat-dog-do-dog-cat-dye-yes-cat-dog-boy-cat-dye-cat'

In under 1/100th of a second.

Happy hunting!

In reply to Re: unglue words joined together by juncture rules by mobiusinversion
in thread unglue words joined together by juncture rules by pc2

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.