pc2 has asked for the wisdom of the Perl Monks concerning the following question:

salutations, we are not sure on how to explain it, but, suppose we have a string without spaces, which is a succession of words, for example: "cowboycatdog", and we looped through all the possible fragments of the words, checking its existence in an indexed dictionary. then, it returns an array with all the existing fragments of the string:
cow cowboy boy cat do dog
notice that every word here exists. we already have the algorithm to do all that, but here is the doubt: how can we combine the items of the newly generated array in order to form the string "cowboycatdog" again? for example:
cow-boy-cat-dog cowboy-cat-dog
would be the possible segmentations of this string, based on that array. the segmentation "cow-boy-cat-do-g", for example, wouldn't exist, because there is no word "g" in the array. (this is just a small example: the string and the fragments meaning anything could be much bigger). does anyone have any idea on how to to this? thank you in advance.

Replies are listed 'Best First'.
Re: combining array items to match a certain string
by GrandFather (Saint) on Dec 22, 2007 at 20:56 UTC

    The following uses a recursive search to find all matching combinations:

    use strict; use warnings; my $target = "cowboycatdog"; my @parts = qw(cow cowboy boy cat at do dog); search ($target, [@parts], []); sub search { my ($target, $parts, $used) = @_; unless (length $target) { print join ("-", @$used), "\n"; return; } for my $part (@$parts) { next unless 0 == index $target, $part; my $remainder = substr $target, length $part; search ($remainder, [grep {$part ne $_} @$parts], [@$used, $pa +rt]); } }

    Prints:

    cow-boy-cat-dog cowboy-cat-dog

    Perl is environmentally friendly - it saves trees
Re: combining array items to match a certain string
by quester (Vicar) on Dec 22, 2007 at 20:40 UTC
    Try this:

    Write a procedure that tries to match each of the words in the list, only at the very beginning of the string. If there is a match, the procedure calls itself to match the remainder of the string, otherwise it just returns. Except... if the remainder of the string is zero length, print the segments that matched.

    The parameters of the procedure could be the sequence of words that have have been matched so far separated by dashes, and the remaining part of the string.

    For your example, you would start with

    try_match( "", "cowboycatdog" )
    which would, in one of the intermediate steps, call itself with the parameters
    try_match( "cow-boy", "catdog" );
    and later,
    try_match( "cow-boy-cat-dog", "" );
    Assuming the number of different words in the list is about proportional to the string length, the run time should be of the order of the cube of the length of the string, I think.
Re: combining array items to match a certain string
by jwkrahn (Abbot) on Dec 22, 2007 at 22:22 UTC
    $ perl -le' my @array = qw/ cow cowboy boy cat do dog /; my $string = join q//, @array; print $string; 1 while $string =~ s/(\w{2,})\1/$1/; print $string; ' cowcowboyboycatdodog cowboycatdog
Re: combining array items to match a certain string
by poolpi (Hermit) on Dec 22, 2007 at 21:39 UTC
    <<how can we combine the items of the newly generated array in order to form the string "cowboycatdog" again?>>

    If you generate a hash of arrays like this
    my $h = { cowboy => [ cow, boy ], cat => undef, dog => undef }
    you can easaly find you string again?

    HTH,

    PooLPi
Re: combining array items to match a certain string
by pc2 (Beadle) on Dec 22, 2007 at 22:50 UTC
    salutations, thank you for the replies. quester, your idea is good, but, just when we were trying to implement it, GrandFather implemented it to us (we think it is the same principle, isn't it?). anyway, it worked very well, just the way we wanted. poolpi, what you said is true, but the algorithm we use can't be adapted for this. anyway, problem solved, thank you all. GrandFather, if it isn't much trouble, could you explain us how your algorithm really operates?

      First turn on strictures (you always use strictures don't you?), then set up the target string and the search list. Perform the search using the starting parameters.

      use strict; use warnings; my $target = "cowboycatdog"; my @parts = qw(cow cowboy boy cat at do dog); search ($target, [@parts], []);

      Assign the parameters passed to the sub to local variables. Note that the arrays are passed as references so that two arrays can be passed.

      sub search { my ($target, $parts, $used) = @_;

      Check to see if there is anything left to match. If the target string is zero length then we have a successful combination in @$used. Print the result and return.

      unless (length $target) { print join ("-", @$used), "\n"; return; }

      otherwise loop over the remaining unused words looking for matches with the start of the remaining target string.

      for my $part (@$parts) {

      if there is no match for the current word at the start of the target string skip to the next word

      next unless 0 == index $target, $part;

      otherwise isolate the part of the target string following the current word.

      my $remainder = substr $target, length $part;

      Recursively search the remaining part of the target string using the word list passed in excluding the current word and passing a list of the words used so far plus the current word.

      search ($remainder, [grep {$part ne $_} @$parts], [@$used, $pa +rt]); } }

      Perl is environmentally friendly - it saves trees

        salutations, GrandFather,

        the algorithm works well, but there is a small problem: if we try to segmentate:
        cowboycatdogcat
        it won't work, because it won't segmentate as, for example:
        cow-boy-cat-dog-cat
        because "cat" and "cat" are repeated morphemes, even if the array itself has the items "cowboy cow boy cat do dog cat" with "cat" repeated. how can we solve it? thank you in advance.
Re: combining array items to match a certain string
by pc2 (Beadle) on Mar 16, 2008 at 13:19 UTC
    oops, this reply was a repetition of the previous one (we posted it twice): http://www.perlmonks.org/?node_id=674350