Thanks for your reply. That was illuminating!

This is what I wound up with (passes all tests):

use Test::More;

my @test_data = (
    [
        'set 1',
        'SALMWN DE EGENNHSEN TON BOOZ EK THS RAXAB BOOZ DE EGENNHSEN TON WBHD EK THS ROUQ WBHD DE EGENNHSEN TON IESSAI',
        'SALMWN DE EGENNHSEN TON BOES EK THS RAXAB BOES DE EGENNHSEN TON IWBHD EK THS ROUQ IWBHD DE EGENNHSEN TON IESSAI',
        [
            'SALMWN DE EGENNHSEN TON ',
            'DE EGENNHSEN TON IESSAI ',
            'EK THS RAXAB ',
            'DE EGENNHSEN TON ',
            'EK THS ROUQ '
        ]
    ],
    [
        'set 2',
        'IOUDAS DE EGENNHSEN TON FARES KAI TON ZARA EK THS QAMAR FARES DE EGENNHSEN TON ESRWM ESRWM DE EGENNHSEN TON ARAM',
        'IOUDAS DE EGENNHSEN TON FARES KAI TON ZARA EK THS QAMAR FARES DE EGENNHSEN TON ESRWM ESRWM DE EGENNHSEN TON ARAM',
        [
            'IOUDAS DE EGENNHSEN TON FARES KAI TON ZARA EK THS QAMAR FARES DE EGENNHSEN TON ESRWM ESRWM DE EGENNHSEN TON ARAM '
        ]
    ],
    [
        'set 3',
        'PASAI OUN AI GENEAI APO ABRAAM EWS DABID GENEAI DEKATESSARES KAI APO DABID EWS THS METOIKESIAS BABULWNOS GENEAI DEKATESSARES KAI APO THS METOIKESIAS BABULWNOS EWS TOU XRISTOU GENEAI DEKATESSARES',
        'PASAI OUN AI GENEAI APO ABRAAM EWS DAUID GENEAI DEKATESSARES KAI APO DAUID EWS THS METOIKESIAS BABULWNOS GENEAI DEKATESSARES KAI APO THS METOIKESIAS BABULWNOS EWS TOU XRISTOU GENEAI DEKATESSARES',
        [
            'EWS THS METOIKESIAS BABULWNOS GENEAI DEKATESSARES KAI APO THS METOIKESIAS BABULWNOS EWS TOU XRISTOU GENEAI DEKATESSARES ',
            'PASAI OUN AI GENEAI APO ABRAAM EWS ',
            'GENEAI DEKATESSARES KAI APO '
        ]
    ],
);

plan 'tests' => scalar @test_data;

foreach my $test (@test_data) {
    my $name   = $test->[0];
    my @input  = @{$test}[ 1, 2 ];
    my $wanted = $test->[3];
    my @result = all_new(@input);
    is_deeply( \@result, $wanted, $name );
}

sub all_new {
    my ( $str1, $str2 ) = @_;
    my @s1         = split( /\s+/, $str1 );
    my @s2         = split( /\s+/, $str2 );
    my @matrix     = ();
    my %substrings = ();
    my $id         = 0;
    for ( my $i = 0 ; $i <= $#s2 ; $i++ ) {
        for ( my $j = 0 ; $j <= $#s1 ; $j++ ) {
            if ( "$s1[$j]" eq "$s2[$i]" ) {
                if ( $i == 0 || $j == 0 ) {
                    $matrix[$i][$j] = 1;
                }
                else {
                    $matrix[$i][$j] = $matrix[ $i - 1 ][ $j - 1 ] + 1;
                    if ( $i == $#s2 || $j == $#s1 ) {
                        $substrings{$id}[0] = $j - $matrix[$i][$j] + 1;
                        $substrings{$id}[1] = $j;
                        $substrings{$id}[2] = $i - $matrix[$i][$j] + 1;
                        $substrings{$id}[3] = $i;
                        $id++;
                    }
                }
            }
            else {
                $matrix[$i][$j] = 0;
                if ( $i != 0 && $j != 0 && $matrix[ $i - 1 ][ $j - 1 ] != 0 ) {
                    $substrings{$id}[0] = $j - $matrix[ $i - 1 ][ $j - 1 ];
                    $substrings{$id}[1] = $j - 1;
                    $substrings{$id}[2] = $i - $matrix[ $i - 1 ][ $j - 1 ];
                    $substrings{$id}[3] = $i - 1;
                    $id++;
                }
            }
        }
    }
    my @substr_mat = ();
    my %map1       = ();
    my %map2       = ();
    foreach my $str (
        sort {
            ( $substrings{$b}[1] - $substrings{$b}[0] )
              <=> ( $substrings{$a}[1] - $substrings{$a}[0] )
              || $substrings{$a}[0] <=> $substrings{$b}[0]
        } keys %substrings
      )
    {
        my $substr_tmp1 = '';
        my $substr_tmp2 = '';
        foreach my $i ( $substrings{$str}[0] .. $substrings{$str}[1] ) {
            if ( !$map1{$i}++ ) {
                $substr_tmp1 .= "$s1[$i] ";
            }
        }
        next if !$substr_tmp1;
        foreach my $i ( $substrings{$str}[2] .. $substrings{$str}[3] ) {
            if ( !$map2{$i}++ ) {
                $substr_tmp2 .= "$s2[$i] ";
            }
        }
        next if !$substr_tmp2;
        push @substr_mat,
          ( length $substr_tmp1 <= length $substr_tmp2 )
          ? {
            str  => $substr_tmp1,
            wc   => ( $substrings{$str}[1] - $substrings{$str}[0] ),
            site => $substrings{$str}[0]
          }
          : {
            str  => $substr_tmp2,
            wc   => ( $substrings{$str}[3] - $substrings{$str}[2] ),
            site => $substrings{$str}[0]
          };
    }
    return map { $_->{str} }
      sort { $b->{wc} <=> $a->{wc} || $a->{site} <=> $b->{site} } @substr_mat;
}

This hasn't changed very much. In @substr_mat, instead of strings, I put hash refs. Each hash ref holds the string, the word count, and the site where the string was found. The site is measured in words, not characters, so if you have "BLAHBLAH FOO" and "BLAH BAR", "FOO" and "BAR" are considered to be at the same "site".

That data structure looks like this:

$VAR1 = [
          {
            'site' => 0,
            'str' => 'SALMWN DE EGENNHSEN TON ',
            'wc' => 3
          },
          {
            'site' => 17,
            'str' => 'DE EGENNHSEN TON IESSAI ',
            'wc' => 3
          },
          {
            'site' => 5,
            'str' => 'EK THS RAXAB ',
            'wc' => 2
          },
          {
            'site' => 9,
            'str' => 'DE EGENNHSEN TON ',
            'wc' => 2
          },
          {
            'site' => 13,
            'str' => 'EK THS ROUQ ',
            'wc' => 2
          }
        ];
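
To make the "site" idea concrete, here is a small sketch contrasting a word index with a character offset for the two example strings mentioned earlier. The helper word_site is hypothetical and is not part of all_new; it only illustrates why "FOO" and "BAR" land at the same site.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the post's code): index of the first
# occurrence of $word in $text, counting whitespace-separated words
# from zero. Returns nothing if the word is absent.
sub word_site {
    my ( $text, $word ) = @_;
    my @words = split /\s+/, $text;
    for my $i ( 0 .. $#words ) {
        return $i if $words[$i] eq $word;
    }
    return;
}

# Measured in words, both land at index 1.
my $foo_site = word_site( 'BLAHBLAH FOO', 'FOO' );
my $bar_site = word_site( 'BLAH BAR',     'BAR' );

# Measured in characters, the offsets differ (9 versus 5).
my $foo_offset = index( 'BLAHBLAH FOO', 'FOO' );
my $bar_offset = index( 'BLAH BAR',     'BAR' );

print "word sites: $foo_site and $bar_site\n";
print "character offsets: $foo_offset and $bar_offset\n";
```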

(I just now noticed my word count is off by one. This isn't a problem for us because it will still sort correctly.)
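
A quick sketch of why the off-by-one is harmless for ordering: subtracting one from every key doesn't change how the keys compare. The spans below are made up for illustration, and order_by is a hypothetical helper, not something from the code above.

```perl
use strict;
use warnings;

# Hypothetical inclusive word-index spans [start, end], for illustration only.
my @spans = ( [ 5, 7 ], [ 0, 3 ], [ 9, 11 ], [ 17, 20 ] );

# Sort by a length key (descending), then by start index (ascending),
# and return the result as a compact string such as "0-3 17-20".
sub order_by {
    my ( $length_of, @list ) = @_;
    my @sorted = sort {
        $length_of->($b) <=> $length_of->($a) || $a->[0] <=> $b->[0]
    } @list;
    return join ' ', map { join( '-', @$_ ) } @sorted;
}

# The key used in the post: end - start (one less than the word count).
my $by_diff = order_by( sub { $_[0][1] - $_[0][0] }, @spans );

# The true word count of an inclusive span: end - start + 1.
my $by_count = order_by( sub { $_[0][1] - $_[0][0] + 1 }, @spans );

print "$by_diff\n$by_count\n";
print "same order\n" if $by_diff eq $by_count;
```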

So, before returning, I sort by the word count, then the site, and finally pass it through a map to turn it into simple strings.
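
Pulled out on its own, that last step looks roughly like this. The function name ordered_strings is hypothetical (in all_new the same expression is just the return statement), and the records are the ones from the dump above, deliberately shuffled.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the final step of all_new: order records by
# descending word count, break ties by ascending site, keep only the strings.
sub ordered_strings {
    my @records = @_;
    return map { $_->{str} }
      sort { $b->{wc} <=> $a->{wc} || $a->{site} <=> $b->{site} } @records;
}

# Records taken from the dump shown earlier, in shuffled order.
my @records = (
    { str => 'EK THS ROUQ ',             wc => 2, site => 13 },
    { str => 'DE EGENNHSEN TON IESSAI ', wc => 3, site => 17 },
    { str => 'DE EGENNHSEN TON ',        wc => 2, site => 9 },
    { str => 'SALMWN DE EGENNHSEN TON ', wc => 3, site => 0 },
    { str => 'EK THS RAXAB ',            wc => 2, site => 5 },
);

my @ordered = ordered_strings(@records);
print "[$_]\n" for @ordered;
```

The result comes back in the same order as the expected list for 'set 1' in the tests.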

I get the impression that the sort at the top of the foreach is supposed to do this work, but I think it's getting confused by the stuff going on in the body of the loop.


In reply to Re^3: **reopened**Re: weird subroutine behavior by kyle
in thread weird subroutine behavior by flaviusm
