revdiablo,
My idea was:
- For each word, keep track of what line it appears on
- For each line, iterate over pairs of words (notice I created my own combination iterator)
- For each each pair, get the intersection of lines
- If the intersection was more than 1 line, lookup the lines and print them
#!/usr/bin/perl
use strict;
use warnings;
my (%word, %seen);
chomp ( my @line = <DATA> );
for my $index ( 0 .. $#line ) {
$word{ $_ }{ $index } = undef for split /_/ , $line[ $index ];
}
for ( @line ) {
my $iter = by_two( $_ );
while ( my @comb = $iter->() ) {
my @matches = map { exists $word{ $comb[ 0 ] }{ $_ } ? $_ : ()
+ }
keys %{ $word{ $comb[ 1 ] } }
;
next if @matches < 2;
my $output = join ' and ' , map { $line[ $_ ] } sort { $a <=>
+$b } @matches;
next if $seen{ $output }++;
print "$output\n";
}
}
sub by_two {
my @list = split /_/ , shift;
return sub { () } if @list < 2;
my ($start, $stop, $pos, $done) = (0, $#list, 0, undef);
return sub {
return () if $done;
$pos++;
if ( $pos > $stop ) {
$start++;
$pos = $start + 1;
}
$done = 1 if $start == $stop - 1;
return $list[ $start ], $list[ $pos ];
}
}
__DATA__
one_two
one_three_two
three_one
one_four
four_three_one
You will notice I have 3 lines of output instead of 5. That is because instead of breaking 3 matches into pairs, I put all 3 on the same line. If you wanted to force the pair issue you could do so using the by_two iterator routine. Finally, it likely could be made more efficient - but hey, I am on a coding hiatus ATM.