in reply to script to merge files
Are the files in the same order? It seems not, and that's relevant to the solution. If each line in one file is a direct match for the same line number in the other file, a single pass over both files in step gives an O(n) solution. If the files don't have a "line number for line number" relationship, the solution becomes more involved. A hash can be used: lookups are still cheap on average, but the cost moves to memory, because the data has to be held there while the files are read.
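For comparison, here is a minimal sketch of the in-step case, assuming both files really are in the same order and that every line looks like "conjugation | infinitive". The helper name merge_in_step and the sample words are made up for illustration; they are not from the original question.

```perl
use strict;
use warnings;

# Hypothetical helper: pair line N of one list with line N of the other.
# Assumes both lists are in the same order and use "conjugation | infinitive".
sub merge_in_step {
    my ( $first, $second ) = @_;
    my @merged;
    for my $i ( 0 .. $#{$first} ) {
        last if $i > $#{$second};    # Stop at the end of the shorter list.
        my ( $conj_a, $inf_a ) = split /\s*\|\s*/, $first->[$i];
        my ( $conj_b, $inf_b ) = split /\s*\|\s*/, $second->[$i];

        # If the infinitives differ, the files are not in step after all.
        die "Line @{[ $i + 1 ]}: '$inf_a' does not match '$inf_b'\n"
            unless $inf_a eq $inf_b;

        push @merged, join ' | ', $inf_a, $conj_a, $conj_b;
    }
    return @merged;
}

# Made-up sample data standing in for the two input files.
my @past    = ( 'went | go', 'ate | eat' );
my @present = ( 'goes | go', 'eats | eat' );
print "$_\n" for merge_in_step( \@past, \@present );
```

Each line is visited once and nothing beyond the current pair needs to be remembered, which is where the O(n) figure comes from. The die is there so that a mismatch shows up immediately instead of silently producing wrong pairs.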
By the way, this problem seems like something that may have been approached in some form before. In fact there is a module, Lingua::EN::Conjugate, but it addresses the general need for conjugation, not the specific task of matching up verb tenses and persons.
Here's one solution that could work if the word-sets can fit into memory all at once. With the magic of references and the uniqueness as well as associativity of hash keys, the solution just sort of falls into place with a single hash.
    use strict;
    use warnings;
    use v5.14;
    use autodie;

    my %word_sets;

    # Read every input file named on the command line.
    while ( <> ) {
        chomp;
        my ( $conjugation, $infinitive ) = split /\s*\|\s*/;

        # Group the conjugations under their shared infinitive.
        push @{ $word_sets{ $infinitive } }, $conjugation;
    }

    open my $outfh, '>', 'combined_conjugations.txt';

    while ( my ( $infinitive, $conjugations ) = each %word_sets ) {
        next unless @{$conjugations} > 1;    # Skip items that didn't appear twice.
        say $outfh join( ' | ', $infinitive, @{ $conjugations } );
    }

    close $outfh;
By your problem description it looks like we can assume that in each file the alternate conjugation comes before the infinitive, whereas your solution file would list the infinitive first on each line, followed by the alternate conjugations. I took care to preserve what seemed to be your intent in this respect. What I didn't preserve, however, is any notion of line ordering. You probably do want some form of sorting, but it wasn't apparent in your question. If you do need the list to be sorted, or to preserve some original order, you'll have to modify the solution provided.
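If sorting is what you want, one possible modification is to walk the keys in sorted order instead of using each. This is only a sketch: the helper name sorted_lines and the sample data are invented for illustration, and it assumes a plain string sort on the infinitive is the order you're after.

```perl
use strict;
use warnings;

# Hypothetical helper: build the combined lines, sorted by infinitive.
sub sorted_lines {
    my ($word_sets) = @_;
    return map  { join ' | ', $_, @{ $word_sets->{$_} } }
           grep { @{ $word_sets->{$_} } > 1 }    # Skip items that didn't appear twice.
           sort keys %{$word_sets};
}

# Made-up data in the same shape as %word_sets in the solution above.
my %word_sets = (
    go  => [ 'went', 'goes' ],
    eat => [ 'ate',  'eats' ],
    run => [ 'ran' ],            # Appears only once, so it is skipped.
);
print "$_\n" for sorted_lines( \%word_sets );
```

Preserving the original file order is a different matter: a hash doesn't remember insertion order, so you'd need to record the order in which each infinitive was first seen, for example in a separate array.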
Dave