Are the files in the same order? It seems not, and that matters for the solution. If both files are ordered lists in which each line corresponds directly to the same line number in the other file, you can walk them in lockstep in a single O(n) pass. If the files don't have that line-for-line relationship, you have to match lines by their content instead, and comparing every line against every other line would be O(n²). A hash avoids that blow-up, though it costs memory and more work per line than a simple lockstep read.
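For the lockstep case, a minimal sketch might look like the following. It assumes (this is my assumption, not something from your question) that every line in both files has the form "conjugation | infinitive" and that line N of each file refers to the same infinitive. The helper name merge_lockstep is hypothetical, and in-memory filehandles stand in for the real files so it can be tried as-is.

```perl
use strict;
use warnings;
use v5.14;    # enables 'say'

# Hypothetical helper: merges two filehandles whose line N refers to the
# same infinitive in both. Assumes every line looks like
# "conjugation | infinitive". One pass over each file, so O(n).
sub merge_lockstep {
    my ( $fh_a, $fh_b ) = @_;
    my ( @merged, $line_no );
    while ( defined( my $line_a = <$fh_a> ) ) {
        my $line_b = <$fh_b>;
        last unless defined $line_b;    # stop when the second file runs out
        $line_no++;
        chomp( $line_a, $line_b );
        my ( $conj_a, $inf_a ) = split /\s*\|\s*/, $line_a;
        my ( $conj_b, $inf_b ) = split /\s*\|\s*/, $line_b;
        # If the infinitives differ, the "same order" assumption is wrong.
        die "Line $line_no: '$inf_a' vs '$inf_b'; files are not aligned\n"
            unless $inf_a eq $inf_b;
        push @merged, join( ' | ', $inf_a, $conj_a, $conj_b );
    }
    return @merged;
}

# Usage with in-memory filehandles standing in for the two real files.
open my $fa, '<', \"was | be\nhad | have\n" or die $!;
open my $fb, '<', \"been | be\nhas | have\n" or die $!;
say for merge_lockstep( $fa, $fb );
# prints:
# be | was | been
# have | had | has
```

The die on a mismatch is deliberate: if the files turn out not to be aligned, it's better to find out immediately than to silently pair the wrong conjugations.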

By the way, this problem looks like something that has probably been approached in some form before. In fact, there is a module, Lingua::EN::Conjugate, but it addresses conjugation in general, not the specific task of matching up verb tenses and persons.

Here's one solution that works if the word sets fit into memory all at once. Thanks to references, and to hash keys being both unique and associative, the solution more or less falls into place with a single hash.

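```perl
use strict;
use warnings;
use v5.14;      # enables 'say'
use autodie;    # open/close die on failure

my %word_sets;  # infinitive => [ conjugations seen for it ]

# Reads every file named on the command line, in turn.
while ( <> ) {
    chomp;
    my ( $conjugation, $infinitive ) = split /\s*\|\s*/;
    next unless defined $infinitive;    # skip blank or malformed lines
    push @{ $word_sets{ $infinitive } }, $conjugation;    # autovivifies the array ref
}

open my $outfh, '>', 'combined_conjugations.txt';
while ( my ( $infinitive, $conjugations ) = each %word_sets ) {
    next unless @{ $conjugations } > 1;    # Skip items that didn't appear at least twice.
    say $outfh join( ' | ', $infinitive, @{ $conjugations } );
}
close $outfh;
```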

From your problem description, it looks like we can assume that in each input file the alternate conjugation comes before the infinitive, whereas your output file should list the infinitive first on each line, followed by the alternate conjugations. I took care to preserve that intent. What I didn't preserve, however, is any notion of line ordering: each iterates over the hash in an unpredictable order. You probably do want some form of sorting, but that wasn't apparent from your question. If you need the list sorted, or in its original order, you'll have to modify the solution provided.
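One way to keep the original order is to remember each infinitive the first time it is seen, alongside the same single hash. Here is a rough sketch of that idea, reworked as a function over a list of lines so it can be tried without any files. The name group_in_input_order is hypothetical, and the "conjugation | infinitive" line format is assumed as before.

```perl
use strict;
use warnings;
use v5.14;    # enables 'say'

# Hypothetical helper: same hash-of-arrays grouping as the script above,
# plus an @order array recording when each infinitive first appeared.
sub group_in_input_order {
    my @lines = @_;
    my ( %word_sets, @order );
    for my $line (@lines) {
        chomp( my $copy = $line );
        my ( $conjugation, $infinitive ) = split /\s*\|\s*/, $copy;
        next unless defined $infinitive;    # skip blank or malformed lines
        push @order, $infinitive unless exists $word_sets{$infinitive};
        push @{ $word_sets{$infinitive} }, $conjugation;
    }
    # Keep only infinitives seen at least twice, in first-seen order.
    return map  { join ' | ', $_, @{ $word_sets{$_} } }
           grep { @{ $word_sets{$_} } > 1 } @order;
}

my @input = ( "was | be\n", "had | have\n", "been | be\n", "has | have\n", "ran | run\n" );

say for group_in_input_order(@input);                      # first-seen order
say for sort { $a cmp $b } group_in_input_order(@input);   # alphabetical
```

Sorting the finished lines effectively sorts by infinitive, since each line begins with it. The explicit { $a cmp $b } block matters: writing sort group_in_input_order(@input) would make Perl treat the function name as the comparison routine.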


Dave


In reply to Re: script to merge files by davido
in thread script to merge files by marylein
