Based on the wording of your question, I assume you already know how to open files, and read and write to them. It sounds like determining which files to merge is the holdup, so that's what I focused on.

I hacked this together based on your example data. If you don't want to use a regex, a combination of File::Basename, File::Spec, and split would accomplish the same thing.
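For what it's worth, here is a rough sketch of that non-regex route, using only File::Basename's fileparse plus split (File::Spec is left out to keep it short). The helper name split_part_name is made up for illustration, and the sketch assumes Unix-style paths and a filename stem that is exactly 'file', which is a little stricter than the regex used in the main script.

```perl
use strict;
use warnings;
use File::Basename qw( fileparse );

# Hypothetical helper (not part of the script in this post): split one
# path into ( prefix, group, part ) without a regex over the whole path.
# Returns an empty list if the name does not fit the expected layout.
sub split_part_name {
    my ($pathfile) = @_;

    # fileparse returns ( name, directory, suffix ). The directory keeps
    # its trailing separator, e.g. 'This/is/the/full/path/'.
    my ( $name, $dir, $suffix ) = fileparse( $pathfile, qr/\.txt/ );
    return unless $suffix eq '.txt';

    # $name is now something like 'file.abc.part-1'
    my ( $stem, $group, $part, @extra ) = split /\./, $name;
    return if @extra or !defined $part;

    # Assumption: the stem is exactly 'file' and the last field is 'part-N'
    return unless $stem eq 'file' and $part =~ /^part-\d+$/;

    return ( "$dir$stem.", $group, $part );
}

my @fields = split_part_name('This/is/the/full/path/file.abc.part-1.txt');
print join( ' | ', @fields ), "\n";
# prints: This/is/the/full/path/file. | abc | part-1
```

The three values returned line up with $1, $2 and $3 from the regex version, so they could be pushed into the same hash structure.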

This approach simply pulls the filenames apart and uses a nested hash to keep track of the path, 'group by' field, and 'part' number. After all of the filenames have been read, files to be merged are identified by looking for 'group by' values whose array holds more than one 'part'. It's quick and dirty, but gets the job done.

use strict;
use warnings;
use Data::Dumper;

my @paths = qw(
    This/is/the/full/path/file.abc.part-1.txt
    This/is/the/full/path/file.abc.part-2.txt
    This/is/the/full/path/file.abc.part-3.txt
    This/is/the/full/path/file.def.part-1.txt
    This/is/the/full/path/file.def.part-2.txt
    This/is/the/full/path/file.ghi.part-1.txt
    This/is/the/full/path/file.jkl.part-2.txt
    This/is/the/full/path/file.mno.part-5.txt
);

# $combo{ path prefix }{ 'group by' field } = [ list of 'part-N' strings ]
my %combo;

foreach my $pathfile ( @paths ) {
    if( $pathfile =~ m/^(.+file\.)(\w+?)\.(part-\d+)\.txt$/ ) {
        push( @{ $combo{$1}{$2} }, $3 );
    }
    else {
        warn "$pathfile does not match expected format";
    }
}

foreach my $path ( keys %combo ) {
    foreach my $type ( keys %{ $combo{$path} } ) {
        # Only groups with more than one part need to be merged
        if( scalar @{ $combo{$path}{$type} } > 1 ) {
            my $newfile = join( '', $path, $type, '.MERGED.txt' );
            print "These files go into $newfile:\n";
            print ' ', join( ', ', @{ $combo{$path}{$type} } ), "\n";
        }
    }
}

print Dumper( \%combo );

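Using the example data from the OP, this outputs the following (hash order is not guaranteed, so the groups may appear in a different order on your machine):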

These files go into This/is/the/full/path/file.abc.MERGED.txt:
 part-1, part-2, part-3
These files go into This/is/the/full/path/file.def.MERGED.txt:
 part-1, part-2
$VAR1 = {
          'This/is/the/full/path/file.' => {
                                             'jkl' => [
                                                        'part-2'
                                                      ],
                                             'abc' => [
                                                        'part-1',
                                                        'part-2',
                                                        'part-3'
                                                      ],
                                             'mno' => [
                                                        'part-5'
                                                      ],
                                             'def' => [
                                                        'part-1',
                                                        'part-2'
                                                      ],
                                             'ghi' => [
                                                        'part-1'
                                                      ]
                                           }
        };
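Once the groups are identified, the merge itself could look something like the sketch below. This is only an illustration under a few assumptions: the helper names sort_parts and merge_parts are invented here, "merge" is taken to mean plain concatenation in part-number order, and the part files are assumed to be small enough to copy line by line.

```perl
use strict;
use warnings;

# Hypothetical helper: order part names by their numeric suffix, so that
# 'part-10' comes after 'part-2' (a plain string sort would not do that).
sub sort_parts {
    my @parts = @_;
    return map  { $_->[1] }
           sort { $a->[0] <=> $b->[0] }
           map  { [ /(\d+)$/ ? $1 : 0, $_ ] } @parts;
}

# Hypothetical helper: concatenate the part files into one merged file.
# $prefix and $group correspond to a key pair from %combo, e.g.
# 'This/is/the/full/path/file.' and 'abc'.
sub merge_parts {
    my ( $prefix, $group, @parts ) = @_;
    my $newfile = join( '', $prefix, $group, '.MERGED.txt' );

    open( my $out, '>', $newfile ) or die "Cannot write $newfile: $!";
    foreach my $part ( sort_parts(@parts) ) {
        my $infile = join( '', $prefix, $group, '.', $part, '.txt' );
        open( my $in, '<', $infile ) or die "Cannot read $infile: $!";
        print {$out} $_ while <$in>;
        close($in);
    }
    close($out) or die "Cannot close $newfile: $!";
    return $newfile;
}

print join( ', ', sort_parts(qw( part-10 part-2 part-1 )) ), "\n";
# prints: part-1, part-2, part-10
```

Inside the inner loop of the script, a call such as merge_parts( $path, $type, @{ $combo{$path}{$type} } ) would take the place of the two print statements.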


In reply to Re: compare and merge files by bobf
in thread compare and merge files by Anonymous Monk
