iamravikanth has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, seeking your wisdom again. My requirement is as follows: I have the following files
1_ABC.txt 2_ABC.txt 3_ABC.txt 1_XYZ.txt 2.XYZ.txt 5.XYZ.txt
I have to append all files starting with 1 into a single file (1_AppendFile.txt), all files starting with 2 into a single file (2_AppendFile.txt), and so on. I could loop through all the files, check the starting value of each name, open a filehandle accordingly, and then print the data into that file. But when the file count gets close to 10,000 it really gets very slow. Is there any other alternative that we can use here? Regards, Ravi.

Replies are listed 'Best First'.
Re: Appending multiple files into one or more files
by Ratazong (Monsignor) on Jul 19, 2010 at 12:30 UTC

    You may use some simple shell commands:

    cat 1*.txt >> AppendFile_1.txt
    cat 2*.txt >> AppendFile_2.txt

    Of course you can write a Perl script for generating and executing the cat commands...
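    A minimal sketch of that idea, assuming the files sit in the current directory and follow the naming from the question (the `[._]` class covers both the `1_XYZ.txt` and `2.XYZ.txt` spellings); each generated string could then be run with system():

```perl
use strict;
use warnings;

# Build one "cat" shell command per numeric prefix found in the
# given file names.
sub cat_commands {
    my %prefixes;
    for my $name (@_) {
        $prefixes{$1} = 1 if $name =~ /^(\d+)[._]/;
    }
    return map { sprintf('cat %s[._]*.txt >> AppendFile_%s.txt', $_, $_) }
           sort { $a <=> $b } keys %prefixes;
}
```

    Running the commands is then just `system($_) == 0 or warn "cat failed: $?"` over the returned list.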

    HTH, Rata
Re: Appending multiple files into one or more files
by roboticus (Chancellor) on Jul 19, 2010 at 13:41 UTC

    iamravikanth:

    Hmmm ... you're complaining that it gets very slow. But a straightforward implementation shouldn't take much time at all beyond the actual creation of the appended files. Since you're not presenting any code, I can't tell whether you've got an error in your program that causes the low performance.

    Try commenting out the part of your program that actually creates the appended data files and measure how long it takes to run with 10000 file numbers. If it runs quickly, then the performance issue is due to the amount of data you're reading and writing. In that case, you may want to use tricks like putting your output files on a different disk drive than your input files to reduce your I/O time.

    If, on the other hand, it takes a long time to run, then I'd expect either:

    • Your algorithm to split your list of filenames into groups to append together has a problem (like nested for loops or some other O(n²) issue), or
    • You're using a filesystem that doesn't handle many files in a single directory well, and the act of scanning through the directory is taking a long time.

    You can differentiate between these two cases with a script that simply reads all the filenames from the directory and does nothing with them. If that script runs quickly, then you have a problem with your algorithm that splits up your filenames. If it runs very slowly, then you may want to change the filesystem you're using or perhaps partition files up into different subdirectories.
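    One way to run that check, sketched in Perl (the directory path and the leading-digits pattern are assumptions from the question):

```perl
use strict;
use warnings;

# Scan a directory and count the names starting with digits, doing
# nothing else with them - if even this is slow, the filesystem or
# directory size is the suspect, not your grouping algorithm.
sub count_numbered {
    my ($dir) = @_;
    opendir my $dh, $dir or die "Can't open $dir: $!";
    my $count = grep { /^\d+/ } readdir $dh;
    closedir $dh;
    return $count;
}
```

    Wrap the call in Time::HiRes timings (or just `time perl scan.pl`) to see where the seconds go.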


    I did a quick test: I generated roughly 50,000 files, grouped them by their numeric prefix, and then deleted all of the files, like so:

    roboticus@Boink:~/funkytest$ ls
    genfiles.pl  groupfiles.pl
    roboticus@Boink:~/funkytest$ time ./genfiles.pl

    real    0m4.937s
    user    0m1.644s
    sys     0m3.292s
    roboticus@Boink:~/funkytest$ ls | wc -l
    49993
    roboticus@Boink:~/funkytest$ time ./groupfiles.pl
    6712: 6712_WRW, 6712_DIK, 6712_FRB
    8563: 8563_FHL, 8563_AAE, 8563_TSL, 8563_LCU
    5006: 5006_SLA, 5006_PZK, 5006_PUB, 5006_FCK
    8434: 8434_HNX, 8434_HPB, 8434_YED, 8434_SCB, 8434_KBS, 8434_CEH, 8434_JCH, 8434_NVN, 8434_VPN, 8434_GFM, 8434_BNJ
    3509: 3509_EAY, 3509_WNU, 3509_MUI, 3509_NPX, 3509_LHX
    7652: 7652_IMC, 7652_GMN
    4863: 4863_MTN, 4863_RGD, 4863_BFT, 4863_LSF, 4863_KNJ, 4863_JGE
    Files: 49991, Groups: 9922

    real    0m0.736s
    user    0m0.664s
    sys     0m0.080s
    roboticus@Boink:~/funkytest$ time rm {0,1,2,3,4,5,6,7,8,9}*

    real    0m6.451s
    user    0m3.044s
    sys     0m3.212s
    roboticus@Boink:~/funkytest$

    As you can see, it takes little time to split the files into groups (on my machine, anyway). If the files had data in them and I did the concatenation you mention, then the runtime would be totally dominated by the act of making the concatenated files.

    ...roboticus

      The code I used (if anyone cares) is:

      genfiles.pl

      #!/usr/bin/perl -w
      use strict;
      use warnings;

      for (1 .. 50000) {
          my $t = int rand 10000;
          my $u = join('', ('A'..'Z')[rand 26, rand 26, rand 26]);
          open my $OUF, '>', $t . '_' . $u or die;
          close $OUF;
      }

      groupfiles.pl

      #!/usr/bin/perl -w
      use strict;
      use warnings;

      my %filegroups;
      opendir(my $DIRH, '.') || die "Can't open dir: $!\n";
      my $cnt = 0;
      while (my $filename = readdir($DIRH)) {
          next unless $filename =~ /^(\d+)/;
          push @{$filegroups{$1}}, $filename;
          ++$cnt;
      }
      my $cnt2 = 0;
      for my $grp (keys %filegroups) {
          print "$grp: ", join(", ", @{$filegroups{$grp}}), "\n";
          last if $cnt2++ > 5;
      }
      print "Files: $cnt, Groups: ", scalar(keys %filegroups), "\n";

      ...roboticus

        Hi Roboticus, thanks for your suggestions; they really helped.
        Regards, Ravi.
Re: Appending multiple files into one or more files
by Corion (Patriarch) on Jul 19, 2010 at 12:25 UTC

    I guess you will have to show us what code you've written. My code would mostly use opendir, readdir.

Re: Appending multiple files into one or more files
by Marshall (Canon) on Jul 19, 2010 at 13:27 UTC
    First, get a list of all the files. Below, the list comes from the __DATA__ segment, but for your app, use opendir and readdir.

    Make a hash of arrays keyed upon the "number" of the file (the digits before the '.' or '_').

    Then for each new numbered "append" file, open a file handle and copy the files to the new file.

    #!/usr/bin/perl -w
    use strict;

    my %new_files;
    while (my $filename = <DATA>) {
        chomp $filename;
        my $num = ($filename =~ /^(\d+)/)[0];   # just the first digits
        push(@{$new_files{$num}}, $filename);
    }

    foreach my $new_file (keys %new_files) {
        print "open file for ....$new_file" . "_AppendFile.txt\n"; # your code
        print "put these files in there: \n";  # you have to make a copy loop
        print "@{$new_files{$new_file}}", "\n";
        print "\n";
    }

    =prints:
    open file for ....1_AppendFile.txt
    put these files in there:
    1_ABC.txt 1_XYZ.txt

    open file for ....3_AppendFile.txt
    put these files in there:
    3_ABC.txt

    open file for ....2_AppendFile.txt
    put these files in there:
    2_ABC.txt 2.XYZ.txt

    open file for ....5_AppendFile.txt
    put these files in there:
    5.XYZ.txt

    =cut

    __DATA__
    1_ABC.txt
    2_ABC.txt
    3_ABC.txt
    1_XYZ.txt
    2.XYZ.txt
    5.XYZ.txt
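    A sketch of the copy loop that the comments above leave as an exercise (output naming follows the print statements; error handling is kept minimal):

```perl
use strict;
use warnings;

# Append the contents of each source file to one output file,
# opening the output handle only once per group.
sub append_group {
    my ($out_name, @sources) = @_;
    open my $out, '>>', $out_name or die "Can't open $out_name: $!";
    for my $src (@sources) {
        open my $in, '<', $src or die "Can't read $src: $!";
        print {$out} $_ while <$in>;
        close $in;
    }
    close $out or die "Can't close $out_name: $!";
}
```

    Inside the foreach loop it would be called as append_group($new_file . '_AppendFile.txt', @{$new_files{$new_file}}).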
Re: Appending multiple files into one or more files
by cdarke (Prior) on Jul 19, 2010 at 16:28 UTC
    An alternative method is to use the diamond operator (the ARGV filehandle). Set the input filenames in @ARGV, then the opens are taken care of. For example:
    open(my $fh, '>', '1_AppendFile.txt') or die "etc: $!";
    @ARGV = grep { !/AppendFile/ } glob('1*.txt');  # don't read the output file back in
    while (<>) { print $fh $_ }
    close($fh);
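    Extended to every group, under the assumption that the prefixes of interest are known (hard-coded below from the question's file names); the grep keeps each output file out of its own @ARGV:

```perl
use strict;
use warnings;

# One diamond-operator pass per prefix: glob the group's files into
# @ARGV, then let <> handle all the opens for us.
sub append_by_prefix {
    my @prefixes = @_;
    for my $p (@prefixes) {
        my @files = grep { !/AppendFile/ } glob($p . '[._]*.txt');
        next unless @files;
        local @ARGV = @files;
        open my $out, '>>', $p . '_AppendFile.txt'
            or die "Can't open ${p}_AppendFile.txt: $!";
        print {$out} $_ while <>;
        close $out;
    }
}
```

    For the question's data that would be append_by_prefix(1, 2, 3, 5).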
Re: Appending multiple files into one or more files
by Sinistral (Monsignor) on Jul 19, 2010 at 13:34 UTC

    And, if you want, you can always use IO::Cat or File::Cat along with glob to separate the 1_* files from the 2_* files.