lnin has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks:
Merging huge text files (each containing more than 10 MB of data) is taking significant time; in some cases merging 4 text files takes more than 8 minutes.

The code I have written is as follows. Please advise if there is a better way of writing it that takes less time.

my $MergedFileName = "MergeOutput.txt";

# Read the log files into an array: .txt files, sorted by the timestamp in the filename
opendir(DIR, '.') or die "Input directory not available. Error: $!";
my @filesRead = sort( grep( /\.txt$/, readdir(DIR) ) );
closedir(DIR);

# Open the output file
open(MAINOUTPUT, ">$MergedFileName") || warn "Can't open file $MergedFileName: $!\n";

# Start merging into the main file by reading each file, line by line
FILE: foreach (@filesRead) {
    open(FILE, $_) || ( ( warn "Can't open file $_\n" ), next FILE );
    while (<FILE>) {
        print MAINOUTPUT $_;
    }
    close(FILE);
}
close(MAINOUTPUT);
# Merging of the files is done
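For comparison, a minimal sketch of the same merge done with large fixed-size block reads instead of line-by-line reads; the 1 MB block size is an arbitrary choice, and @filesRead and $MergedFileName are the variables built above.

# Copy each input file to the output in 1 MB binary blocks.
open my $out, '>', $MergedFileName or die "Can't open output: $!";
binmode $out;    # byte-for-byte copy, no CRLF translation
for my $file (@filesRead) {
    open my $in, '<', $file or do { warn "Can't open $file: $!"; next };
    binmode $in;
    my $buf;
    while ( read( $in, $buf, 1_048_576 ) ) {
        print {$out} $buf;
    }
    close $in;
}
close $out;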

Replies are listed 'Best First'.
Re: How to merge Huge log files (each 10 MB) into a single file
by BrowserUk (Patriarch) on Sep 03, 2009 at 14:26 UTC
    to merge 4 text files, it is taking more than 8 minutes.

    There's something you're not telling us? Or your disk is badly screwed. This is merging 4 10MB files on my system:

    [15:23:09.44] c:\test>dir 10* mer*
    03/09/2009  15:06        10,516,481 10MB.1
    03/09/2009  15:06        10,516,481 10MB.2
    03/09/2009  15:06        10,516,481 10MB.3
    03/09/2009  15:07        10,516,481 10MB.4
                   4 File(s)     42,065,924 bytes

    [15:23:16.89] c:\test>perl -pE1 10MB.1 10MB.2 10MB.3 10MB.4 > mergedOutput.txt

    [15:23:19.21] c:\test>dir mergedOutput.txt
    03/09/2009  15:23        42,065,924 mergedOutput.txt

    Even with a cold cache, it never seems to take more than 3 seconds.
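    For anyone puzzled by the one-liner: -p wraps the trivial program 1 in a read-and-print loop over every file named on the command line, so it behaves roughly like this sketch:

    # Roughly what perl -pE1 FILE1 FILE2 ... > merged.txt does:
    # read each line of every file listed on the command line (@ARGV)
    # and print it to standard output.
    while (<>) {
        print;
    }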


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Using the copy command and perl -pE1 really solved the problem. Thanks a lot.

Re: How to merge Huge log files (each 10 MB) into a single file
by Anonymous Monk on Sep 03, 2009 at 14:07 UTC
    Try timing cat FILE1 FILE2 FILE3 FILE4 >bigone
Re: How to merge Huge log files (each 10 MB) into a single file
by lostjimmy (Chaplain) on Sep 03, 2009 at 14:12 UTC

    Anonymous Monk's advice is probably the best solution, but if you're looking to speed up a Perl solution, it might be faster to just slurp each file instead of reading line by line. You can do this by setting $/ to undef, or by using File::Slurp.
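    A hedged sketch of the slurp variant, reusing the @filesRead list and output name from the original post:

    # Slurp each file whole instead of looping line by line.
    open my $out, '>', 'MergeOutput.txt' or die "Can't open output: $!";
    for my $file (@filesRead) {
        open my $in, '<', $file or do { warn "Can't open $file: $!"; next };
        local $/;            # undef the input record separator => slurp mode
        print {$out} <$in>;  # one read and one write per file
        close $in;
    }
    close $out;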

    That being said, wouldn't it make more sense to actually merge the files based on the timestamps of each log entry?
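    For illustration, a sketch of such an interleaving merge, assuming (this is not stated in the post) that every log line starts with a timestamp that sorts correctly as a plain string, e.g. "2009-09-03 14:26:05":

    # Open every input file and keep one pending line per file.
    my @fh;
    for my $file (@filesRead) {
        open my $h, '<', $file or do { warn "Can't open $file: $!"; next };
        push @fh, $h;
    }
    my @pending = map { scalar readline($_) } @fh;

    open my $out, '>', 'MergeOutput.txt' or die "Can't open output: $!";
    while ( my @live = grep { defined $pending[$_] } 0 .. $#pending ) {
        # Emit the lexically smallest pending line, then refill from that file.
        # String comparison works only because the assumed timestamp prefix
        # sorts lexically in chronological order.
        my ($i) = sort { $pending[$a] cmp $pending[$b] } @live;
        print {$out} $pending[$i];
        $pending[$i] = readline( $fh[$i] );
    }
    close $_ for $out, @fh;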

      I am working on the Windows platform, so I am not able to try the cat command.

        cat works fine on Windows too, provided you install it, say, as part of Cygwin. But you can also do

        copy /b FILE1+FILE2+FILE3+FILE4 bigone
        Under Windows, you would use type instead of cat: type file1 file2 file*.* > output.txt
Re: How to merge Huge log files (each 10 MB) into a single file
by SuicideJunkie (Vicar) on Sep 03, 2009 at 14:09 UTC

    Are you planning to do any processing during the merge?
    A mergesort or filtering possibly?

    If not, why not just cat them together with the standard system commands instead of reinventing a very common wheel?
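    For completeness, a hedged sketch of delegating the merge to the OS from within Perl; the output name merged.txt and the *.txt glob are assumptions, not something from the thread (filenames containing spaces would also need quoting):

    # Concatenate with the platform's native command rather than in Perl.
    my @files = grep { $_ ne 'merged.txt' } sort glob '*.txt';

    if ( $^O eq 'MSWin32' ) {
        # copy is a cmd.exe builtin, so run it through cmd explicitly
        system( 'cmd', '/c', 'copy /b ' . join( '+', @files ) . ' merged.txt' ) == 0
            or die "copy failed: $?";
    }
    else {
        system( 'cat ' . join( ' ', @files ) . ' > merged.txt' ) == 0
            or die "cat failed: $?";
    }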