lnin has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks:
Merging huge text files (each containing more than 10 MB of data) is taking significant time; in some cases merging 4 text files takes more than 8 minutes.

The code I have written is as follows. Please advise if there is a better way of writing it that takes less time.

my $MergedFileName = "MergeOutput.txt";

# Read the log files into an array: .txt files, sorted by the timestamp in the filename
opendir(DIR, '.') or die "Input directory not available. Error: $!";
my @filesRead = sort( grep( /\.txt$/, readdir(DIR) ) );
closedir(DIR);

# Open the output file
open(MAINOUTPUT, ">$MergedFileName") || warn "Can't open file $MergedFileName: $!\n";

# Start merging into the main file by reading each file, line by line
FILE: foreach (@filesRead) {
    open(FILE, $_) || ( ( warn "Can't open file $_\n" ), next FILE );
    while (<FILE>) {
        print MAINOUTPUT $_;
    }
    close(FILE);
}
close(MAINOUTPUT);
# Merging of the files is done
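For comparison, a minimal sketch of the same merge done with large fixed-size block reads instead of line-by-line reads; the 1 MB block size is an arbitrary choice, and @filesRead and $MergedFileName are the variables built above.

# Copy each input file to the output in 1 MB binary blocks.
open my $out, '>', $MergedFileName or die "Can't open output: $!";
binmode $out;    # byte-for-byte copy, no CRLF translation
for my $file (@filesRead) {
    open my $in, '<', $file or do { warn "Can't open $file: $!"; next };
    binmode $in;
    my $buf;
    while ( read( $in, $buf, 1_048_576 ) ) {
        print {$out} $buf;
    }
    close $in;
}
close $out;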

Replies are listed 'Best First'.
Re: How to merge Huge log files (each 10 MB) into a single file
by BrowserUk (Patriarch) on Sep 03, 2009 at 14:26 UTC
    to merge 4 text files, it is taking more than 8 minutes.

    There's something you're not telling us? Or your disk is badly screwed. This is merging 4 10MB files on my system:

    [15:23:09.44] c:\test>dir 10* mer*
    03/09/2009  15:06        10,516,481 10MB.1
    03/09/2009  15:06        10,516,481 10MB.2
    03/09/2009  15:06        10,516,481 10MB.3
    03/09/2009  15:07        10,516,481 10MB.4
                   4 File(s)     42,065,924 bytes

    [15:23:16.89] c:\test>perl -pE1 10MB.1 10MB.2 10MB.3 10MB.4 > mergedOutput.txt

    [15:23:19.21] c:\test>dir mergedOutput.txt
    03/09/2009  15:23        42,065,924 mergedOutput.txt

    Even with a cold cache, it never seems to take more than 3 seconds.
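    For anyone puzzled by the one-liner: -p wraps the trivial program 1 in a read-and-print loop over every file named on the command line, so it behaves roughly like this sketch:

    # Roughly what perl -pE1 FILE1 FILE2 ... > merged.txt does:
    # read each line of every file listed on the command line (@ARGV)
    # and print it to standard output.
    while (<>) {
        print;
    }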


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Using the copy command and perl -pE1 really solved the problem. Thanks a lot.

Re: How to merge Huge log files (each 10 MB) into a single file
by Anonymous Monk on Sep 03, 2009 at 14:07 UTC
    Try timing cat FILE1 FILE2 FILE3 FILE4 >bigone
Re: How to merge Huge log files (each 10 MB) into a single file
by lostjimmy (Chaplain) on Sep 03, 2009 at 14:12 UTC

    Anonymous Monk's advice is probably the best solution, but if you're looking to speed up a Perl solution, it might be faster to just slurp each file instead of reading line by line. You can do this by setting $/ to undef, or by using File::Slurp.
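    A hedged sketch of the slurp variant, reusing the @filesRead list and output name from the original post:

    # Slurp each file whole instead of looping line by line.
    open my $out, '>', 'MergeOutput.txt' or die "Can't open output: $!";
    for my $file (@filesRead) {
        open my $in, '<', $file or do { warn "Can't open $file: $!"; next };
        local $/;            # undef the input record separator => slurp mode
        print {$out} <$in>;  # one read and one write per file
        close $in;
    }
    close $out;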

    That being said, wouldn't it make more sense to actually merge the files based on the timestamps of each log entry?
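    For illustration, a sketch of such an interleaving merge, assuming (this is not stated in the post) that every log line starts with a timestamp that sorts correctly as a plain string, e.g. "2009-09-03 14:26:05":

    # Open every input file and keep one pending line per file.
    my @fh;
    for my $file (@filesRead) {
        open my $h, '<', $file or do { warn "Can't open $file: $!"; next };
        push @fh, $h;
    }
    my @pending = map { scalar readline($_) } @fh;

    open my $out, '>', 'MergeOutput.txt' or die "Can't open output: $!";
    while ( my @live = grep { defined $pending[$_] } 0 .. $#pending ) {
        # Emit the lexically smallest pending line, then refill from that file.
        # String comparison works only because the assumed timestamp prefix
        # sorts lexically in chronological order.
        my ($i) = sort { $pending[$a] cmp $pending[$b] } @live;
        print {$out} $pending[$i];
        $pending[$i] = readline( $fh[$i] );
    }
    close $_ for $out, @fh;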

      I am working on the Windows platform, so I am not able to try the cat command.

        cat works fine on Windows too, provided you install it, say, as part of Cygwin. But you can also do

        copy /b FILE1+FILE2+FILE3+FILE4 bigone
        Under Windows, you would use type instead of cat: type file1 file2 file*.* > output.txt
Re: How to merge Huge log files (each 10 MB) into a single file
by SuicideJunkie (Vicar) on Sep 03, 2009 at 14:09 UTC

    Are you planning to do any processing during the merge?
    A mergesort or filtering possibly?

    If not, why not just cat them together with the standard system commands instead of reinventing a very common wheel?
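    For completeness, a hedged sketch of delegating the merge to the OS from within Perl; the output name merged.txt and the *.txt glob are assumptions, not something from the thread (filenames containing spaces would also need quoting):

    # Concatenate with the platform's native command rather than in Perl.
    my @files = grep { $_ ne 'merged.txt' } sort glob '*.txt';

    if ( $^O eq 'MSWin32' ) {
        # copy is a cmd.exe builtin, so run it through cmd explicitly
        system( 'cmd', '/c', 'copy /b ' . join( '+', @files ) . ' merged.txt' ) == 0
            or die "copy failed: $?";
    }
    else {
        system( 'cat ' . join( ' ', @files ) . ' > merged.txt' ) == 0
            or die "cat failed: $?";
    }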