in reply to Re: Reading a VERY LARGE file with SINGLE line as content!
in thread Reading a VERY LARGE file with SINGLE line as content!

Here is a snippet from some code that I wrote "many moons ago". This code will run not just a "little bit" faster than a Win command line...the performance difference is HUGE, even with just 8K buffer. If the search can be run over, say, just two of these 8K buffers at a time, it will be very, very fast. This is a copy routine, but the same principle works for reading large files. showfailed() is a helper that acts somewhat like die() and warn() but reports through a GUI display. For the purpose here, its details don't matter.
##################
# Binary File Copy and Append
#
# bcopy($output, @input_files);
#
# first element is the output file path,
# then input files: file1, file2...file n.

sub bcopy()
{
    (my $out, my @in_list)=@_;
    open (OUTBIN, ">", "$out") || showfailed ("unable to open $out");
    binmode(OUTBIN) || showfailed ("unable to set binmode $out");

    foreach my $infile (@in_list)
    {
        open(INBIN, "<", "$infile")|| showfailed ("unable to open $infile");
        binmode(INBIN) || showfailed ("unable to set binmode $infile");
        while (read(INBIN, my $buff, 8 * 2**10))
        {
            print OUTBIN $buff;
        }
        close(INBIN) || showfailed("unable to close $infile");
        print "$infile appended to $out\n";
    }
    close(OUTBIN) || showfailed("unable to close $out");
} #end of bcopy
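
For the thread's actual problem (searching a huge single-line file), the same fixed-size read loop can be pointed at a search instead of a copy. What follows is only a minimal sketch, not code from the original program: find_in_chunks and its parameters are hypothetical names, and it assumes a plain byte-string needle. The one subtlety is that a match can straddle two buffers, so the last length($needle) - 1 bytes of each window are carried over into the next one.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): return the byte offset of
# the first occurrence of $needle in $path, or -1 if it is absent. The file is
# read in fixed-size chunks, so memory use stays bounded even when the whole
# file is a single line.
sub find_in_chunks {
    my ($path, $needle, $chunk_size) = @_;
    $chunk_size ||= 8 * 2**10;          # 8K, as in the copy routine
    my $keep = length($needle) - 1;     # bytes that could begin a split match
    return -1 if $keep < 0;             # empty needle: nothing to look for

    open(my $in, '<', $path) or die "unable to open $path: $!";
    binmode($in)             or die "unable to set binmode $path: $!";

    my $tail   = '';    # carried-over end of the previous window
    my $offset = 0;     # file offset of the first byte of $tail
    while (read($in, my $buff, $chunk_size)) {
        my $window = $tail . $buff;
        my $pos    = index($window, $needle);
        if ($pos >= 0) {
            close($in);
            return $offset + $pos;
        }
        # No match fits entirely in this window, so a later match must start
        # within its last $keep bytes. Keep those and drop the rest.
        my $drop = length($window) - $keep;
        $drop = 0 if $drop < 0;
        $tail    = substr($window, $drop);
        $offset += $drop;
    }
    close($in);
    return -1;
}
```

Calling it with a deliberately tiny chunk size, such as 8, is a quick way to exercise the boundary handling on a small test file.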

Re^3: Reading a VERY LARGE file with SINGLE line as content!
by BrowserUk (Patriarch) on Jul 18, 2009 at 09:25 UTC
    This code will run not just a "little bit" faster than a Win command line...the performance difference is HUGE, even with just 8K buffer.

    That's some strange code and a big claim. I thought I'd test the claim, and my first attempt to call bcopy got:

    Too many arguments for main::bcopy at C:\test\junk8.pl line 34, near "@in )"

    Once I removed the useless prototype:

    #! perl -sw
    use 5.010;
    use strict;

    sub bcopy
    {
        (my $out, my @in_list)=@_;
        open (OUTBIN, ">", "$out") || showfailed ("unable to open $out");
        binmode(OUTBIN) || showfailed ("unable to set binmode $out");

        foreach my $infile (@in_list)
        {
            open(INBIN, "<", "$infile")|| showfailed ("unable to open $infile");
            binmode(INBIN) || showfailed ("unable to set binmode $infile");
            while (read(INBIN, my $buff, 8 * 2**10))
            {
                print OUTBIN $buff;
            }
            close(INBIN) || showfailed("unable to close $infile");
            print "$infile appended to $out\n";
        }
        close(OUTBIN) || showfailed("unable to close $out");
    }

    my @in = glob shift;
    my $out = shift;

    say time;
    bcopy( $out, @in );
    say time;

    and ran it on ten 128 MB files:

    [10:11:26.67} C:\test>junk8 *.jnk bigjnk.out
    1247908624
    bugjunk1.jnk appended to bigjnk.out
    bugjunk10.jnk appended to bigjnk.out
    bugjunk2.jnk appended to bigjnk.out
    bugjunk3.jnk appended to bigjnk.out
    bugjunk4.jnk appended to bigjnk.out
    bugjunk5.jnk appended to bigjnk.out
    bugjunk6.jnk appended to bigjnk.out
    bugjunk7.jnk appended to bigjnk.out
    bugjunk8.jnk appended to bigjnk.out
    bugjunk9.jnk appended to bigjnk.out
    1247908656

    32 seconds. Then again with xcopy:

    [10:22:58.79} C:\test>xcopy /Y *.jnk bigjnk.out
    Does bigjnk.out specify a file name
    or directory name on the target
    (F = file, D = directory)? f
    C:bugjunk1.jnk
    C:bugjunk10.jnk
    C:bugjunk2.jnk
    C:bugjunk3.jnk
    C:bugjunk4.jnk
    C:bugjunk5.jnk
    C:bugjunk6.jnk
    C:bugjunk7.jnk
    C:bugjunk8.jnk
    C:bugjunk9.jnk
    10 File(s) copied

    [10:23:06.51} C:\test>

    Even with the time it took me to respond to the dumb prompt, it took just 8 seconds.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Yep, that was a typo with the (): there was a prototype there before and I didn't remove the parens. The code isn't "strange"; it looks pretty straightforward to me.
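
To make the () issue concrete: an empty prototype tells Perl the sub takes no arguments, so a call that passes any is rejected at compile time. The sketch below is only an illustration with hypothetical sub names (copy_with_proto, copy_without_proto, compile_error). It compiles each variant inside a string eval so the error can be captured instead of aborting the script.

```perl
use strict;
use warnings;

# Compile and run a snippet in a string eval so that a compile-time error is
# captured rather than aborting this script. Returns the error text, or ''
# when the snippet compiled and ran cleanly.
sub compile_error {
    my ($code) = @_;
    my $ok = eval "$code; 1";
    return $ok ? '' : $@;
}

# Empty prototype: Perl is told this sub takes no arguments at all.
my $with_proto = compile_error(q{
    sub copy_with_proto() { return scalar @_ }
    copy_with_proto('out.bin', 'a.bin', 'b.bin');
});

# No prototype: any argument list is accepted.
my $without_proto = compile_error(q{
    sub copy_without_proto { return scalar @_ }
    copy_without_proto('out.bin', 'a.bin', 'b.bin');
});

print "with ():    ", ($with_proto eq '' ? "compiles and runs\n" : $with_proto);
print "without (): ", ($without_proto eq '' ? "compiles and runs\n" : $without_proto);
```

The first variant should report a "Too many arguments for main::copy_with_proto" error, while the second compiles and runs. Dropping the parentheses from the sub definition is the simplest fix.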

      Now there is a "trick". This code works fastest for a large number of relatively small files. A very simple Perl program, binmode or not, will also beat Windows xcopy. There is a huge performance hit from writing to the screen, and I was stunned when I first saw it. If xcopy generates a lot of screen output, writing to stdout is what slows it down!
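
Anyone who wants to check the screen-output claim on their own machine could time it directly. This is only a rough sketch under stated assumptions: time_prints is a hypothetical helper, the line count is arbitrary, and the null device's filehandle is block-buffered, so the comparison is indicative at best. It prints the same progress-style lines once to STDOUT and once to the null device and reports both timings on STDERR.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);    # sub-second wall-clock time
use File::Spec;

# Hypothetical helper: print $count progress-style lines to $fh and return
# the elapsed wall-clock seconds.
sub time_prints {
    my ($fh, $count) = @_;
    my $start = time;
    print {$fh} "bugjunk$_.jnk appended to bigjnk.out\n" for 1 .. $count;
    return time - $start;
}

my $count = 2_000;    # small on purpose; raise it for a clearer difference

# File::Spec->devnull gives 'nul' on Windows and '/dev/null' on Unix-likes.
open(my $null, '>', File::Spec->devnull)
    or die "unable to open null device: $!";

my $to_stdout = time_prints(\*STDOUT, $count);
my $to_null   = time_prints($null,    $count);

close($null) or die "unable to close null device: $!";

# Report on STDERR so the summary is not mixed into the timed output.
printf STDERR "STDOUT: %.4fs   null device: %.4fs\n", $to_stdout, $to_null;
```

Running it once in a console window and once with STDOUT redirected to a file shows how much of the time is down to the display itself.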

      Now it could be that WinXP sp3 has improved this...I dunno.