
I suggest you try it.

First you should try it on a small file with fixed length records and no delimiters.

When you've worked out why the file ends up empty, and how to fix that, then try it on a 500MB fixed record length file with no delimiters. And if you could time how long it takes and report back, that would be interesting.

Don't worry about being too accurate, the nearest week should be fine.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

Re^3: Removing the first record in a file containing fixed records
by Narveson (Chaplain) on Jul 18, 2008 at 21:28 UTC
    I suggest you try it.

    I have done so. We have some large fixed-record-length extracts lying around, record length about 4 KB. The size of file.txt was 1,985,184 KB when I started and 1,985,180 KB when the shift was done. It took three minutes.

    It's true our fixed-length records are delivered to us with a newline at the end of each record. I always thought that was for the convenience of human readers with text editors, but it also enables Tie::File.

    As the manpage says, Tie::File does not support fixed-width records unless of course they end in a record separator. Whether the records in the original question have newlines at the end, only sparkle can tell us.
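    A minimal sketch of the behaviour under discussion, assuming a scratch file of three 4-byte records (the file contents and the helper name shift_first_record are made up for illustration). With Tie::File's default record separator of "\n", a file with no delimiters is a single record, so shift empties it; when each record ends in "\n", shift removes only the first one.

```perl
use strict;
use warnings;
use Tie::File;
use File::Temp qw( tempfile );

# Hypothetical helper: write $content to a temp file, tie it with
# Tie::File's default record separator ($/, i.e. "\n"), shift off
# record 0, and return whatever is left in the file.
sub shift_first_record {
    my ( $content ) = @_;
    my ( $fh, $path ) = tempfile( UNLINK => 1 );
    binmode $fh;
    print {$fh} $content;
    close $fh or die "close: $!";

    tie my @records, 'Tie::File', $path or die "tie $path: $!";
    shift @records;                    # remove the first "record"
    untie @records;

    open my $in, '<:raw', $path or die "open $path: $!";
    local $/;                          # slurp the rest of the file
    my $rest = <$in>;
    close $in;
    return defined $rest ? $rest : '';
}

# 4-byte records with no delimiters: the whole file is one record.
my $undelimited = shift_first_record( 'AAAABBBBCCCC' );
printf "no delimiters: %d bytes left\n", length $undelimited;

# The same records, each terminated by "\n": only the first one goes.
my $newlines = shift_first_record( "AAAA\nBBBB\nCCCC\n" );
printf "newline-ended: %d bytes left (%s)\n",
    length $newlines, join ',', split /\n/, $newlines;
```

    Run as-is, this should report 0 bytes left for the undelimited file and 10 bytes (BBBB,CCCC) for the newline-terminated one.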

      Contrast that with Tie::File taking 233 seconds on a 500MB file of 96-byte records:

```perl
use strict;
use warnings;
use Tie::File;

# 96-byte records with no delimiter. Tie::File uses $/ as its default
# record separator; a reference to an integer asks for fixed-length records.
$/ = \96;

my $start = time;
tie my @records, 'Tie::File', $ARGV[ 0 ];
shift @records;        # remove the first record
untie @records;
printf "Time: %f seconds\n", time() - $start;

__END__
2008-07-18  14:17       527,999,712 500MB.fixed
               1 File(s)    527,999,712 bytes
               0 Dir(s)   2,320,445,440 bytes free

C:\test>junk7 500MB.fixed
Time: 233.000000 seconds

C:\test>dir 500MB.fixed
 Volume in drive C has no label.
 Volume Serial Number is BCCA-B4CC

 Directory of C:\test

2008-07-18  23:34       527,999,616 500MB.fixed
               1 File(s)    527,999,616 bytes
               0 Dir(s)   2,320,416,768 bytes free
```

      With a read-seek-write solution taking about 5 seconds:

```perl
#! perl -slw
use strict;
use Fcntl qw[ SEEK_CUR SEEK_SET ];
use constant BUFSIZE => 64 * 1024;

my $start = time;

# -s switch parsing: "-RECLEN=96" on the command line sets $RECLEN.
our $RECLEN || die "you must specify the length of the header. -RECLEN=nnn";
@ARGV or die "No filename";

open FILE, '+<:raw', $ARGV[ 0 ] or die "$!: $ARGV[ 0 ]";

# Read (and discard) the first record; the read position is now $RECLEN.
sysread FILE, my $header, $RECLEN or die "sysread: $!";

my( $nextWrite, $nextRead ) = 0;

# Copy the rest of the file down by $RECLEN bytes, one buffer at a time,
# alternating between the read position and the (lower) write position.
while( sysread FILE, my $buffer, BUFSIZE ) {
    $nextRead = sysseek FILE, 0, SEEK_CUR or die "Seek query next read failed: $!";
    sysseek FILE, $nextWrite, SEEK_SET or die "Seek next write failed: $!";
    syswrite FILE, $buffer or die "Write failed: $!";
    $nextWrite = sysseek FILE, 0, SEEK_CUR or die "Seek query next write failed: $!";
    sysseek FILE, $nextRead, SEEK_SET or die "Seek next Read failed: $!";
}

# Discard the stale bytes left beyond the last write.
truncate FILE, $nextWrite or die "truncate failed: $!";
close FILE or die "close failed: $!";

printf "Took: %f seconds\n", time() - $start;

__END__
C:\test>dir 500MB.fixed
 Volume in drive C has no label.
 Volume Serial Number is BCCA-B4CC

 Directory of C:\test

2008-07-18  23:34       527,999,616 500MB.fixed
               1 File(s)    527,999,616 bytes
               0 Dir(s)   2,320,416,768 bytes free

C:\test>698472 -RECLEN=96 500MB.fixed
Took: 5.000000 seconds

C:\test>dir 500MB.fixed
 Volume in drive C has no label.
 Volume Serial Number is BCCA-B4CC

 Directory of C:\test

2008-07-18  23:37       527,999,520 500MB.fixed
               1 File(s)    527,999,520 bytes
               0 Dir(s)   2,320,445,440 bytes free
```

      I'll grant you, the Tie::File version does have the virtue of simplicity.
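      In the spirit of trying things on a small file first, here is a minimal sketch of the same read-seek-write idea packaged as a sub, so it can be checked on a 12-byte file before being pointed at 500MB. The sub name drop_leading_bytes, the scratch file and its contents are hypothetical. Unlike the script above, it uses a lexical filehandle, tracks the read and write offsets in variables instead of querying them with sysseek, and checks that each syswrite wrote the whole buffer.

```perl
use strict;
use warnings;
use Fcntl qw( SEEK_SET );
use File::Temp qw( tempfile );

# Hypothetical helper: remove the first $n bytes of $path in place by
# copying the rest of the file down over them, then truncating.
sub drop_leading_bytes {
    my ( $path, $n, $bufsize ) = @_;
    $bufsize ||= 64 * 1024;

    open my $fh, '+<:raw', $path or die "open $path: $!";
    my ( $read_pos, $write_pos ) = ( $n, 0 );

    while (1) {
        sysseek $fh, $read_pos, SEEK_SET or die "seek for read: $!";
        my $got = sysread $fh, my $buf, $bufsize;
        die "sysread: $!" unless defined $got;
        last if $got == 0;                  # reached end of file
        $read_pos += $got;

        sysseek $fh, $write_pos, SEEK_SET or die "seek for write: $!";
        my $put = syswrite $fh, $buf;
        die "syswrite: $!" unless defined $put && $put == $got;
        $write_pos += $put;
    }

    truncate $fh, $write_pos or die "truncate: $!";   # drop the stale tail
    close $fh or die "close: $!";
    return $write_pos;                      # new file length
}

# Scratch file: three 4-byte records with no delimiters.
my ( $tmp, $file ) = tempfile( UNLINK => 1 );
binmode $tmp;
print {$tmp} 'AAAABBBBCCCC';
close $tmp or die "close: $!";

# A tiny 3-byte buffer forces several trips round the copy loop.
my $new_len = drop_leading_bytes( $file, 4, 3 );

open my $in, '<:raw', $file or die "open $file: $!";
my $rest = do { local $/; <$in> };
close $in;
printf "%d bytes left: %s\n", $new_len, $rest;
```

      With the 3-byte buffer the loop reads 3, 3, 2 and then 0 bytes, and the file should end up as BBBBCCCC, 8 bytes long. Because the write offset always trails the read offset by $n bytes, copying forward never overwrites data that has not been read yet.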



        Thank you for sharing these timings and for writing the faster code. Both are instructive.

        Would the operation run even faster if coded in C?