in reply to Re^2: Removing the first record in a file containing fixed records
in thread Removing the first record in a file containing fixed records

I suggest you try it.

I have done so. We have some large fixed-record-length extracts lying around, record length about 4 KB. The size of file.txt was 1,985,184 KB when I started and 1,985,180 KB when the shift was done. It took three minutes.

It's true our fixed-length records are delivered to us with a newline at the end of each record. I always thought that was for the convenience of human readers with text editors, but it also enables Tie::File.

As the manpage says, Tie::File does not support fixed-width records unless, of course, they end in a record separator. Whether the records in the original question have newlines at the end, only sparkle can tell us.
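To make the record-separator point concrete, here is a minimal sketch, assuming records that end in a newline as ours do. The sample records, the temporary file, and the helper name drop_first_record are made up for illustration; only the tie / shift / untie sequence on a Tie::File array comes from this thread.

```perl
use strict;
use warnings;
use Tie::File;
use File::Temp qw(tempfile);

# Hypothetical helper: remove the first separator-terminated record in place.
sub drop_first_record {
    my ($path) = @_;
    tie my @records, 'Tie::File', $path
        or die "Cannot tie $path: $!";
    my $first = shift @records;    # Tie::File rewrites the rest of the file
    untie @records;
    return $first;                 # returned without its record separator
}

# Build a small sample file: three equal-width records, one per line.
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "REC0001\n", "REC0002\n", "REC0003\n";
close $fh or die "close failed: $!";

my $removed = drop_first_record($path);
print "removed: $removed\n";
print "size now: ", -s $path, " bytes\n";
```

The sketch relies on the default record separator, so it says nothing about files whose records run together with no separator at all.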

Re^4: Removing the first record in a file containing fixed records
by BrowserUk (Patriarch) on Jul 18, 2008 at 22:43 UTC

    Contrast Tie::File taking 233 seconds:

    use strict;
    use warnings;
    use Tie::File;

    $/ = \96;
    my $start = time;
    tie my @records, 'Tie::File', $ARGV[ 0 ];
    shift @records;
    untie @records;
    printf "Time: %f seconds\n", time() - $start;
    __END__

    2008-07-18  14:17       527,999,712 500MB.fixed
                   1 File(s)    527,999,712 bytes
                   0 Dir(s)   2,320,445,440 bytes free

    C:\test>junk7 500MB.fixed
    Time: 233.000000 seconds

    C:\test>dir 500MB.fixed
     Volume in drive C has no label.
     Volume Serial Number is BCCA-B4CC

     Directory of C:\test

    2008-07-18  23:34       527,999,616 500MB.fixed
                   1 File(s)    527,999,616 bytes
                   0 Dir(s)   2,320,416,768 bytes free

    With a read-seek-write solution taking about 5 seconds:

    #! perl -slw
    use strict;
    use Fcntl qw[ SEEK_CUR SEEK_SET ];

    use constant BUFSIZE => 64 * 1024;

    my $start = time;

    # -s on the shebang line turns -RECLEN=nnn into the package variable $RECLEN
    our $RECLEN || die "you must specify the length of the header. -RECLEN=nnn";
    @ARGV or die "No filename";

    open FILE, '+<:raw', $ARGV[ 0 ] or die "$!: $ARGV[ 0 ]";

    # Read (and discard) the first record
    sysread FILE, my $header, $RECLEN or die "sysread: $!";

    # Copy the rest of the file down in BUFSIZE chunks,
    # tracking separate read and write positions
    my( $nextWrite, $nextRead ) = 0;
    while( sysread FILE, my $buffer, BUFSIZE ) {
        $nextRead = sysseek FILE, 0, SEEK_CUR or die "Seek query next read failed; $!";
        sysseek FILE, $nextWrite, SEEK_SET or die "Seek next write failed: $!";
        syswrite FILE, $buffer or die "Write failed: $!";
        $nextWrite = sysseek FILE, 0, SEEK_CUR or die "Seek query next write failed $!";
        sysseek FILE, $nextRead, SEEK_SET or die "Seek next Read failed: $!";
    }

    # Cut off the now-duplicated tail
    truncate FILE, $nextWrite or die "truncate failed: $!";
    close FILE or die "close failed: $!";
    printf "Took: %f seconds\n", time() - $start;
    __END__

    C:\test>dir 500MB.fixed
     Volume in drive C has no label.
     Volume Serial Number is BCCA-B4CC

     Directory of C:\test

    2008-07-18  23:34       527,999,616 500MB.fixed
                   1 File(s)    527,999,616 bytes
                   0 Dir(s)   2,320,416,768 bytes free

    C:\test>698472 -RECLEN=96 500MB.fixed
    Took: 5.000000 seconds

    C:\test>dir 500MB.fixed
     Volume in drive C has no label.
     Volume Serial Number is BCCA-B4CC

     Directory of C:\test

    2008-07-18  23:37       527,999,520 500MB.fixed
                   1 File(s)    527,999,520 bytes
                   0 Dir(s)   2,320,445,440 bytes free

    I'll grant you, the Tie::File version does have the virtue of simplicity.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Thank you for sharing these timings and for writing the faster code. Both are instructive.

      Would the operation run even faster if coded in C?

        Would the operation run even faster if coded in C?

        Marginally. Maybe. A straightforward conversion of the above code to C ran in 4 seconds one time and 2 seconds the next, probably because the file was still in the system cache. But the timing was only accurate to the nearest second.
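Since whole-second timing blurs a comparison this close, here is a minimal sketch of sub-second timing with the core Time::HiRes module, which exports a floating-point time(). The work() routine is a made-up stand-in for the operation being measured; it is not the file-shifting code from this thread.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);    # overrides time() with a floating-point version

# Hypothetical stand-in for the operation being timed.
sub work {
    my $sum = 0;
    $sum += $_ for 1 .. 100_000;
    return $sum;
}

my $start   = time();
my $result  = work();
my $elapsed = time() - $start;

printf "Took: %.6f seconds\n", $elapsed;
```

Finer resolution does not remove the cache effect: a second run over a file the system has already cached will usually still look faster than the first.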

