PerlMonks  

Re: Muy Large File

by BrowserUk (Patriarch)
on Mar 14, 2005 at 09:01 UTC ( [id://439199] )


in reply to Muy Large File

... this is taking over 4 hours....

You're doing something wrong :).

The following shows Perl processing a 32 GB file in place, finding and replacing 30% of its contents in under 25 minutes on a single-CPU machine with 512 MB of RAM (the process itself uses only 3 MB of RAM).

#! perl -slw
use strict;

our $BUFSIZE ||= 2**20;

open my $fh, '+< :raw', $ARGV[ 0 ] or die $!;

while( sysread $fh, $_, $BUFSIZE ) {
    tr[123][123];
    sysseek $fh, -length(), 1; ## Updated per Dave_the_m's correction below++
    syswrite $fh, $_;
}

close $fh;

__DATA__
[ 8:31:52.64] P:\test>439181 data\integers.dat

[ 8:54:43.92] P:\test>dir data\integers.dat
 Volume in drive P has no label.
 Volume Serial Number is BCCA-B4CC

 Directory of P:\test\data

14/03/2005  08:54    34,359,738,368 integers.dat
               1 File(s) 34,359,738,368 bytes

Examine what is said, not who speaks.
Silence betokens consent.
Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco.

Replies are listed 'Best First'.
Re^2: Muy Large File
by dave_the_m (Monsignor) on Mar 14, 2005 at 10:53 UTC
    One slight nit: if the file size isn't a multiple of the buffer size, the final seek will seek back too far and corrupt the final block.
    while( sysread $fh, $_, $BUFSIZE ) {
        tr[123][123];
        sysseek $fh, -length(), 1;
        syswrite $fh, $_;
    }
    (untested).

    Dave.
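    Dave's nit is easy to check on a small scale. What follows is a minimal sketch, not code from the thread: translate_in_place and slurp_raw are hypothetical helper names, and the tr[123][abc] mapping is swapped in only so that the change is visible (the original maps 1, 2 and 3 to themselves). It runs the same read, seek-back, write loop over a 10-byte temp file with a 4-byte buffer, so the last read is short and the seek has to step back by the bytes actually read.

```perl
use strict;
use warnings;
use File::Temp qw( tempfile );

## Hypothetical helper (not from the thread): the thread's loop, with error checks.
sub translate_in_place {
    my( $path, $bufsize ) = @_;
    open my $fh, '+<:raw', $path or die "open $path: $!";
    my $buf = '';
    while( my $got = sysread $fh, $buf, $bufsize ) {
        $buf =~ tr[123][abc];                        ## illustrative mapping only
        sysseek( $fh, -$got, 1 ) or die "sysseek: $!";   ## back up by the bytes actually read
        my $wrote = syswrite $fh, $buf;
        die "short write" unless defined $wrote and $wrote == $got;
    }
    close $fh or die "close: $!";
    return;
}

## Hypothetical helper: read a whole file back as raw bytes.
sub slurp_raw {
    my( $path ) = @_;
    open my $in, '<:raw', $path or die "open $path: $!";
    local $/;
    my $data = <$in>;
    close $in;
    return $data;
}

## 10 bytes with a 4-byte buffer: reads of 4, 4 and 2 bytes.
my( $tmp, $path ) = tempfile( UNLINK => 1 );
binmode $tmp;
print $tmp '1234512345';
close $tmp or die "close: $!";

translate_in_place( $path, 4 );
print slurp_raw( $path ), "\n";     ## prints abc45abc45
```

    Seeking back by a fixed $BUFSIZE instead of the byte count returned by sysread would, on the final 2-byte read, land 2 bytes too early and overwrite data that had already been processed.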

Re^2: Muy Large File
by gam3 (Curate) on Apr 12, 2005 at 17:36 UTC
    Another problem is with data that crosses a buffer boundary.
    -- gam3
    A picture is worth a thousand words, but takes 200K.
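    gam3 does not spell out the case, so this is one reading of it. A tr maps each byte independently, so the loop in the parent node is not affected by where a buffer ends. A multi-byte search string is: a match that straddles two reads is never seen whole. The sketch below is one way to handle that for a fixed string and a same-length replacement, by starting each read up to length - 1 bytes before the end of the previous one, but never inside bytes that were just replaced. replace_in_place and slurp_raw are hypothetical names, and none of this code comes from the thread.

```perl
use strict;
use warnings;
use File::Temp qw( tempfile );

## Hypothetical helper: replace every $from with the same-length $to, in place,
## including matches that straddle a buffer boundary.
sub replace_in_place {
    my( $path, $from, $to, $bufsize ) = @_;
    my $n = length $from;
    die "replacement must keep the length\n" unless $n == length $to;
    die "buffer must hold at least one pattern\n" unless $n >= 1 and $bufsize >= $n;

    open my $fh, '+<:raw', $path or die "open $path: $!";
    my $pos = 0;                                ## file offset of the current buffer
    while( 1 ) {
        sysseek( $fh, $pos, 0 ) or die "sysseek: $!";
        my $buf = '';
        my $got = sysread $fh, $buf, $bufsize;
        die "sysread: $!" unless defined $got;

        my( $scan, $last_end ) = ( 0, 0 );
        while( ( my $at = index( $buf, $from, $scan ) ) >= 0 ) {
            substr( $buf, $at, $n ) = $to;
            $scan = $last_end = $at + $n;       ## never rescan replaced bytes
        }
        if( $last_end ) {                       ## write back only if something changed
            sysseek( $fh, $pos, 0 ) or die "sysseek: $!";
            my $wrote = syswrite $fh, $buf;
            die "short write" unless defined $wrote and $wrote == $got;
        }

        ## Re-read the last $n - 1 bytes next time, unless they were just replaced.
        my $advance = $got - ( $n - 1 );
        $advance = $last_end if $last_end > $advance;
        last if $advance <= 0;                  ## fewer than $n bytes left
        $pos += $advance;
    }
    close $fh or die "close: $!";
    return;
}

## Hypothetical helper: read a whole file back as raw bytes.
sub slurp_raw {
    my( $path ) = @_;
    open my $in, '<:raw', $path or die "open $path: $!";
    local $/;
    my $data = <$in>;
    close $in;
    return $data;
}

## 'o w' spans the boundary between the first two 5-byte reads.
my( $tmp, $path ) = tempfile( UNLINK => 1 );
binmode $tmp;
print $tmp 'hello world';
close $tmp or die "close: $!";

replace_in_place( $path, 'o w', 'O_W', 5 );
print slurp_raw( $path ), "\n";     ## prints hellO_World
```

    The overlap costs a few re-read bytes per buffer, which is negligible at a 1 MB buffer size. A replacement of a different length cannot be done in place this way, since everything after it would have to move.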
