How can you sysread an entire file?

by NeilF (Sexton)
on Jan 12, 2006 at 19:51 UTC (id://522791, perlquestion)

NeilF has asked for the wisdom of the Perl Monks concerning the following question:

Other than requesting a read length far beyond anything the file could ever reach, what is the best way to read in an entire file? In my case the file could be anywhere from about 1 to 5 MB.

This works, but is obviously not great:

    use Fcntl qw(O_RDONLY);
    my $rec;
    sysopen(DF, "test.txt", O_RDONLY);
    sysread DF, $rec, 26214400; # 25 MB
    close DF;

One further question: if that variable ($rec) holds thousands of records which each end in CR, what's the best way to split it up into an array (retaining the carriage returns)? I.e., is there anything better than simply doing this after the read?

    foreach(split("\n",$rec)){push(@array,"$_\n");}
    $rec=''; # Release memory

Could the split be combined into the read itself, so that you don't need both $rec and @array and can instead read the file (already split) straight into @array?

Replies are listed 'Best First'.
Re: How can you sysread an entire file?
by ikegami (Patriarch) on Jan 12, 2006 at 20:15 UTC

    sysread(DF, $rec, -s DF); will save you from specifying an arbitrary number.
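
A minimal sketch of the -s approach, assuming a regular file whose size is known up front. The helper name slurp_sysread is hypothetical and not from the thread. Because sysread may return fewer bytes than requested, the sketch loops until the reported size has been read; in the common case that is a single call.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDONLY);

# Hypothetical helper: read a whole file using as few sysread calls as possible.
sub slurp_sysread {
    my ($path) = @_;
    sysopen(my $fh, $path, O_RDONLY) or die "Cannot open $path: $!";
    my $size = (-s $fh) || 0;    # -s is false for an empty file
    my ($buf, $off) = ('', 0);
    # sysread can return short counts, so keep going until $size bytes arrive.
    while ($off < $size) {
        my $got = sysread($fh, $buf, $size - $off, $off);
        die "sysread failed on $path: $!" unless defined $got;
        last if $got == 0;       # file shrank after -s was taken
        $off += $got;
    }
    close $fh;
    return $buf;
}
```

Usage would be something like my $rec = slurp_sysread("test.txt"); with the file name taken from the original question.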

    foreach(split("\n",$rec)){push(@array,"$_\n");} causes three copies of the file to be in memory at once ($rec, foreach's list and @array).

    push(@array, $1) while $rec =~ /(.*?\n)/g; will save you from having a third copy of the file in memory, and will save you from re-adding the "\n". Caveat: It will skip the last line if it isn't terminated by a "\n".
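
One possible variation on the push/while line that also keeps an unterminated last line. This is a sketch, not ikegami's code: the helper name split_records and the alternation in the regex are assumptions added here. The buffer is passed by reference so the helper itself does not add another copy.

```perl
use strict;
use warnings;

# Hypothetical helper: split a slurped buffer into records, keeping each "\n".
# The second alternative catches a final line that has no trailing "\n".
sub split_records {
    my ($rec_ref) = @_;
    my @array;
    push @array, $1 while $$rec_ref =~ /([^\n]*\n|[^\n]+)/g;
    return \@array;
}

my $rec   = "one\ntwo\n\nlast";
my $lines = split_records(\$rec);
# @$lines is ("one\n", "two\n", "\n", "last")
```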

    Unfortunately, splitting on "\n" is not portable. Keep in mind that sysread reads raw bytes and doesn't translate CRLF to LF on Windows.
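
If the data may contain CRLF line endings, one option is to normalize the buffer yourself after the sysread. A small sketch, assuming the records should end in a bare LF; the helper name normalize_newlines is hypothetical. Octal escapes are used so the pattern means the same bytes on every platform.

```perl
use strict;
use warnings;

# Hypothetical helper: turn CRLF pairs into bare LF, in place.
sub normalize_newlines {
    my ($rec_ref) = @_;
    $$rec_ref =~ s/\015\012/\012/g;   # \015 is CR, \012 is LF
    return;
}

my $rec = "a\015\012b\015\012c\012";
normalize_newlines(\$rec);
# $rec is now "a\012b\012c\012"
```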

    Is push(@array, $_) while <DF>; really that much slower?

    And why not use undef $rec; instead of $rec = '';? I prefer to just use curlies, though: my @array; { my $rec; ... }

      Is push(@array, $_) while <DF>; really that much slower?
      Why not just @array = <DF> or push @array, <DF> ?
      Hmm... as I was about to hit 'Create' I think I answered my own question: those two make a copy of the whole <DF> list first, right? Whereas your push/while avoids it...


      Also, to OP's first part, you can use Perl Idioms Explained - my $string = do { local $/; <FILEHANDLE> }; to slurp the whole file ...
        Unfortunately I don't think I can use that idiom, as I really do need to use sysread (because my ISP counts IO operations).

        my $string = do { local $/; <FILEHANDLE> };
        looks nicer than
        my $string; { local $/; $string = <FILEHANDLE> }
        but takes twice as much memory. That's probably not that wise when dealing with entire files.

      Thanks...

      I'm not sure I fully understand what you mean by '"\n" is not portable'. What exactly is the implication? I've run some tests and it seems to work OK on the files I've used it on (Windows XP and Unix).

      My ISP counts IO operations. Doing a "while <DF>" would mean (I believe) one IO operation for every 512 bytes of data as it is buffered in. With sysread, the whole file is read in one IO operation, which is why I went down this route.
        Try this under Windows and Linux:

            use strict; use warnings;
            open(FILE, '>', 'foo.bar') or die "Daim!";
            print FILE "\n";
            close FILE;

        Under Windows, the file will have a size of two bytes (\r\n); under Linux, just one (\n). Compare it to:

            use strict; use warnings;
            open(FILE, '>', 'foo.bar') or die "Daim!";
            binmode FILE;
            print FILE "\n";
            close FILE;
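
The two experiments above can be folded into one script that reports the sizes directly. This is only a sketch: the helper name newline_size is hypothetical, and foo.bar is the scratch file name from the post. No expected output is given for the text-mode case because it depends on the platform.

```perl
use strict;
use warnings;

# Hypothetical helper: write a single "\n" to $path and return the file size.
sub newline_size {
    my ($path, $use_binmode) = @_;
    open(my $fh, '>', $path) or die "Cannot open $path: $!";
    binmode $fh if $use_binmode;   # disables any CRLF translation
    print {$fh} "\n";
    close $fh or die "Cannot close $path: $!";
    my $size = -s $path;
    unlink $path;
    return $size;
}

printf "text mode: %d byte(s)\n", newline_size('foo.bar', 0);
printf "binmode:   %d byte(s)\n", newline_size('foo.bar', 1);
```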
Re: How can you sysread an entire file?
by ChemBoy (Priest) on Jan 13, 2006 at 00:59 UTC

    If you really want to put it all in an array at once, use this: @array = split /(?<=\r)/, $rec; If you don't have a compelling need to do that, though, then I would recommend processing it line by line:

    while ($rec =~ /([^\r]+\r?)/g) { frobulate($1); }

    Both of the above assume that you mean a carriage return, not a newline, when you say carriage return (and that you're not running this on a machine running Mac OS 9 or lower).
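
For records that end in "\n" rather than "\r", the same two patterns might look like the sketch below. The separator change and the helper names split_keep_newline and for_each_record are assumptions added here, and the handler stands in for whatever per-record work is needed. Note that the loop form, like the original, skips empty lines.

```perl
use strict;
use warnings;

# Hypothetical helper: all records at once, each keeping its trailing "\n".
sub split_keep_newline {
    my ($rec_ref) = @_;
    return [ split /(?<=\n)/, $$rec_ref ];
}

# Hypothetical helper: one record at a time, no array; returns the record count.
sub for_each_record {
    my ($rec_ref, $handler) = @_;
    my $count = 0;
    while ($$rec_ref =~ /([^\n]+\n?)/g) {
        my $line = $1;          # copy, in case the handler runs its own regex
        $handler->($line);
        $count++;
    }
    return $count;
}

my $rec   = "a\nb\n\nc";
my $array = split_keep_newline(\$rec);           # ("a\n", "b\n", "\n", "c")
my $seen  = for_each_record(\$rec, sub { });     # 3, the empty line is skipped
```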



    If God had meant us to fly, he would *never* have given us the railroads.
        --Michael Flanders

      Out of interest, why do you recommend the second solution?

      I mean that each line ends in "\n".

      Sorry, but what on earth is "frobulate($1)"? I've never seen that before!?

        I would recommend it because it's the least memory-intensive approach, and you're potentially using a lot of memory already. (Best would of course be to read the file line-by-line, but since you can't do that, this is the next best thing I could come up with.)

        As to the other, you didn't stipulate what you wanted to do with the records when you had them split, and I needed a dummy subroutine to show the outline right (and to note that your data is found in $1). Blame the chatterbox for the odd word choice. :-)
