Filehandles and Arrays

by bitswitch (Scribe)
on May 08, 2001 at 02:43 UTC ( [id://78695] )

bitswitch has asked for the wisdom of the Perl Monks concerning the following question:

Is there a cleaner, faster, and better way of opening a file
and inserting it into an array than doing this?

open(LAST, "last.txt") || die "$!";
@last = <LAST>;
close LAST;

Replies are listed 'Best First'.
(ar0n) Re: Filehandles and Arrays
by ar0n (Priest) on May 08, 2001 at 02:45 UTC
    I really like:
    my @array = do { local @ARGV = "foo.txt"; <> };

    [ ar0n - swoosh ]

      Nice! Or, less flashy, you can just put your fav'rit routine into a sub that takes a filename as argument.

      sub slurp {
          my $filename = shift;
          open HEYNONNYNO, $filename or die "Can't open $filename: $!\n";
          my @stuff = <HEYNONNYNO>;
          close HEYNONNYNO;
          [ @stuff ];
      }

      And you call it with my @lines = @{ slurp("foo.txt") };
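
      The same idea can be written with a lexical filehandle and the three-argument form of open; this is just a sketch, and the sub name here is made up:

      sub slurp_lexical {
          my $filename = shift;
          # A lexical filehandle is closed automatically when it goes out of scope.
          open my $fh, '<', $filename or die "Can't open $filename: $!\n";
          my @stuff = <$fh>;
          return \@stuff;    # return a reference, as above
      }

      my @lines = @{ slurp_lexical("foo.txt") };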

      Again, just for fun (I'm into that today).

      I played around with this some, and noticed that if you're reading a file off of the command line already (at least in the situations I could test) it doesn't work. Which, in my view, it should, owing to local and all. Specifically, it keeps right on reading off of the file specified on the command line, try as I might (by localizing @ARGV and $ARGV and even *ARGV, which was probably a bad idea) to dissuade it.

      Can any kind, wise, helpful monks explain why and when this will and won't work? It's a neat trick in any case, but if it could be worked as a general file-slurper without the baggage, it'd be even cooler.



      If God had meant us to fly, he would *never* have given us the railroads.
          --Michael Flanders

        Well, I'm not who you asked for, so feel free to ignore me. I've said it before and it works for me:

        my @lines = do { local *ARGV; @ARGV = $name; <> };

        I have tested it while in the middle of using <> to read from files given on the command line, and it read the lines from the named file and then the next <> resumed right where it had left off.
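
        Wrapped up as a reusable sub, that might look something like this (only a sketch; the sub name is made up, and it assumes nothing beyond the behavior described above):

        sub slurp_via_argv {
            my $name = shift;
            # Localizing the whole *ARGV glob (the @ARGV array, $ARGV, and the
            # ARGV filehandle together) is what lets this coexist with an outer
            # <> loop over command-line files.
            my @lines = do { local *ARGV; @ARGV = $name; <> };
            return @lines;
        }

        my @last = slurp_via_argv("last.txt");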

        Perhaps you could post some code that demonstrates how it fails (as I was never completely sure that it was foolproof).

                - tye (but my friends call me "Tye")
Re: Filehandles and Arrays
by srawls (Friar) on May 08, 2001 at 06:05 UTC
    Well, I started to suggest the below code:
    open(FH, $file) or die "could not open $file\n";
    push @someArray, $_ while <FH>;
    But, surprisingly (to me at least), your code ran faster every time I benchmarked it. I made the file 2500 lines long, and repeated the function 100 times, thinking that this would push the odds in my favor (I seem to recall that file slurping isn't efficient on long files), but your code was faster every time.
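
    For anyone who wants to repeat the comparison, a rough benchmark along these lines is one way to do it (the file name and iteration count are arbitrary; results will vary with perl version and file size):

    use Benchmark qw(cmpthese);

    my $file = "last.txt";    # any reasonably large test file

    cmpthese(100, {
        slurp => sub {
            open my $fh, '<', $file or die $!;
            my @lines = <$fh>;
        },
        push_loop => sub {
            open my $fh, '<', $file or die $!;
            my @lines;
            push @lines, $_ while <$fh>;
        },
    });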

    The 15 year old, freshman programmer,
    Stephen Rawls
      virtualsue is right about why slurping large files is slow, but the above code will be slower for good reason.

      As a general rule, the more detailed the instructions that Perl gets, the slower it will be. The reason is that Perl is interpreted, and so it is constantly going back to your instructions, figuring out what to do next, and then doing that. But the more that Perl is getting instructions that allow it to "chunk" operations, the easier it is for Perl to do that efficiently.

      Think of yourself as perl and this becomes obvious. In the one case you are told to grab a hunk of data in lines, allocate an array, and shove the data there. In the other case you are told to open a file, scan in a line, alias that to $_, append to an array (do we need to allocate more for the array now?) etc.

      Which instructions involve more thinking? For computers thought is time...
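
      One way to make the "fewer, bigger instructions" point concrete is to dump the ops perl compiles each version into; the loop version compiles to more ops, and more importantly runs its readline/push ops once per line rather than once per file. B::Concise ships with recent perls and its output varies by version, so this is only a rough probe:

      perl -MO=Concise -e 'my @lines = <FH>;'
      perl -MO=Concise -e 'my @lines; push @lines, $_ while <FH>;'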

        Thanks for your explanation. I definitely oversimplified above. In my defense, I did it because it bothered me that what I saw as the biggest opening for performance pain (file slurp) was being ignored. Having seen this sort of thing happen all too often in real life, I am possibly a little oversensitive. I get this image of a guy being rushed into an ER, blood spurting all over from some massive trauma, and telling the docs that he'd like them to look at his hangnail instead. ;)
      The problem with scarfing large files into arrays is the amount of memory required. The system eventually runs out, then slow stuff like paging & swapping occurs. I suspect this is what underlies the 'slurp performance problem' to which you refer. All of the methods above, including yours, are 'guilty' of hogging RAM in the same way.

      I am not a Perl internals type, but I would expect that all of the program shards presented in this thread will boil down to much the same lower-level code; IOW there shouldn't be a significant difference in speed between them. If this is true (corrections cheerfully invited) then the clearest succinct method should be used if there is any chance at all that someone else will inherit your code.
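
      Where memory rather than speed is the real concern, the usual answer is to process the file a line at a time instead of slurping it. A minimal sketch, assuming the work can be done per line:

      open my $fh, '<', "last.txt" or die "Can't open last.txt: $!";
      while (my $line = <$fh>) {
          # do whatever per-line work is needed here; only one line
          # is held in memory at a time
      }
      close $fh;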
