bitswitch has asked for the wisdom of the Perl Monks concerning the following question:
Is there a cleaner, faster, and better way of opening a file and inserting it into an array than doing this?
open(LAST,"last.txt") || die "$!";
@last = <LAST>;
close LAST;
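For reference, here's a sketch of the same slurp using the modern idioms (three-arg open and a lexical filehandle); behavior is the same, but it avoids the global bareword handle. The demo file contents are made up for illustration:

```perl
use strict;
use warnings;

# Make a small last.txt so the sketch is self-contained
# (last.txt is the file named in the question).
open my $out, '>', 'last.txt' or die "Can't write last.txt: $!";
print $out "first line\nsecond line\n";
close $out;

# Same slurp, with a lexical filehandle and three-arg open.
open my $fh, '<', 'last.txt' or die "Can't open last.txt: $!";
my @last = <$fh>;
close $fh;

print scalar(@last), " lines\n";  # prints "2 lines"
```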
(ar0n) Re: Filehandles and Arrays
by ar0n (Priest) on May 08, 2001 at 02:45 UTC
my @array = do { local @ARGV = "foo.txt"; <> };
[ ar0n - swoosh ]
sub slurp {
    my $filename = shift;
    open HEYNONNYNO, $filename or die "Can't open $filename: $!\n";
    my @stuff = <HEYNONNYNO>;
    close HEYNONNYNO;
    [ @stuff ]
}
And you call it with my @lines = @{slurp("foo.txt")}; Added the code tags to make that look better.
Again, just for fun (I'm into that today).
I played around with this some, and noticed that if you're reading a file off of the command line already (at least in the situations I could test) it doesn't work. Which, in my view, it should, owing to local and all. Specifically, it keeps right on reading off of the file specified on the command line, try as I might (by localizing @ARGV and $ARGV and even *ARGV, which was probably a bad idea) to dissuade it.
Can any kind, wise, helpful monks explain why and when this will and won't work? It's a neat trick in any case, but if it could be worked as a general file-slurper without the baggage, it'd be even cooler.
If God had meant us to fly, he would *never* have given us the railroads.
--Michael Flanders
Well, I'm not who you asked for so feel free to ignore me. I've said it before and it works for me:
my @lines= do { local *ARGV; @ARGV= $name; <> };
I have tested it while in the middle of using <> to read from files given on the command line, and it read the lines from the named file; the next <> then resumed right where it had left off.
Perhaps you could post some code that demonstrates how it fails (as I was never completely sure that it was foolproof).
-
tye
(but my friends call me "Tye")
Technically though, you've also messed up ARGV and $ARGV. The safest would be:
my @array = do { local *ARGV; @ARGV = "foo.txt"; <> };
-- Randal L. Schwartz, Perl hacker
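Wrapping that "safest" version in a sub makes the point concrete: local *ARGV saves and restores the whole glob, so @ARGV, $ARGV, and the ARGV filehandle are all untouched in the caller. This is a sketch, and slurp_argv is a name I've made up for illustration:

```perl
use strict;
use warnings;

# Hedged sketch of the trick above as a reusable slurper.
sub slurp_argv {
    my $name = shift;
    # local *ARGV localizes the entire glob: the @ARGV array,
    # the $ARGV scalar, and the ARGV filehandle.
    local *ARGV;
    @ARGV = ($name);
    return <>;    # magic <> opens and reads the named file
}

# Demo: write a small file, then slurp it.
open my $out, '>', 'demo.txt' or die "Can't write demo.txt: $!";
print $out "one\ntwo\nthree\n";
close $out;

my @lines = slurp_argv('demo.txt');
print scalar(@lines), "\n";  # prints 3
```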
Re: Filehandles and Arrays
by srawls (Friar) on May 08, 2001 at 06:05 UTC
Well, I started to suggest the below code:
open(FH,$file) or die "could not open $file\n";
push @someArray, $_ while <FH>;
But, surprisingly (to me at least), your code ran faster every time I benchmarked it. I made the file 2500 lines long and repeated the function 100 times, thinking that this would push the odds in my favor (I seem to recall that file slurping isn't efficient on long files), but your code was faster every time.
The 15 year old, freshman programmer,
Stephen Rawls
virtualsue is right about why slurping large files is slow, but the above code will be slower for good reason.
As a general rule, the more detailed the instructions Perl is given, the slower it runs. Perl is interpreted, so it is constantly going back to your instructions, figuring out what to do next, and then doing it. The more an instruction lets Perl "chunk" work into a single operation, the more efficiently it can carry it out.
Think of yourself as perl and this becomes obvious. In the one case you are told to grab a hunk of data in lines, allocate an array, and shove the data there. In the other case you are told to open a file, scan in a line, alias that to $_, append to an array (do we need to allocate more for the array now?) etc.
Which instructions involve more thinking? For computers thought is time...
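The comparison described above can be sketched with the standard Benchmark module. The file size (2500 lines) and iteration count (100) match the numbers quoted in the thread; the file name and contents are made up for illustration:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Build a 2500-line test file, as in the post above.
open my $out, '>', 'bench.txt' or die "Can't write bench.txt: $!";
print $out "some line of text\n" for 1 .. 2500;
close $out;

# Compare slurping in one list assignment against pushing line by line.
cmpthese(100, {
    slurp => sub {
        open my $fh, '<', 'bench.txt' or die $!;
        my @lines = <$fh>;
    },
    push_loop => sub {
        open my $fh, '<', 'bench.txt' or die $!;
        my @lines;
        push @lines, $_ while <$fh>;
    },
});
```

cmpthese prints a table of rates and relative speed, which is less error-prone than timing the two by hand.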
Thanks for your explanation. I definitely oversimplified above. In my defense, I did it because it bothered me that what I saw as the biggest opening for performance pain (the file slurp) was being ignored. Having seen this sort of thing happen all too often in real life, I am possibly a little oversensitive. I get this image of a guy being rushed into an ER, blood spurting all over from some massive trauma, and telling the docs that he'd like them to look at his hangnail instead. ;)
The problem with scarfing large files into arrays is the amount of memory required. The system eventually runs out, and then slow stuff like paging and swapping occurs. I suspect this is what underlies the 'slurp performance problem' to which you refer. All of the methods above, including yours, are 'guilty' of hogging RAM in the same way.

I am not a Perl internals type, but I would expect that all of the program shards presented in this thread boil down to much the same lower-level code; IOW, there shouldn't be a significant difference in speed between them. If this is true (corrections cheerfully invited), then the clearest succinct method should be used if there is any chance at all that someone else will inherit your code.
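The memory-friendly alternative implied here is to not build the array at all: process one line at a time, so memory use stays flat no matter how big the file is. A sketch, with a made-up file and a trivial per-line job (counting):

```perl
use strict;
use warnings;

# Build an illustrative input file.
open my $out, '>', 'big.txt' or die "Can't write big.txt: $!";
print $out "line $_\n" for 1 .. 1000;
close $out;

# Read line by line: only one line is held in memory at a time,
# instead of the whole file in an array.
open my $fh, '<', 'big.txt' or die "Can't open big.txt: $!";
my $count = 0;
while (my $line = <$fh>) {
    $count++;    # do the real per-line work here instead of storing
}
close $fh;

print "$count\n";  # prints 1000
```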