bitswitch has asked for the wisdom of the Perl Monks concerning the following question:
Is there a cleaner, faster, and better way of opening a file and inserting it into an array than doing this?
open(LAST,"last.txt") || die "$!";
@last = <LAST>;
close LAST;
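For reference, here's a sketch of the same slurp using the modern idioms (three-arg open and a lexical filehandle); behavior is the same, but it avoids the global bareword handle. The demo file contents are made up for illustration:

```perl
use strict;
use warnings;

# Make a small last.txt so the sketch is self-contained
# (last.txt is the file named in the question).
open my $out, '>', 'last.txt' or die "Can't write last.txt: $!";
print $out "first line\nsecond line\n";
close $out;

# Same slurp, with a lexical filehandle and three-arg open.
open my $fh, '<', 'last.txt' or die "Can't open last.txt: $!";
my @last = <$fh>;
close $fh;

print scalar(@last), " lines\n";  # prints "2 lines"
```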
(ar0n) Re: Filehandles and Arrays
by ar0n (Priest) on May 08, 2001 at 02:45 UTC
my @array = do { local @ARGV = "foo.txt"; <> };
[ ar0n - swoosh ]
sub slurp {
    my $filename = shift;
    open HEYNONNYNO, $filename or die "Can't open $filename: $!\n";
    my @stuff = <HEYNONNYNO>;
    close HEYNONNYNO;
    [ @stuff ]
}
And you call it with my @lines = @{slurp("foo.txt")}; Added the code tags to make that look better.
Again, just for fun (I'm into that today).
I played around with this some, and noticed that if you're reading a file off of the command line already (at least in the situations I could test) it doesn't work. Which, in my view, it should, owing to local and all. Specifically, it keeps right on reading off of the file specified on the command line, try as I might (by localizing @ARGV and $ARGV and even *ARGV, which was probably a bad idea) to dissuade it.
Can any kind, wise, helpful monks explain why and when this will and won't work? It's a neat trick in any case, but if it could be worked as a general file-slurper without the baggage, it'd be even cooler.
If God had meant us to fly, he would *never* have given us the railroads.
--Michael Flanders
Well, I'm not who you asked for so feel free to ignore me. I've said it before and it works for me:
my @lines= do { local *ARGV; @ARGV= $name; <> };
I have tested it while in the middle of using <> to read from files given on the command line, and it read the lines from the named file; the next <> then resumed right where it had left off.
Perhaps you could post some code that demonstrates how it fails (as I was never completely sure that it was foolproof).
-
tye
(but my friends call me "Tye")
Technically though, you've also messed up ARGV and $ARGV. The safest would be:
my @array = do { local *ARGV; @ARGV = "foo.txt"; <> };
-- Randal L. Schwartz, Perl hacker
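Wrapping that "safest" version in a sub makes the point concrete: local *ARGV saves and restores the whole glob, so @ARGV, $ARGV, and the ARGV filehandle are all untouched in the caller. This is a sketch, and slurp_argv is a name I've made up for illustration:

```perl
use strict;
use warnings;

# Hedged sketch of the trick above as a reusable slurper.
sub slurp_argv {
    my $name = shift;
    # local *ARGV localizes the entire glob: the @ARGV array,
    # the $ARGV scalar, and the ARGV filehandle.
    local *ARGV;
    @ARGV = ($name);
    return <>;    # magic <> opens and reads the named file
}

# Demo: write a small file, then slurp it.
open my $out, '>', 'demo.txt' or die "Can't write demo.txt: $!";
print $out "one\ntwo\nthree\n";
close $out;

my @lines = slurp_argv('demo.txt');
print scalar(@lines), "\n";  # prints 3
```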
Re: Filehandles and Arrays
by srawls (Friar) on May 08, 2001 at 06:05 UTC
Well, I started to suggest the below code:
open(FH,$file) or die "could not open $file\n";
push @someArray, $_ while <FH>;
But, surprisingly (to me at least), your code ran faster every time I benchmarked it. I made the file 2500 lines long and repeated the function 100 times, thinking that this would push the odds in my favor (I seem to recall that file slurping isn't efficient on long files), but your code was faster every time.
The 15 year old, freshman programmer,
Stephen Rawls
virtualsue is right about why slurping large files is slow, but the above code will be slower for good reason.
As a general rule, the more detailed the instructions Perl is given, the slower it runs. Perl is interpreted, so it is constantly going back to your instructions, figuring out what to do next, and then doing it. The more an instruction lets Perl "chunk" work into a single operation, the more efficiently it can carry it out.
Think of yourself as perl and this becomes obvious. In the one case you are told to grab a hunk of data in lines, allocate an array, and shove the data there. In the other case you are told to open a file, scan in a line, alias that to $_, append to an array (do we need to allocate more for the array now?) etc.
Which instructions involve more thinking? For computers thought is time...
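The comparison described above can be sketched with the standard Benchmark module. The file size (2500 lines) and iteration count (100) match the numbers quoted in the thread; the file name and contents are made up for illustration:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Build a 2500-line test file, as in the post above.
open my $out, '>', 'bench.txt' or die "Can't write bench.txt: $!";
print $out "some line of text\n" for 1 .. 2500;
close $out;

# Compare slurping in one list assignment against pushing line by line.
cmpthese(100, {
    slurp => sub {
        open my $fh, '<', 'bench.txt' or die $!;
        my @lines = <$fh>;
    },
    push_loop => sub {
        open my $fh, '<', 'bench.txt' or die $!;
        my @lines;
        push @lines, $_ while <$fh>;
    },
});
```

cmpthese prints a table of rates and relative speed, which is less error-prone than timing the two by hand.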
Thanks for your explanation. I definitely oversimplified above. In my defense, I did it because it bothered me that what I saw as the biggest opening for performance pain (the file slurp) was being ignored. Having seen this sort of thing happen all too often in real life, I am possibly a little oversensitive. I get this image of a guy being rushed into an ER, blood spurting all over from some massive trauma, and telling the docs that he'd like them to look at his hangnail instead. ;)
The problem with scarfing large files into arrays is the amount of memory required. The system eventually runs out, and then slow stuff like paging and swapping occurs. I suspect this is what underlies the 'slurp performance problem' to which you refer. All of the methods above, including yours, are 'guilty' of hogging RAM in the same way.

I am not a Perl internals type, but I would expect that all of the program shards presented in this thread boil down to much the same lower-level code; IOW, there shouldn't be a significant difference in speed between them. If this is true (corrections cheerfully invited), then the clearest succinct method should be used if there is any chance at all that someone else will inherit your code.
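The memory-friendly alternative implied here is to not build the array at all: process one line at a time, so memory use stays flat no matter how big the file is. A sketch, with a made-up file and a trivial per-line job (counting):

```perl
use strict;
use warnings;

# Build an illustrative input file.
open my $out, '>', 'big.txt' or die "Can't write big.txt: $!";
print $out "line $_\n" for 1 .. 1000;
close $out;

# Read line by line: only one line is held in memory at a time,
# instead of the whole file in an array.
open my $fh, '<', 'big.txt' or die "Can't open big.txt: $!";
my $count = 0;
while (my $line = <$fh>) {
    $count++;    # do the real per-line work here instead of storing
}
close $fh;

print "$count\n";  # prints 1000
```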