in reply to Filehandles and Arrays

Well, I started to suggest the code below:
open(FH, $file) or die "could not open $file\n";
push @someArray, $_ while <FH>;
But, surprisingly (to me at least), your code ran faster every time I benchmarked it. I made the file 2500 lines long and repeated the function 100 times, thinking that this would push the odds in my favor (I seem to recall that file slurping isn't efficient on long files), but your code was faster every time.
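
For anyone who wants to reproduce the comparison, here's a minimal sketch using the core Benchmark module (the filename is made up, and the two subs are my guesses at the reading styles being compared):

    use strict;
    use warnings;
    use Benchmark qw(cmpthese);

    my $file = 'lines.txt';    # hypothetical 2500-line test file

    cmpthese(100, {
        # read the whole file with one list-context operation
        slurp => sub {
            open my $fh, '<', $file or die "could not open $file: $!";
            my @lines = <$fh>;
        },
        # read and push one line at a time
        push_loop => sub {
            open my $fh, '<', $file or die "could not open $file: $!";
            my @lines;
            push @lines, $_ while <$fh>;
        },
    });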

The 15-year-old freshman programmer,
Stephen Rawls

Re (tilly) 2: Filehandles and Arrays
by tilly (Archbishop) on May 08, 2001 at 18:43 UTC
    virtualsue is right about why slurping large files is slow, but the above code will be slower for good reason.

    As a general rule, the more detailed the instructions Perl gets, the slower it runs. The reason is that Perl is interpreted, so it is constantly going back to your instructions, figuring out what to do next, and then doing it. But the more your instructions allow Perl to "chunk" operations, the more work it gets done on each trip through the interpreter.

    Think of yourself as perl and this becomes obvious. In the one case you are told to grab a hunk of data in lines, allocate an array, and shove the data there. In the other case you are told to open a file, scan in a line, alias it to $_, append it to an array (do we need to allocate more room for the array now?), and so on.

    Which instructions involve more thinking? For computers, thought is time...
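
    The same rule applies well beyond file reading. As a minimal sketch (the list and numbers are made up for illustration), compare summing a list one interpreted step at a time against handing the whole job to a single built-in from List::Util that runs at C level:

        use strict;
        use warnings;
        use List::Util qw(sum);

        my @nums = (1 .. 100_000);

        # Many small interpreted steps: fetch a value, add, store, loop.
        my $total = 0;
        $total += $_ for @nums;

        # One "chunked" instruction: the C code inside sum() walks the
        # list without coming back to the interpreter for each element.
        my $total2 = sum(@nums);

        print "$total $total2\n";    # both are 5000050000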

      Thanks for your explanation. I definitely oversimplified above. In my defense, I did it because it bothered me that what I saw as the biggest opening for performance pain (file slurp) was being ignored. Having seen this sort of thing happen all too often in real life, I am possibly a little oversensitive. I get this image of a guy being rushed into an ER, blood spurting all over from some massive trauma, and telling the docs that he'd like them to look at his hangnail instead. ;)
Re: Re: Filehandles and Arrays
by virtualsue (Vicar) on May 08, 2001 at 17:40 UTC
    The problem with scarfing large files into arrays is the amount of memory required. The system eventually runs out, and then slow stuff like paging & swapping occurs. I suspect this is what underlies the 'slurp performance problem' to which you refer. All of the methods above, including yours, are 'guilty' of hogging RAM in the same way.

    I am not a Perl internals type, but I would expect all of the program fragments presented in this thread to boil down to much the same lower-level code; IOW, there shouldn't be a significant difference in speed between them. If this is true (corrections cheerfully invited), then the clearest, most succinct method should be used if there is any chance at all that someone else will inherit your code.
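
    If memory is the real concern, the usual cure is not a cleverer way to fill the array but not filling one at all. Here is a minimal sketch (the filename is hypothetical) that handles each line as it is read, so memory use stays flat no matter how large the file grows:

        use strict;
        use warnings;

        my $file = 'big.log';    # hypothetical large input file
        open my $fh, '<', $file or die "could not open $file: $!";

        # Nothing accumulates: each line is processed and discarded,
        # so the whole file never sits in RAM at once.
        while (my $line = <$fh>) {
            chomp $line;
            # ... do the real work on $line here ...
        }
        close $fh;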