Re: shell vs. filehandler
by friedo (Prior) on Feb 11, 2005 at 03:40 UTC
Common sense tells us that reading the file with Perl would be much faster than launching an entirely new process, reading the file, dumping it to STDOUT and then reading it from there. But just to be sure, whenever you want to find out which thing is faster, fire up Benchmark.
perl -MBenchmark -e 'timethese( 1000, {
    cat  => sub { my $str = qx|cat /usr/share/dict/words| },
    read => sub { local $/;
                  open my $f, "/usr/share/dict/words" or die $!;
                  my $str = <$f> },
} )'
Benchmark: timing 1000 iterations of cat, read...
       cat:  6 wallclock secs ( 1.69 usr  1.34 sys +  0.28 cusr  2.34 csys =  5.65 CPU) @ 330.03/s (n=1000)
      read:  1 wallclock secs ( 0.58 usr +  0.58 sys =  1.16 CPU) @ 862.07/s (n=1000)
Thank you for the feedback, guys... I will go with the second method. Before reading about Benchmark, I timed the two methods using Unix's time command. Here are the results:
cat:
    real    0.3
    user    0.1
    sys     0.2

read:
    real    0.2
    user    0.0
    sys     0.1
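(For reference, timings like these come from invocations along the following lines; the script names are placeholders for the two versions being compared, and the exact output format depends on your shell.)

    time perl cat_version.pl  > /dev/null
    time perl read_version.pl > /dev/null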
Hi. Due to the small number of instructions executed, both scripts are too short-lived for a reliable benchmark.
real measures how much wall-clock time the program took to execute, user how much time it spent in user mode, and sys how much in kernel mode.
Notice that your system may be busy, so there may be many other processes being switched in and out.
This background load (part of the so-called system entropy) can vary over a very short time, and since your scripts also run for a very short time, it may affect each run differently.
This can make the faster script appear to be the slower one.
Try rewriting them to open a reasonable number of files (or the same file many times over), so you can compare execution times more accurately.
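For example, a minimal sketch along those lines (the word-list path and iteration count are arbitrary assumptions; adjust them to your system) that gives the benchmark enough work to produce stable numbers:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Benchmark qw(timethese);

    my $file  = '/usr/share/dict/words';   # assumed test file
    my $iters = 1000;                      # enough repetitions to smooth out noise

    timethese( $iters, {
        cat  => sub { my $str = qx|cat $file| },     # spawn a shell plus cat
        read => sub {
            local $/;                                # slurp mode
            open my $fh, '<', $file or die "open $file: $!";
            my $str = <$fh>;
            close $fh;
        },
    } );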
Indeed, running a benchmark is the only way to be sure, but looks can be deceiving: repeatedly running a script which reads a file may "suffer" from caching effects in the OS; i.e. the first time you read the file may take a relatively long time, and all reads thereafter may come from the cache and hence be much faster.
CountZero "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law
Re: shell vs. filehandler
by K_M_McMahon (Hermit) on Feb 11, 2005 at 03:40 UTC
I'm not going to comment on the speed of either of these because I honestly don't know.
But what I do know is that one of the wonderful things about Perl is that the same code can run on any system without rewriting. With this in mind, I try to minimize the number of system (or backtick) calls I make. Since 'cat' is not a Windows command, your script will not run on a Windows machine if you go with the first option; your second choice, however, makes your script platform independent, and that is a wonderful thing.
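To illustrate, a minimal sketch of the portable approach (the file name is a placeholder), using only Perl built-ins so there is no dependency on 'cat' or any other external command:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $path = 'data.txt';    # hypothetical input file

    my $str = do {
        local $/;             # slurp mode: read the whole file at once
        open my $fh, '<', $path or die "Can't open $path: $!";
        <$fh>;
    };

    print length($str), " bytes read\n";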
Re: shell vs. filehandler
by TedPride (Priest) on Feb 11, 2005 at 06:16 UTC
Isn't this what the read command was designed for?
open($handle, $fpath);
read($handle, $str, $maxsize);
close($handle);
The advantage of this method is that you're specifying the maximum size of the data in advance, so memory can be allocated all at once and the reading isn't done in chunks. I prefer this for files small enough to comfortably fit into available memory.
(1) premature optimization
(2) no error or exception handling
(3) uses a magic constant which may require maintenance
(4) not immediately clear to an idiomatic Perl programmer
In more depth,
(1) Don't try really hard to think about the time overhead of memory allocation in one step or many. That's the computer's job. Your job is to write an effective algorithm. If the FINISHED program runs too slowly, THEN figure out why.
(2) When you use system calls, like open() and read(), you should also check their return values for any unexpected error conditions. When you use the high-level equivalents like <ARGV>, this is done for you in a generic, consistent way (die).
(3) When I'm designing code, I am at the beginning of the lifetime of the application. Any hardcoded "maximum size" constants are based on my best guess at the time, but somebody else might need to increase the complexity of that config file, or add more words to the dictionary, or any other mission creep. Someone's going to have to go in and find out why the application can't check the spelling of any words starting with Z. Someone's going to have to modify (and re-distribute) the application. There's a reason Perl doesn't require explicit string and array length allocations. Length should not be your concern.
(4) I use the read() function so seldom that I would have to look it up to see what the "maxsize" argument was, and then come to trust that if the file was shorter than this maximum, that it would indeed get read in its entirety. I already know that the slurp idiom works every time.
I didn't count off for use strict compliance, but remember to identify or declare all those lexicals accurately.
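For comparison, the slurp idiom referred to in (4), written out as a strict-clean sketch (the file name is a placeholder):

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $path = 'words.txt';    # hypothetical file

    # The slurp idiom: no size to guess at, errors reported via die.
    open my $fh, '<', $path or die "Can't open $path: $!";
    my $str = do { local $/; <$fh> };
    close $fh or die "Can't close $path: $!";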
-- [ e d @ h a l l e y . c c ]
Re: shell vs. filehandler
by blazar (Canon) on Feb 11, 2005 at 09:22 UTC
If you're concerned about efficiency you may be interested in File::Slurp too...
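A minimal sketch of what that looks like (the file name is a placeholder), using File::Slurp's read_file:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use File::Slurp qw(read_file);

    my $path = 'words.txt';               # hypothetical file

    my $str   = read_file($path);         # whole file as one string
    my @lines = read_file($path);         # or, in list context, one element per line

    print scalar(@lines), " lines, ", length($str), " bytes\n";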
#!/usr/bin/perl
use strict;
use warnings;
use IO::All;

my $path  = shift @ARGV;          # file name from the command line
my $slurp = io($path)->slurp;     # slurp a file
my $stdin = io('-')->slurp;       # slurp STDIN