Re: File slurping efficiency
by KM (Priest) on Aug 11, 2000 at 22:20 UTC
|
I just ran a quick test, opening a small file 500 times with method A and then grepping, versus method B and a single match. Results of my benchmark:
Method A:
1 wallclock secs ( 0.16 usr + 0.02 sys = 0.18 CPU)
Method B:
0 wallclock secs ( 0.04 usr + 0.03 sys = 0.07 CPU)
Results may vary; try it yourself to see what you get. Not very surprising, since Method A needs to populate an array, then loop through it and match. Method B just does the match.
UPDATE: Out of curiosity I did Method A without building an array, simply grepping from the open filehandle. Small difference:
0 wallclock secs ( 0.12 usr + 0.03 sys = 0.15 CPU)
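For anyone who wants to repeat this, here is a minimal sketch of how such a comparison might be set up with the Benchmark module. The original definitions of methods A and B are not shown in this post, so the three subs below are assumptions: A reads the lines into an array and greps, B slurps into one scalar and matches once, and the third is the no-array variant from the update. The test file and the "needle" string are made up for illustration.

```perl
use strict;
use warnings;
use Benchmark qw(timethese);
use File::Temp qw(tempfile);

# Hypothetical test data: a small file with exactly one matching line.
my ($fh, $file) = tempfile(UNLINK => 1);
print {$fh} "line $_\n" for 1 .. 200;
print {$fh} "needle here\n";
close $fh or die "close: $!";

# Method A (assumed): read every line into an array, then grep it.
sub method_a {
    open my $in, '<', $file or die "open $file: $!";
    my @lines = <$in>;
    close $in;
    return scalar grep { /needle/ } @lines;
}

# Method A without the named array: grep straight off the filehandle.
sub method_a_direct {
    open my $in, '<', $file or die "open $file: $!";
    my $count = grep { /needle/ } <$in>;
    close $in;
    return $count;
}

# Method B (assumed): slurp the file into one scalar, then match once.
sub method_b {
    open my $in, '<', $file or die "open $file: $!";
    my $text = do { local $/; <$in> };
    close $in;
    return $text =~ /needle/ ? 1 : 0;
}

# 500 runs each, as in the test described above.
timethese(500, {
    'Method A'          => \&method_a,
    'Method A (direct)' => \&method_a_direct,
    'Method B'          => \&method_b,
});
```

With so few iterations on a small file, Benchmark may warn that the count is too low to be reliable; raising the count or using a larger file should give steadier numbers.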
Cheers,
KM | [reply] |
Re: File slurping efficiency
by chromatic (Archbishop) on Aug 11, 2000 at 22:23 UTC
|
Arrays use more memory than scalars, and there will be some processing needed to split up a file into records (one record per slot in the array). However, it does take a bit of time to localize a magic variable and restore it. My suspicion is that method B is faster, but I don't have any benchmarks.
In cases like this, I usually go with whichever method makes parsing easier. If it's line based, I loop over the array (or use while on the filehandle). If I'm dealing with something that can span lines, I'll try the second (or redefine $/).
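A rough sketch of the two approaches, under assumptions: the sample file, the `first_matching_line` and `slurp` helper names, and the foo/bar patterns are all invented for illustration. The first sub reads line by line and stops at the first hit; the second localizes `$/` so one read returns the whole file, which lets a pattern span lines.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample data with a match that spans two lines.
my ($fh, $file) = tempfile(UNLINK => 1);
print {$fh} "alpha\nfoo starts here\nand ends with bar\nfoo again\n";
close $fh or die "close: $!";

# Line-based: use while on the filehandle and stop at the first match.
sub first_matching_line {
    my ($path, $re) = @_;
    open my $in, '<', $path or die "open $path: $!";
    while (my $line = <$in>) {
        if ($line =~ $re) {
            close $in;
            chomp $line;
            return $line;
        }
    }
    close $in;
    return;
}

# Multi-line: localize $/ so a single read returns the entire file.
sub slurp {
    my ($path) = @_;
    open my $in, '<', $path or die "open $path: $!";
    local $/;              # undef inside this sub, restored on exit
    my $text = <$in>;
    close $in;
    return $text;
}

my $line = first_matching_line($file, qr/foo/);
print "first line with foo: $line\n";

# /s lets . match newlines, so the match can cross line boundaries.
my ($span) = slurp($file) =~ /(foo.*?bar)/s;
print "spanning match found\n" if defined $span;
```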
If you're only looking for the first instance of the string in the file, and you're only concerned with what else is on the line, I'd go with while. Without seeing your data or knowing more about it, it's hard to say more. | [reply] |
Re: File slurping efficiency
by Hot Pastrami (Monk) on Aug 12, 2000 at 05:20 UTC
|
Ovid:
It was the "Running with Scissors" topic that inspired me to add "Sitting calmly with scissors"... I was just being a smartass about my cautious approach to "things unknown" when it comes to Perl (can I say "smartass" here? If not, imagine I typed "wise-crackin' guy" instead of "smartass").
At the advice of yourself and gryng, I'll skip the study() for now and just match the string as is... your reasoning sounds pretty logical. Perhaps in the future I'll add it and try the benchmark module ZZamboni suggested, and see if it does me any good.
As far as the index file goes, I've used that in other scenarios, but it won't work here because these files aren't static... they're ever-changing. I think the overhead of constantly rebuilding the word indexes would outweigh the speedy search advantage. Thanks for tryin to help me out though, man... I appreciate it.
And thanks to Ovid, gryng, and ZZamboni, you guys are the tops!!
Alan "Hot Pastrami" Bellows
-Sitting calmly with scissors-
P.S. Hey Ovid, as a curiosity, the company I work for is called Ovid... I'm a Perl programmer there. Crazy stuff, eh? | [reply] |
Re: File slurping efficiency
by Hot Pastrami (Monk) on Aug 12, 2000 at 02:06 UTC
|
Thanks for the tips, guys... this Perl efficiency stuff is where it's at. I have a related question now... If I went with method B, and ran the ultra-long string through a pattern match, would it be to my advantage to study() the string first, or will the string's length make it take too blasted long? I have never actually used the study() function before, so I don't know much about its impact.
If nobody replies to this, I'll take that to mean "get off your lazy butt, get a benchmark utility, and test it yourself, bozo!" That's perfectly reasonable, and very true, but I'll probably think the "bozo" was out of line.
Alan "Hot Pastrami" Bellows
-Sitting calmly with scissors- | [reply] |
|
|
First, I want to say that I laughed my head off when I read your sig line (have you seen Running With Scissors?).
You need to be very careful with study. To the best of my knowledge, it's always been very buggy. In fact, in later versions of Perl (not sure about 5.6), successful matches against $_ can fail if you're using study, even if the string you're matching against isn't what you studied. Apparently, the only way to get around this is to explicitly undef the studied string as soon as you are done with it (see Mastering Regular Expressions, second edition, page 289).
If you're willing to risk the problems with study, you should go ahead and benchmark it, but I wouldn't bother with it, personally.
Cheers,
Ovid
| [reply] |
|
|
Hi Hot Pastrami,
study() (correct me if I'm wrong here guys) is only useful if you are going to search many keywords on the same string. Think of it as building an index of where all the a's and the b's, etc.. are located in the string so that if you need to see if 'airplane' is there, you can look really quickly (ok, so it doesn't do exactly that... but it's the same idea! :) ).
So if you are only going to look at a few keywords then don't bother with study, but if you are going to look at a few hundred keywords, then it might help a lot more! (The exact cutoff value depends on the length of the string, the length of the keywords, and the number of keywords -- yes, the length of the keywords affects the speed of a lookup, both ways.)
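The many-keywords case might look something like the sketch below. The text, the keyword list, and the `%found` hash are invented for illustration, and whether study helps at all would have to be measured on your own data and Perl version. One caveat for anyone reading this on a newer Perl: the perlfunc documentation says study does nothing from Perl 5.16 onward, so there the call is harmless but has no effect.

```perl
use strict;
use warnings;

# Hypothetical long string and a keyword list of about a hundred entries.
my $text = join ' ', map { "word$_" } 1 .. 5000;
$text .= ' airplane';
my @keywords = (qw(airplane zeppelin), map { "word$_" } 1 .. 100);

# Study the string once, before searching it for many different keywords.
study $text;

my %found;
for my $kw (@keywords) {
    # \Q...\E quotes any regex metacharacters in the keyword.
    $found{$kw} = ($text =~ /\b\Q$kw\E\b/) ? 1 : 0;
}

my $hits = grep { $found{$_} } @keywords;
print "$hits of ", scalar(@keywords), " keywords found\n";
```

To see whether the study call pays for itself, the loop could be timed with and without it using the Benchmark module.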
Cheers,
Gryn
| [reply] |
|
|
You may know this already, but in case you don't: if you
want to do some benchmarking, make sure you check out the
Benchmark module, which is part of the Perl standard distribution.
--ZZamboni
| [reply] |
RE: File slurping efficiency
by Anonymous Monk on Aug 12, 2000 at 04:15 UTC
|
| [reply] |