Performance optimization question

vit has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Performance optimization question by Joost (Canon) on Apr 02, 2008 at 22:23 UTC
I'd probably try a regex-only approach, though I'm not sure it will improve matters. This code assumes /reg exp/ does not match | characters. `my @results = $string =~ /(?:^|\|)[^|]*(reg exp)[^|]*(?:\||$)/g;` Update: adjusted the regex. Also, sorry for the double post; please reap my duplicate node if you can. The site is amazingly slow today and I can't seem to reach it. "What should it profit a man, if he should win a flame war, yet lose his cool?"
Re: Performance optimization question by BrowserUk (Patriarch) on Apr 02, 2008 at 22:43 UTC
A casual test showed a > 40% improvement from omitting the intermediate array: `my @arr1 = grep { /reg exp/ } split /\|/, $string;`

          Rate orig    A
    orig 232/s   -- -30%
    A    331/s  43%   --

And if the lack of `my`s and the need to set the length of the array in your code indicate that you are using globals instead of lexicals, note that lexicals are usually a few percent faster. A lot will depend upon how long the string is, how many elements it splits into, the complexity of `/reg exp/`, and the proportion of elements being excluded. More info might yield better responses.

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. "Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."
Re^2: Performance optimization question by jwkrahn (Abbot) on Apr 03, 2008 at 00:58 UTC
If you want it really fast, get rid of the braces as well: `my @arr1 = grep /reg exp/, split /\|/, $string;`
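For readers comparing the three shapes discussed so far, here is a minimal sketch showing that they produce the same list. The sample string and the pattern `/ap/` are assumptions for illustration, since the original question's data was not shown. The speed difference of the expression form is commonly attributed to skipping the per-element block scope, but that is worth measuring on your own data.

```perl
use strict;
use warnings;

# Hypothetical sample data; the original string and pattern were not shown.
my $string = join '|', qw(apple banana cherry grape);

# Original shape: split into a temporary array, then filter it.
my @tmp      = split /\|/, $string;
my @two_step = grep { /ap/ } @tmp;

# Block form without the temporary array.
my @block = grep { /ap/ } split /\|/, $string;

# Expression form: no block around the pattern.
my @expr = grep /ap/, split /\|/, $string;

print "two_step: @two_step\n";
print "block:    @block\n";
print "expr:     @expr\n";
```

All three lists hold the same elements; only the amount of intermediate work differs.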
Re^3: Performance optimization question by BrowserUk (Patriarch) on Apr 03, 2008 at 01:20 UTC
Yes. I forget how much difference that can make. Though Joost's idea (with several modifications) works out fastest:

    #! perl -slw
    use strict;
    use Data::Dump qw[ pp ];
    use Benchmark qw[ cmpthese ];

    our $string = join '|', map{
        join rand() < 0.2 ? 'fred' : 'bill', 'pqr', 'xyz'
    } 1 .. 10000;

    our $first = 0;
    our %counts;

    cmpthese -1, {
        orig => q[
            my @arr = split(/\|/, $string);
            my @arr1 = grep { /fred/ } @arr;
            $counts{ orig } = @arr1;
        ],
        Buk1 => q[
            my @arr1 = grep { /fred/ } split /\|/, $string;
            $counts{ Buk1 } = @arr1;
        ],
        jwkrahn => q[
            my @arr1 = grep /fred/, split /\|/, $string;
            $counts{ jwkrahn } = @arr1;
        ],
        Buk2 => q[
            my @arr1 = $string =~ m[(?:^|\|)(.*?fred.*?)(?=\||$)]g;
            $counts{ Buk2 } = @arr1;
        ],
        JOOST => q[
            my @arr1 = $string =~ /(?:^|\|)([^|]*?fred[^|]*?)(?=\||$)/g;
            $counts{ JOOST } = @arr1;
        ],
    };

    pp \%counts;

    __END__
    c:\test>junk6
              Rate    orig    Buk1   JOOST jwkrahn    Buk2
    orig    20.2/s      --    -28%    -48%    -58%    -84%
    Buk1    28.1/s     39%      --    -28%    -42%    -77%
    JOOST   39.2/s     94%     39%      --    -19%    -68%
    jwkrahn 48.4/s    140%     72%     24%      --    -61%
    Buk2     124/s    515%    342%    217%    157%      --
    { Buk1 => 2010, Buk2 => 2010, JOOST => 2010, jwkrahn => 2010, orig => 2010 }
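The benchmark above only compares how many elements each variant returns, not what they contain. The following sketch, on a small hypothetical input of the same shape, checks the contents as well. It suggests that a `.*?` pattern can capture text spanning more than one field, because `.` also matches `|`, while a `[^|]*?` pattern stays inside one field. Treat it as an illustration under these assumptions, not a verdict on the original data.

```perl
use strict;
use warnings;

# Small hypothetical input in the same shape as the benchmark data.
my $string = join '|', qw(pqrbillxyz pqrfredxyz pqrbillxyz pqrfredxyz);

# Reference result: split on the delimiter, keep fields containing 'fred'.
my @reference = grep /fred/, split /\|/, $string;

# '.' also matches '|', so a capture may start in a preceding field.
my @dot = $string =~ m[(?:^|\|)(.*?fred.*?)(?=\||$)]g;

# [^|] cannot cross a delimiter, so each capture is exactly one field.
my @negated = $string =~ /(?:^|\|)([^|]*?fred[^|]*?)(?=\||$)/g;

for my $pair ( [ reference => \@reference ],
               [ dot       => \@dot ],
               [ negated   => \@negated ] ) {
    my ($name, $list) = @$pair;
    printf "%-9s count=%d  [%s]\n", $name, scalar @$list, join '] [', @$list;
}
```

On this input all three counts agree, but the `dot` captures include the neighbouring `bill` field, so matching counts alone do not show that the variants are interchangeable.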
Re^4: Performance optimization question by jwkrahn (Abbot) on Apr 03, 2008 at 03:41 UTC
Re^5: Performance optimization question by BrowserUk (Patriarch) on Apr 03, 2008 at 04:36 UTC
Re^4: Performance optimization question by vit (Friar) on Apr 03, 2008 at 02:06 UTC
Re: Performance optimization question by ikegami (Patriarch) on Apr 02, 2008 at 22:41 UTC
What's the regular expression in question? What's the code that uses this? A macro-level change is usually the way to go. `$#arr1 = -1;` is totally useless, seeing how `@arr1` is overwritten on the next line.
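A minimal sketch of that last point, using made-up data: list assignment replaces the whole array, so clearing it first changes nothing about the result.

```perl
use strict;
use warnings;

# Hypothetical leftover contents from an earlier pass.
my @arr1 = (1, 2, 3);
$#arr1 = -1;                          # empties the array ...
@arr1 = grep /b/, qw(abc bcd cde);    # ... but assignment replaces it anyway

my @arr2 = (1, 2, 3);
@arr2 = grep /b/, qw(abc bcd cde);    # same result without clearing first

print "with clear:    @arr1\n";
print "without clear: @arr2\n";
```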
Re: Performance optimization question by Anonymous Monk on Apr 02, 2008 at 22:32 UTC
Other than the approaches suggested by Joost (probably the best), avoiding intermediate results might help. `@arr1 = grep /reg exp/, split /\|/, $string;`
Re: Performance optimization question by moritz (Cardinal) on Apr 03, 2008 at 06:49 UTC
If `reg exp` is meant literally, you can use `index` to search for the literal substring; it's faster than a regular expression.
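A minimal sketch of the `index` approach, assuming the needle really is a fixed string. The sample fields are hypothetical. Whether it is actually faster than the regex for a given input is worth benchmarking rather than assuming.

```perl
use strict;
use warnings;

# Hypothetical data and needle; the original string was not shown.
my $string = join '|', 'a reg exp here', 'nothing', 'reg exp';
my $needle = 'reg exp';

# index returns -1 when the substring is absent, so test for >= 0.
my @by_index = grep { index($_, $needle) >= 0 } split /\|/, $string;

# Regex equivalent; \Q...\E quotes any metacharacters in the needle.
my @by_regex = grep { /\Q$needle\E/ } split /\|/, $string;

print scalar(@by_index), " fields contain '$needle'\n";
```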
Re^2: Performance optimization question by BrowserUk (Patriarch) on Apr 03, 2008 at 06:58 UTC
Not always:

    $s = 'x'x3000 . 'reg exp' . 'x'x3000;
    cmpthese -1, {
        REGEX => q[ $x = $s =~ /reg exp/; ],
        INDEX => q[ $x = index $s, 'reg exp'; ],
    };

              Rate INDEX REGEX
    INDEX  40470/s    --  -74%
    REGEX 156392/s  286%    --
Re^3: Performance optimization question by moritz (Cardinal) on Apr 03, 2008 at 07:55 UTC
Are you sure the regex engine isn't "cheating" by caching something?

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Time::HiRes qw(time);

    my $num = 3_000_000;
    my $str = 'x'x$num . 'reg exp' . 'x'x$num;

    my $start = time;
    my $res = $str =~ m/reg exp/;
    print time - $start, $/;

    $start = time;
    my $idx = index $str, 'reg exp';
    print time - $start, $/;

    __END__
    0.0114099979400635
    0.011760950088501

Admittedly, index is still a bit slower, but the difference isn't that huge. BTW, on my machine (with perl 5.8.8) the difference isn't there at all:

             Rate REGEX INDEX
    REGEX 87061/s    --   -2%
    INDEX 88494/s    2%    --

The results differ only slightly for 5.10.0. Which perl did you use? I thought that index and regexes use the same algorithm, but that the regex has to go through the pain of being compiled first.
Re^4: Performance optimization question by BrowserUk (Patriarch) on Apr 03, 2008 at 08:09 UTC
Re^3: Performance optimization question by ikegami (Patriarch) on Apr 03, 2008 at 07:16 UTC
That's odd. Aren't they supposed to be using the same algorithm internally?
Re^4: Performance optimization question by BrowserUk (Patriarch) on Apr 03, 2008 at 07:43 UTC
Re^3: Performance optimization question by parv (Parson) on Apr 03, 2008 at 08:03 UTC
General request|comment: Could we have a failing test too in benchmarks (at least for regex, index, and such)?
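One way such a benchmark could look, as a sketch only: it adds a non-matching string of the same length next to the matching one, so both the hit and the miss paths are timed. The variable names and the miss string are assumptions; no timings are shown because they depend on the perl version and machine.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Same shape as the string used earlier in the thread, plus a hypothetical
# non-matching string of equal length (3000 + 7 + 3000 characters).
my $hit  = 'x' x 3000 . 'reg exp' . 'x' x 3000;
my $miss = 'x' x 6007;

my $r;
cmpthese(-1, {
    regex_hit  => sub { $r = $hit  =~ /reg exp/ },
    index_hit  => sub { $r = index $hit,  'reg exp' },
    regex_miss => sub { $r = $miss =~ /reg exp/ },
    index_miss => sub { $r = index $miss, 'reg exp' },
});
```

Comparing the `_hit` rows against the `_miss` rows shows whether either approach behaves differently when the needle is absent.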
Re^4: Performance optimization question by moritz (Cardinal) on Apr 03, 2008 at 08:18 UTC
Re^5: Performance optimization question by vit (Friar) on Apr 03, 2008 at 16:50 UTC
Re: Performance optimization question by Joost (Canon) on Apr 02, 2008 at 22:21 UTC
Please ignore this post and see my other post; sorry for the double post. I'd probably try a regex-only approach, though I'm not sure it will improve matters. This code assumes /reg exp/ does not match | characters and that $string doesn't start with a |: `my @results = $string =~ /(reg exp)(?:\||$)/g;`