vit has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,
The bottleneck of my code is this sequence of operations, which is repeated many times:
@arr = split(/\|/, $string);
$#arr1 = -1;
@arr1 = grep { /reg exp/ } @arr;
$string is long. I timed it and found that split and grep take about the same time in my case.
I would appreciate any ideas on how to speed it up.
I guess I could use map here somehow, but I am not sure.
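For reference, map can express the same filter as grep by returning an empty list for unwanted elements. A minimal sketch, using made-up sample data in place of the real $string. It yields the same list, but there is no particular reason to expect it to beat grep, which is built for exactly this job:

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the long $string.
my $string = 'abc|reg exp here|def|more reg exp';

# grep keeps each element for which the block returns true.
my @via_grep = grep { /reg exp/ } split /\|/, $string;

# map gives the same result by returning the element itself, or an
# empty list (which contributes nothing), for each input element.
my @via_map = map { /reg exp/ ? $_ : () } split /\|/, $string;

print join(',', @via_map), "\n";
```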

Re: Performance optimization question
by Joost (Canon) on Apr 02, 2008 at 22:23 UTC
Re: Performance optimization question
by BrowserUk (Patriarch) on Apr 02, 2008 at 22:43 UTC

    A casual test showed a > 40% improvement by omitting the intermediate array:

    my @arr1 = grep { /reg exp/ } split /\|/, $string;

          Rate orig    A
    orig 232/s   -- -30%
    A    331/s  43%   --

    And if the missing my declarations, and the need to reset the array's length, indicate that you are using globals rather than lexicals, note that lexicals are usually a few percent faster.
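    To check the lexical-versus-global difference on your own perl rather than taking the figure on trust, a minimal Benchmark sketch could look like this. The data (10,000 fields, every fifth containing fred) and the names @garr1 and @arr1 are made up for illustration:

    ```perl
    use strict;
    use warnings;
    use Benchmark qw(cmpthese);

    # Hypothetical test data: 10_000 fields, every fifth one contains "fred".
    my $string = join '|', map { $_ % 5 ? "bill$_" : "fred$_" } 1 .. 10_000;

    our @garr1;    # package (global) array, as the original code seems to use

    # Run each variant for at least 1 CPU second and print a comparison.
    cmpthese(-1, {
        # Global array, reset first as in the question.
        global => sub {
            $#garr1 = -1;
            @garr1 = grep { /fred/ } split /\|/, $string;
        },
        # Lexical array declared where it is used; no reset needed.
        lexical => sub {
            my @arr1 = grep { /fred/ } split /\|/, $string;
        },
    });
    ```
    
    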

    A lot will depend on how long the string is, how many elements it splits into, the complexity of /reg exp/, and the proportion of elements being excluded. More info might yield better responses.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      If you want it really fast, get rid of the braces as well:

      my @arr1 = grep /reg exp/, split /\|/, $string;

        Yes, I forget how much difference that can make. Joost's idea (with several modifications) works out fastest, though:

        #! perl -slw
        use strict;
        use Data::Dump qw[ pp ];
        use Benchmark qw[ cmpthese ];

        our $string = join '|', map{
            join rand() < 0.2 ? 'fred' : 'bill', 'pqr', 'xyz'
        } 1 .. 10000;

        our $first = 0;
        our %counts;

        cmpthese -1, {
            orig => q[
                my @arr = split(/\|/, $string);
                my @arr1 = grep { /fred/ } @arr;
                $counts{ orig } = @arr1;
            ],
            Buk1 => q[
                my @arr1 = grep { /fred/ } split /\|/, $string;
                $counts{ Buk1 } = @arr1;
            ],
            jwkrahn => q[
                my @arr1 = grep /fred/, split /\|/, $string;
                $counts{ jwkrahn } = @arr1;
            ],
            Buk2 => q[
                my @arr1 = $string =~ m[(?:^|\|)(.*?fred.*?)(?=\||$)]g;
                $counts{ Buk2 } = @arr1;
            ],
            JOOST => q[
                my @arr1 = $string =~ /(?:^|\|)([^|]*?fred[^|]*?)(?=\||$)/g;
                $counts{ JOOST } = @arr1;
            ],
        };

        pp \%counts;

        __END__
        c:\test>junk6
                  Rate    orig    Buk1   JOOST jwkrahn    Buk2
        orig    20.2/s      --    -28%    -48%    -58%    -84%
        Buk1    28.1/s     39%      --    -28%    -42%    -77%
        JOOST   39.2/s     94%     39%      --    -19%    -68%
        jwkrahn 48.4/s    140%     72%     24%      --    -61%
        Buk2     124/s    515%    342%    217%    157%      --
        { Buk1 => 2010, Buk2 => 2010, JOOST => 2010, jwkrahn => 2010, orig => 2010 }
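        The benchmark above only compares how many elements each variant returns. A small sketch that also compares the extracted strings, on a hypothetical four-field input: with the lazy .*? pattern (Buk2), a match that starts at a non-matching field can run across | separators until it reaches fred, so the count agrees while the captured text does not. The [^|] character class (JOOST) keeps every match inside a single field.

        ```perl
        use strict;
        use warnings;

        # Hypothetical input: two non-matching fields before a matching one.
        my $string = 'pqrbillxyz|pqrbillxyz|pqrfredxyz|pqrbillxyz';

        # Reference result: split into fields, keep those containing 'fred'.
        my @ref = grep /fred/, split /\|/, $string;

        # The two regex-only variants from the benchmark.
        my @lazy  = $string =~ m[(?:^|\|)(.*?fred.*?)(?=\||$)]g;
        my @class = $string =~ /(?:^|\|)([^|]*?fred[^|]*?)(?=\||$)/g;

        # Report the count and whether the extracted strings match the reference.
        for my $pair ([ lazy => \@lazy ], [ class => \@class ]) {
            my ($name, $got) = @$pair;
            my $same = "@$got" eq "@ref" ? 'same' : 'DIFFERENT';
            printf "%-5s count=%d %s: %s\n", $name, scalar @$got, $same, join ',', @$got;
        }
        ```
        
        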

Re: Performance optimization question
by ikegami (Patriarch) on Apr 02, 2008 at 22:41 UTC

    What's the regular expression in question?

    What's the code that uses this? A macro-level change (rethinking the surrounding code) is usually the way to go.

    $#arr1 = -1; is totally useless, since @arr1 is overwritten on the next line.
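    A quick sketch (sample data hypothetical) showing that a list assignment replaces the array's previous contents, which is why the reset is redundant:

    ```perl
    use strict;
    use warnings;

    my @arr1 = ('stale', 'elements', 'from', 'a', 'previous', 'pass');

    # A list assignment replaces the whole array, so the old elements are
    # discarded; clearing it first with $#arr1 = -1 changes nothing.
    @arr1 = grep { /reg exp/ } split /\|/, 'abc|reg exp here|def';

    print scalar(@arr1), ": @arr1\n";   # prints "1: reg exp here"
    ```
    
    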

Re: Performance optimization question
by Anonymous Monk on Apr 02, 2008 at 22:32 UTC
    Other than the approaches suggested by Joost (probably the best), avoiding intermediate results might help.

    @arr1 = grep /reg exp/, split /\|/, $string;
Re: Performance optimization question
by moritz (Cardinal) on Apr 03, 2008 at 06:49 UTC
    If reg exp is meant literally, you can use index to search for the literal substring; it's faster than a regular expression.
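    A minimal sketch of the grep-with-index version, assuming the needle really is the fixed string 'reg exp' (sample data made up). Whether it actually beats the regex is what the replies below dispute, so measure it on your own data:

    ```perl
    use strict;
    use warnings;

    # Hypothetical input; the needle is a fixed string, not a pattern.
    my $string = 'reg exp first|abc|def|more reg exp';
    my $needle = 'reg exp';

    # index returns the 0-based position of the substring, or -1 when it is
    # absent. Compare against -1 rather than testing for truth: a match at
    # position 0 would otherwise be treated as a miss.
    my @arr1 = grep { index($_, $needle) != -1 } split /\|/, $string;

    print join(',', @arr1), "\n";
    ```
    
    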
        Are you sure the regex engine isn't "cheating" by caching something?
        #!/usr/bin/perl
        use strict;
        use warnings;
        use Time::HiRes qw(time);

        my $num = 3_000_000;
        my $str = 'x'x$num . 'reg exp' . 'x'x$num;

        my $start = time;
        my $res = $str =~ m/reg exp/;
        print time - $start, $/;

        $start = time;
        my $idx = index $str, 'reg exp';
        print time - $start, $/;

        __END__
        0.0114099979400635
        0.011760950088501

        Admittedly, index is still a bit slower, but the difference isn't that huge.

        BTW, on my machine (with perl 5.8.8) the difference isn't there at all:

                 Rate REGEX INDEX
        REGEX 87061/s    --   -2%
        INDEX 88494/s    2%    --

        The results only differ slightly for 5.10.0. Which perl did you use?

        I thought that index and regexes use the same algorithm, but the regex has to go through the pain of compiling the pattern first.

        That's odd. Aren't they supposed to be using the same algorithm internally?
        General request|comment: Could benchmarks include a failing test too (at least for regex, index, and such)?
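        One way to include the failing case: a sketch that benchmarks both a string containing the needle and one without it. The string sizes and sub names are hypothetical, and each sub is sanity-checked before timing so that a broken variant cannot win by doing nothing:

        ```perl
        use strict;
        use warnings;
        use Benchmark qw(cmpthese);

        # Hypothetical test data of equal length: one string with the needle
        # in the middle, one without it (the "failing" case).
        my $num  = 100_000;
        my $hit  = 'x' x $num . 'reg exp' . 'x' x $num;
        my $miss = 'x' x (2 * $num + 7);

        my %subs = (
            regex_hit  => sub { $hit  =~ /reg exp/ },
            index_hit  => sub { index($hit,  'reg exp') >= 0 },
            regex_miss => sub { $miss =~ /reg exp/ },
            index_miss => sub { index($miss, 'reg exp') >= 0 },
        );

        # Sanity check: every *_hit sub must find the needle, every *_miss must not.
        for my $name (sort keys %subs) {
            my $found    = $subs{$name}->() ? 1 : 0;
            my $expected = $name =~ /_hit\z/ ? 1 : 0;
            die "$name returned $found, expected $expected\n" unless $found == $expected;
        }

        # Run each variant for at least 1 CPU second and print a comparison table.
        cmpthese(-1, \%subs);
        ```
        
        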
Re: Performance optimization question
by Joost (Canon) on Apr 02, 2008 at 22:21 UTC