in reply to Re^2: Style question: regex versus string builtin function
in thread Style question: regex versus string builtin function

Well, you must be careful when you use match variables, especially when you work with big strings. But they aren't slow per se:

Update: Thank you all for your comments and suggestions (here and in the CB)! See How do I get what is to the left of my match? for an updated benchmark and better explanations.

use Benchmark qw(:all); use Regexp::MatchContext; my $count = 300000; my $seq = 'GGGTTGAAGTTTAGACCGCTCACAGTAGTTCTACCTATAGAAAAGATCATGAAAGAGGCGATC +AGAATGGTACTCGAATCCATTTACGATCCCGAGTTTCCAGACACATCGCATTTCCGCTCGGGTCAAGGC +TGCCACTCGGTCCTAAGACGGATCAAAGAAGAGTGGGGAATCTCTCGCTGGTTTTTAGAATTCGACATC +AGGAAGTGTTTTCACACCATCGACCGACATCGACTCATCCAAATTTTGAAGGAAGAGATCGACGATCCC +AAGTTCTTTTACTCCATTCAGAAAGTATTTTCCGCCGGACGACTCGTAGGAGTTGAGAGGGGCCCTTAC +TCCGTCCCACACAGTGTACTACTATCGGCCCTACCAGGCAACATCTACCTACACAAGCTCGATCAGGAG +ATAGGGAGGATCCGACAGAAGTACGAAATTCCGATTGTTCAGAGAGTCAGATCGGTTCTATTAAGGACA +GGTCGTCGTATTGATGACCAAGAAAACCCTGGAGAAGAAGCAAGCTTCAACGCTCCCCAAGACAACAGA +GCCATCATTGTGGGGAGCGTTAAGAGCATGCAACGCAAAGCGGCCTTTCATTCCCTTGTTTCGTCGTGG +CACACCCCCCCCACAAGCACCCTCCGGCTCAGGGGGGACCAGAAAAGGCCTTTCGTTTTCCCCCCTTCG +TCGGCCCTTGCCGTCTTCCTTAACAAGCCCTCGAGCCTTCTTTGCGCCGCCTTCCTCATAGAAGCCGCC +GGGTTGACCCCGAAGGCTGAATTCTATGGTGGAGAACGCTGTAATAATAATTGGGCCATGAGAGACCTT +CTTAAGTATTGCAAAAGAAAGGGCCTGCTGATAGAGCTGGGCGGGGAGGCGATACTAGTTATCAGGTCA +GAGAGAGGCCTGGCCCGTAAGCAGGCCCCCTTAAAAACCCATTACTTAATAAGGATTTGTTACGCGCGA +TATGCCGACGACTTACTACTGGGAATCGTGGGTGCCGTAGAGCTTCTCATAGAAATACAAAAACGTATC +GCCCATTTCCTACAATCTGGCCTGAACCTTTGGGTAGGCTCCGCAGGATCAACAACAATAGCTGCACGG +AGTACGGTAGAATTCCTTGGTACGGTCATTCGGGAAGTCCCTCCGAGGACGACTCCCATACAATTTTTG +CGAGAGCTGGAAAAGCGTCTACGGGTAAAGCACCGTATCCATATAACTGCTTGCCACCTACGCTCCGCC +ATCCATTCAAAGTTTAGGAACCTAGGTGATAGTATCCCGATCAAACAGCTGACGAAGGGGATGAGCAAA +ACAGGGAGTCTACAGGACGGGGTTCAACTAGCGGAGACTCTTGGAACAGCTGGAGTCAGAAGTCCCCAA +GTTAGCGTATTATGGGGGACCGTCAAGCACATCCGGCAAGGATCAAGGGGGATCTCGTTCTTGCATAGC +TCAGGTCGGAGCAACGCGTCATCGGACGTTCAACAGGTAGTCTCACGATCGGGCACTCATGCCCGTAAG +TTGTCATTGTATACTCCCCCGGGTCGGAAGGCGGCGGGGGAGGGAGGAGGACACTGGGCGGGATCTATC +AGCAGCGAATTCCCCATAAAGATAGAGGCACCTATAAAAAAGATACTCCGAAGGCTTCGGGATCGAGGT +ATCATTAGCCGAAGAAGACCCTGGCCAATCCACGTGGCCTGTTTGACGAACGTCAGCGACGAAGACATC +GTAAATTGGTCCGCGGGCATCGCGATAAGTCCTCTGTCCTACTACAGGTGCCGCGACAACCTTTATCAA +GTCCGAACGATTGTCGACCACCAGATTCGCTGGTCTGCAATATTCACCCTAGCCCACAAGCACAAATCC +TCGGCGCCGAATATAATCCTCAAGTACTCCAAAGACTCAAATATTGTAAATCAAGAAGGTGGCAAGATC +CTTGCAGAGTTCCCCAACAGCATAGAGCTTGGGAAGCTCGGACCCGGTCAAGACCTGAACAAGAAGGAA +CACTCAACTACTAGTCTAGTCTAG'; cmpthese( $count, { 'regex' => sub { my ( $prematch, $match, $postmatch ) = $seq =~ m{(\A .*?) (CTGGCCCGTAA) (.*\z) }xms; + }, 'matchvars' => sub { $seq =~ m{CTGGCCCGTAA}xms; my ($prematch, $match, $postmatch) = ($`, $&, $'); }, 'matchcontext' => sub { $seq =~ m{(?p)CTGGCCCGTAA}xms; my ( $prematch, $match, $postmatch ) = ( PREMATCH(), MATCH(), POSTMATCH() ); }, } );
Rate regex matchcontext matchvars regex 17065/s -- -24% -84% matchcontext 22590/s 32% -- -78% matchvars 103806/s 508% 360% --

Replies are listed 'Best First'.
Re^4: Style question: regex versus string builtin function
by ikegami (Patriarch) on Oct 02, 2007 at 14:15 UTC

    You seem to have missed the point. The problem with $`, $&, $' is not that it slows down a regex, it's that it slows down *all* regexs that have no captures, say like the one in matchcontext. Your Benchmark is useless.

      No, it is not useless. Comment out the match variables in matchvars. It is not really "considerable slower" with that small strings. I just wanted to say that match variables are not as evil as everybody says and that they CAN speed things significantly up (like 500% in the benchmark). But you must be careful.
Re^4: Style question: regex versus string builtin function
by eyepopslikeamosquito (Archbishop) on Oct 02, 2007 at 13:38 UTC

    Well, you must be careful when you use match variables, especially when you work with big strings. But they aren't slow per se
    From the Devel::SawAmpersand docs:

    There's a global variable in the perl source, called PL_sawampersand. It gets set to true in that moment in which the parser sees one of $`, $', and $&. It never can be set to false again. Trying to set it to false breaks the handling of the $`, $&, and $' completely.

    If the global variable PL_sawampersand is set to true, all subsequent RE operations will be accompanied by massive in-memory copying, because there is nobody in the perl source who could predict, when the (necessary) copy for the ampersand family will be needed. So all subsequent REs are considerable slower than necessary.