in reply to Re^2: Get a known substring from a string
in thread Get a known substring from a string
I know it looks trivial once you see it, but I'm really astonished by your approach of using 1+index(...). It had not occurred to me to use index that way in an expression to check for presence. I'll add that to my set of idiosyncratic phrases, just like if( system(...) == 0 ) { for successful execution of subprocesses.
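For readers who have not seen the idiom: index returns the zero-based offset of the substring, or -1 when it is absent, so adding 1 turns "absent" into 0 (false) and every real offset, including offset 0, into a true value. A minimal sketch follows; the helper name contains is made up for illustration and is not from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper (name is an assumption, not from the thread).
# index() yields -1 for "not found", so 1 + index() is 0 (false) in that
# case and >= 1 (true) for any match, including one at offset 0.
sub contains {
    my ($haystack, $needle) = @_;
    return 1 + index($haystack, $needle) ? 1 : 0;
}

my $s = 'the quick brown fox jumps over the lazy dog';

for my $needle ('lazy', 'the', 'cat') {
    printf "%-4s => %s\n", $needle, contains($s, $needle) ? 'found' : 'missing';
}
```

The 'the' case is the interesting one: a bare index($s, 'the') returns 0 there, which is false, so testing index directly without the 1+ would miss a match at the start of the string.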
Update: I wondered how much the capturing parentheses cost, and it seems they cost roughly half of the performance attainable when using the regex engine (compare a and c in the results). Maybe the two additional steps executed in the regex engine (OPEN1 and CLOSE1) are to blame for that, as they effectively double the number of steps the regex engine has to execute for a successful match.
Not invoking the regex engine at all is still much faster, even though I had thought there once was an optimization that turned constant regular expressions without anchors or quantifiers into an index lookup...
    # a:  if( $s =~ m[(lazy)] ){ $found=$1 }
    Compiling REx "(lazy)"
    Final program:
       1: OPEN1 (3)
       3:   EXACT <lazy> (5)
       5: CLOSE1 (7)
       7: END (0)
    anchored "lazy" at 0 (checking anchored) minlen 4
    Matching REx "(lazy)" against "the quick brown fox jumps over the lazy dog"
    Intuit: trying to determine minimum start position...
      Found anchored substr "lazy" at offset 35...
      (multiline anchor test skipped)
      try at offset...
    Intuit: Successfully guessed: match at offset 35
      35 < the > <lazy dog>      |  1:OPEN1(3)
      35 < the > <lazy dog>      |  3:EXACT <lazy>(5)
      39 <the lazy> < dog>       |  5:CLOSE1(7)
      39 <the lazy> < dog>       |  7:END(0)
    Match successful!
    Freeing REx: "(lazy)"
    # b:  $found = 'lazy' if 1+index( $s, 'lazy' );
    # c:  if( $s =~ m[lazy] ){ $found=$& }
    Compiling REx "lazy"
    Final program:
       1: EXACT <lazy> (3)
       3: END (0)
    anchored "lazy" at 0 (checking anchored isall) minlen 4
    Matching REx "lazy" against "the quick brown fox jumps over the lazy dog"
    Intuit: trying to determine minimum start position...
      Found anchored substr "lazy" at offset 35...
      (multiline anchor test skipped)
      try at offset...
    Intuit: Successfully guessed: match at offset 35
    Freeing REx: "lazy"

             Rate    a    c    b
    a 2038631/s   -- -50% -75%
    c 4089154/s 101%   -- -49%
    b 8013601/s 293%  96%   --
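To make the comparison chart easier to read: as far as I know, each cell in cmpthese output is the row's rate relative to the column's rate, expressed as a percentage difference. The sketch below recomputes the cells from the three rates reported above; the helper name relative_pct is hypothetical, and the formula reflects my understanding of Benchmark rather than anything stated in the thread.

```perl
use strict;
use warnings;

# Rates (iterations per second) copied from the benchmark output above.
my %rate = ( a => 2038631, c => 4089154, b => 8013601 );

# Hypothetical helper: percentage by which the row's rate differs from
# the column's rate, rounded to a whole percent.
sub relative_pct {
    my ($row, $col) = @_;
    return sprintf '%.0f%%', 100 * $rate{$row} / $rate{$col} - 100;
}

for my $row (qw(a c b)) {
    for my $col (qw(a c b)) {
        next if $row eq $col;
        printf "%s relative to %s: %s\n", $row, $col, relative_pct($row, $col);
    }
}
```

Read this way, "c 101%" in column a means the non-capturing regex ran about twice as fast as the capturing one, and "b 96%" in column c means index was about twice as fast again.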
The program I used:
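    use strict;
    use Benchmark 'cmpthese';
    use vars '$s';   # package variable, so the string-eval'd benchmark code can see it

    $s = 'the quick brown fox jumps over the lazy dog';
    my $found;

    # Each benchmark is kept as a string so it can be printed, compiled
    # under "use re 'debug'" below, and handed to cmpthese unchanged.
    my %benchmarks = (
        a => q[ if( $s =~ m[(lazy)] ){ $found=$1 } ],
        b => q[ $found = 'lazy' if 1+index( $s, 'lazy' ); ],
        c => q[ if( $s =~ m[lazy] ){ $found=$& } ],
    );

    {
        # Run every snippet once with regex debugging on, and check that
        # it really sets $found to 'lazy' before timing anything.
        use re 'debug';
        for (sort keys %benchmarks) {
            print "# $_: $benchmarks{$_}\n";
            undef $found;
            my $code = eval qq{sub { $benchmarks{$_} } }
                or die "Couldn't compile benchmark $_: $@";
            $code->();
            $found eq 'lazy'
                or die "Unexpected results: [$found] vs. 'lazy'";
        };
    };

    # Run each snippet for at least one CPU second and print the comparison chart.
    cmpthese( -1, \%benchmarks);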