Re^3: "advanced" Perl functions and maintainability

Re^4: "advanced" Perl functions and maintainability by itub (Priest) on Dec 13, 2004 at 05:18 UTC
Does it, really? As far as I can tell, the optimizations in the regular expression engine nicely cover the kinds of matches that could be done with `index`. Try running some benchmarks and see.
Re^5: "advanced" Perl functions and maintainability by William G. Davis (Friar) on Dec 13, 2004 at 13:07 UTC
I've done the benchmarks before. Here:

    #!/usr/bin/perl -w
    use strict;
    use Benchmark 'cmpthese';

    my $string = "this is a string" x 300;
    my $short_string = "this is a short string";

    cmpthese(10000000, {
        'index' => sub {
            my $res;
            $res = index($string, "this", 0);
            $res = index($string, "string");
            $res = rindex($string, "string");
        },
        regex => sub {
            my $res;
            $res = $string =~ /^this/;
            $res = $string =~ /string/;
            $res = $string =~ /string$/;
        },
        shortindex => sub {
            my $res;
            $res = index($short_string, "this", 0);
            $res = index($short_string, "short");
            $res = rindex($short_string, "string");
        },
        shortregex => sub {
            my $res;
            $res = $short_string =~ /^this/;
            $res = $short_string =~ /short/;
            $res = $short_string =~ /string$/;
        },
        substrindex => sub {
            my $res;
            my $substr = "this is";
            $res = index($short_string, $substr);
        },
        substrregex => sub {
            my $res;
            my $substr = "this is";
            $res = $short_string =~ /\Q$substr\E/;
        }
    });

Results:

    Benchmark: timing 10000000 iterations of index, regex, shortindex, shortregex, substrindex, substrregex...
          index: 18 wallclock secs (17.29 usr + 0.00 sys = 17.29 CPU) @ 578369.00/s (n=10000000)
          regex: 32 wallclock secs (32.52 usr + 0.00 sys = 32.52 CPU) @ 307503.08/s (n=10000000)
     shortindex: 15 wallclock secs (15.15 usr + 0.00 sys = 15.15 CPU) @ 660066.01/s (n=10000000)
     shortregex: 26 wallclock secs (27.14 usr + 0.00 sys = 27.14 CPU) @ 368459.84/s (n=10000000)
    substrindex:  9 wallclock secs ( 9.39 usr + 0.00 sys =  9.39 CPU) @ 1064962.73/s (n=10000000)
    substrregex: 17 wallclock secs (16.64 usr + 0.00 sys = 16.64 CPU) @ 600961.54/s (n=10000000)

                     Rate regex shortregex index substrregex shortindex substrindex
    regex        307503/s    --       -17%  -47%        -49%       -53%        -71%
    shortregex   368460/s   20%         --  -36%        -39%       -44%        -65%
    index        578369/s   88%        57%    --         -4%       -12%        -46%
    substrregex  600962/s   95%        63%    4%          --        -9%        -44%
    shortindex   660066/s  115%        79%   14%         10%         --        -38%
    substrindex 1064963/s  246%       189%   84%         77%        61%          --

The regex is almost always slower, but usually not by that much.

janitored by ybiC: Replaced almost-always-inappropriate <pre> tags around benchmark results with <code> tags, to avoid annoying lateral scrolling
Re^6: "advanced" Perl functions and maintainability by Anonymous Monk on Dec 13, 2004 at 14:13 UTC
Quite inconclusive. Considering what the optimizer does, it depends heavily on your data whether index() or a regex is faster. It also depends on whether there is a match, where the match is (if any), and whether the string has been studied. Here's some more data:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Benchmark 'cmpthese';

    our $string = "abcd" x 1000;
    $string .= "e";
    $string .= "abcd" x 1000;
    our $study = $string;
    our $pass = "abcde";
    our $fail1 = "foo12";
    our $fail2 = "abdce";
    study $study;

    cmpthese(-1, {
        index_pass => 'index($string, $pass)',
        regex_pass => '$string =~ /$pass/',
        study_pass => '$study =~ /$pass/',
    });
    print ("\n\n");
    cmpthese(-1, {
        index_fail1 => 'index($string, $fail1)',
        index_fail2 => 'index($string, $fail2)',
        regex_fail1 => '$string =~ /$fail1/',
        regex_fail2 => '$string =~ /$fail2/',
        study_fail1 => '$study =~ /$fail1/',
        study_fail2 => '$study =~ /$fail2/',
    });
    __END__
                   Rate index_pass study_pass regex_pass
    index_pass  38331/s         --        -6%       -69%
    study_pass  40960/s         7%         --       -67%
    regex_pass 125463/s       227%       206%         --


                     Rate index_fail2 study_fail2 index_fail1 regex_fail2 regex_fail1 study_fail1
    index_fail2   27306/s          --         -0%        -48%        -56%        -64%        -99%
    study_fail2   27307/s          0%          --        -48%        -56%        -64%        -99%
    index_fail1   52608/s         93%         93%          --        -15%        -31%        -98%
    regex_fail2   61837/s        126%        126%         18%          --        -19%        -98%
    regex_fail1   75918/s        178%        178%         44%         23%          --        -98%
    study_fail1 3412032/s      12396%      12395%       6386%       5418%       4394%          --

Note that with this data, index is slower than a regex. The speed of 'study_fail1' is explained by the fact that the string we are looking for, 'foo12', contains characters that are not present in the string. Since a studied string has a histogram of its letter frequencies attached, no searching needs to be performed at all.
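The early rejection described for 'study_fail1' can be sketched in plain Perl. This is only an illustration of the idea, not how `study` is implemented internally; the helper names `char_set` and `cannot_match` are made up for this example. Also worth knowing: perldoc documents `study` as a no-op since Perl 5.16, so the studied numbers above reflect the perl of the time.

```perl
use strict;
use warnings;

# Illustration only: NOT the internals of study().
# Record which characters occur in the haystack (done once, up front).
sub char_set {
    my ($hay) = @_;
    my %seen;
    $seen{$_} = 1 for split //, $hay;
    return \%seen;
}

# True if the needle contains a character the haystack lacks.
# In that case the needle cannot occur, so no search is needed.
sub cannot_match {
    my ($seen, $needle) = @_;
    for my $ch (split //, $needle) {
        return 1 unless $seen->{$ch};
    }
    return 0;
}

# Same test string as in the benchmark above.
my $string = ("abcd" x 1000) . "e" . ("abcd" x 1000);
my $seen   = char_set($string);

for my $needle ("abcde", "foo12", "abdce") {
    printf "%s: %s\n", $needle,
        cannot_match($seen, $needle)
            ? "rejected without searching"
            : "needs a real search";
}
```

Only 'foo12' is rejected up front. 'abdce' consists entirely of characters that do occur in the string, so it still needs a full search, which is consistent with 'study_fail2' showing no speedup in the table.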
Re^7: "advanced" Perl functions and maintainability by William G. Davis (Friar) on Dec 13, 2004 at 15:11 UTC
Re^8: "advanced" Perl functions and maintainability by itub (Priest) on Dec 13, 2004 at 16:37 UTC
Re^8: "advanced" Perl functions and maintainability by Anonymous Monk on Dec 13, 2004 at 15:43 UTC
Re^6: "advanced" Perl functions and maintainability by Aristotle (Chancellor) on Dec 20, 2004 at 03:51 UTC
Your benchmark seems quite confused and, well, anything but useful.

Your distinction between a short and a long string is useless and deceptive, because all of the searches you run have matches near both ends of the test strings, and there are no non-matching data sets at all.

`index( $foo, $bar, 0 )` is no different from `index( $foo, $bar )`, and neither does the same thing as `/^$bar/`, just as `rindex( $foo, $bar )` is something entirely different from `/$bar$/`. You are comparing apples and meteors.

Note that putting multiple different benchmarks in a single table only serves to cast further confusion onto the data. You are benchmarking three things; run three benchmarks, look at three tables.

`use re 'debug';` and compile a few regexen sometime; you'll see that the regex engine turns the trivial cases into pretty much a plain index. There's more fixed overhead for invoking the engine rather than just calling that function, of course, but on large data sets that's negligible.

Makeshifts last the longest.
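To make the anchoring mismatch concrete, here is a minimal sketch of what an anchored match corresponds to when written with `substr`, `index` and `rindex`. The helper names `starts_with` and `ends_with` are hypothetical, introduced only for this example, and the test string is made up.

```perl
use strict;
use warnings;

# Hypothetical helpers, for illustration only.

# Same question as /^\Q$needle\E/: is the needle at offset 0?
# index($hay, $needle, 0) only answers "somewhere at or after 0".
sub starts_with {
    my ($hay, $needle) = @_;
    return substr($hay, 0, length $needle) eq $needle;
}

# Same question as /\Q$needle\E\z/: does the last occurrence end
# exactly at the end of the string? (A bare $ would additionally
# allow one trailing newline.)
sub ends_with {
    my ($hay, $needle) = @_;
    my $pos = rindex($hay, $needle);
    return $pos >= 0 && $pos == length($hay) - length($needle);
}

my $hay = "is this a string or not";

print index($hay, "this", 0), "\n";               # 3: found, but not at the start
print starts_with($hay, "this") ? 1 : 0, "\n";    # 0, like $hay =~ /^this/
print rindex($hay, "string"), "\n";               # 10: found, but not at the end
print ends_with($hay, "string") ? 1 : 0, "\n";    # 0, like $hay =~ /string\z/
```

On the benchmark's own strings every pattern happens to sit at the anchored position, which is why the two forms looked interchangeable there.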