Re^2: One line assigment statement with regex match

When looking for short literals, index is more efficient than a regex:

my @matches = grep index( $lineFromSomeFile, $_ ) > -1, @terms;
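To make the one-liner above concrete, here is a small self-contained sketch. The sample line and the contents of @terms are made up for illustration and do not come from the thread. The regex variant uses \Q...\E so that each term is matched literally, which is what index does by definition.

```perl
use strict;
use warnings;

# Hypothetical sample data, not from the original thread.
my $lineFromSomeFile = 'the quick brown fox jumps over the lazy dog';
my @terms            = qw( fox cat dog a.b );

# index returns -1 when the substring is absent, 0 or more when present.
my @by_index = grep { index( $lineFromSomeFile, $_ ) > -1 } @terms;

# Equivalent regex; \Q...\E quotes metacharacters so each term is literal.
my @by_regex = grep { $lineFromSomeFile =~ /\Q$_\E/ } @terms;

print "index: @by_index\n";
print "regex: @by_regex\n";
```

Both lists come out as "fox dog" here. Without the \Q...\E, a term such as "a.b" would be treated as a pattern rather than as a literal, so the two approaches could disagree.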

Update: Added the link and the qualifier "short" in response to kaif's comment++. How short is short? When I tested random (but constant) strings of length 80 and substrings of length 8, which are "typical" lengths for a line and a word, respectively, index was about 20% faster than the corresponding regex. I imagine that this sort of analysis is responsible for index's widespread reputation as being superior to regexes. Clearly, as kaif shows, the ratio of speeds is sensitive to the sizes of the string and of the substring being searched for, but I have not done a detailed analysis beyond this and what is posted in the node linked above.
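For anyone who wants to repeat the 80/8 comparison, a minimal sketch might look like the following. This is not the script that was actually used for the figure above; the seed, the way the strings are generated, and the names $line, $word and %candidates are all assumptions.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Assumed setup: an 80-character "line" of random lowercase letters and an
# 8-character "word" cut out of it, so that both searches succeed.
srand(42);    # fixed seed, so the strings are random but constant
my $line = join '', map { ( 'a' .. 'z' )[ rand 26 ] } 1 .. 80;
my $word = substr $line, 40, 8;
my $re   = qr/\Q$word\E/;    # precompiled literal pattern

my %candidates = (
    index => sub { index( $line, $word ) > -1 },
    regex => sub { $line =~ $re },
);

# A negative count asks Benchmark to run each sub for at least that many
# CPU seconds instead of a fixed number of iterations.
cmpthese( -1, \%candidates );
```

The ratio reported will depend on the perl version, the platform, and the strings chosen, so treat any single run as one data point.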

the lowliest monk


Replies are listed 'Best First'.
Re^3: One line assigment statement with regex match by kaif (Friar) on Jun 23, 2005 at 01:59 UTC
So, a lot of people like to say that. And indeed, sometimes `index` is ten times faster. But sometimes it's more than three times slower!

use Benchmark qw(:all);

my $text = <<EOF;
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
EOF
my $pattern = "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb";
my $count   = 0;    # not set in the original post; 0 means Benchmark's default of 3 CPU seconds

# Compare a regex match against index for a pattern that never occurs in $text.
cmpthese($count, {
  'regex' => sub {       $text =~ $pattern },
  'index' => sub { index $text,   $pattern },
});
__DATA__
           Rate index regex
index  630601/s    --  -67%
regex 1914815/s  204%    --

Moreover, by increasing the lengths of the text and the pattern, I can make the regex 40 times faster*. See Re^8: "advanced" Perl functions and maintainability for reasons why people use regexes instead of `index`. Personally, I still don't understand why there even is a difference in speed -- shouldn't the regex engine be optimized to notice that this is a search for a constant string and then call the same function as `index`?

*: No, I'm not kidding. The output follows. Moreover, for this example, adding a single `study $text` makes it another 10 times faster, completely obliterating `index`.

         Rate  index  regex  study
index   178/s     --   -98%  -100%
regex  7538/s  4124%     --   -92%
study 98871/s 55311%  1212%     --

Update: I'm running perl v5.8.5 built for i686-linux.
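The code behind the three-way table is not shown in the post, so the following is only a guess at its shape. The text and pattern sizes, the separate $studied copy, and the %candidates hash are all assumptions. Note also that since perl 5.16 study is documented as a no-op, so on a modern perl the "study" and "regex" rows should come out roughly the same.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Assumed sizes; the thread does not say how long the text and pattern were.
my $line    = ( 'a' x 80 ) . "\n";
my $text    = $line x 1000;
my $pattern = 'b' x 320;

my $studied = $text;    # separate copy, so study applies only to this variant
study $studied;         # documented as a no-op on perl 5.16 and later

my %candidates = (
    index => sub { index( $text, $pattern ) > -1 },
    regex => sub { $text    =~ /\Q$pattern\E/ },
    study => sub { $studied =~ /\Q$pattern\E/ },
);

# Run each variant for at least one CPU second and print the comparison table.
cmpthese( -1, \%candidates );
```

All three subs search for a pattern that never occurs, which matches the setup in the post: it measures how quickly each approach can rule a match out.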
