In a quick test, index doesn't seem to be any quicker than a simple regex. Example:

$ perl -MTime::HiRes -MBenchmark=timethese -le'
    open F, "</usr/share/dict/words"; @words = <F>; chomp for @words;
    $re = q(^bonjour);
    timethese(-1, {
        index   => sub { for (@words) { return 1 if index($_, "bonjour") != -1 } },
        re      => sub { for (@words) { return 1 if /\bbonjour\b/ } },
        q(re^)  => sub { for (@words) { return 1 if /^bonjour/ } },
        re_comp => sub { $re = qr/^bonjour/o; for (@words) { return 1 if /$re/o } },
        grep    => sub { return 1 if grep /bonjour/, @words },
    })'
Benchmark: running grep, index, re, re^, re_comp for at least 1 CPU seconds...
      grep:  2 wallclock secs ( 1.08 usr +  0.00 sys =  1.08 CPU) @ 32.41/s (n=35)
     index:  1 wallclock secs ( 1.12 usr +  0.02 sys =  1.14 CPU) @ 267.54/s (n=305)
        re:  1 wallclock secs ( 1.10 usr +  0.02 sys =  1.12 CPU) @ 272.32/s (n=305)
       re^:  1 wallclock secs ( 1.14 usr +  0.00 sys =  1.14 CPU) @ 327.19/s (n=373)
   re_comp:  1 wallclock secs ( 1.04 usr +  0.00 sys =  1.04 CPU) @ 268.27/s (n=279)

In this test, anchoring the regex with '^' gives a small boost (about +20% over the unanchored regex), while using index or precompiling the regex doesn't help. As you said, profiling the code can help here.
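
The subs in the one-liner do not all test the same thing (substring anywhere, word-bounded match, anchored match), so here is a rough like-for-like sketch in which every candidate checks only for a leading "bonjour". The generated word list and the sub names (by_index, by_rindex, by_regex, by_qr) are made up for illustration and stand in for /usr/share/dict/words; timings will differ by machine and perl version, so no output is shown.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical stand-in for /usr/share/dict/words, so the sketch runs anywhere.
my @words = map { ("word$_", "other$_") } 1 .. 5000;
push @words, 'bonjour';    # the only match, placed last so every scan is a full one

my $re = qr/^bonjour/;     # compiled once, outside the timed code

# Each sub returns 1 at the first word that starts with "bonjour", else 0.
sub by_index  { for (@words) { return 1 if index($_, 'bonjour') == 0 }     return 0 }
sub by_rindex { for (@words) { return 1 if rindex($_, 'bonjour', 0) == 0 } return 0 }
sub by_regex  { for (@words) { return 1 if /^bonjour/ }                    return 0 }
sub by_qr     { for (@words) { return 1 if $_ =~ $re }                     return 0 }

# Run each candidate for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    index  => \&by_index,
    rindex => \&by_rindex,
    regex  => \&by_regex,
    qr     => \&by_qr,
});
```

rindex with a start position of 0 only looks at the beginning of the string, whereas index($_, ...) == 0 may scan the whole word before reporting the first occurrence; whether that matters on short dictionary words is exactly the kind of thing to measure rather than guess.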

In reply to Re^2: how to speed up pattern match between two files by gnujsa
in thread how to speed up pattern match between two files by Anonymous Monk
