narainhere:
Hi monks, I would like to do a regex on a string, to see if it matches any of the strings in a array
After reading the solutions here, I was tempted to post a second
contribution (which I do now). The topic is interesting because so many
ways exist to solve it. My question then was: what are they worth in real
code? Stressing the point of the OP, "to see if it matches any of the
strings in a array", it's clear that the grep/map solutions do too much
here: they test every element even after a match has been found.
But what about the regex-only solution?
Here's my first attempt to get behind that:
use strict;
use warnings;
use List::Util qw'first reduce';
use Benchmark qw'cmpthese timethese';

my @arr = qw' cool guy here ' x 1;
my $str = '100 WORDS ' x 50 . 'I am cool';
my @invocation = (0) x 5;   # one match counter per variant

my $results = timethese(0, {
    # one regex: the words are joined with '|' by interpolating @arr
    'word_altn' => sub {
        local $" = '|';
        if( $str =~ /@arr/ ) {
            ++$invocation[0]; # print "Matched\n"
        }
    },
    'block_grep' => sub {
        if( grep { index($str, $_) != -1 } @arr ) {
            ++$invocation[1]; # print "Matched\n"
        }
    },
    'expr_grep' => sub {
        if( grep index($str, $_) != -1, @arr ) {
            ++$invocation[2]; # print "Matched\n"
        }
    },
    # first() takes a block followed directly by the list (no comma)
    'list_util_index' => sub {
        if( first { index($str, $_) != -1 } @arr ) {
            ++$invocation[3]; # print "Matched\n"
        }
    },
    'list_util_regex' => sub {
        if( first { $str =~ /$_/ } @arr ) {
            ++$invocation[4]; # print "Matched\n"
        }
    }
}, 'none' );

print "@invocation\n";
# sanity check: every variant must have matched at least once
die unless 5 == grep $_>0, @invocation;
cmpthese $results;
On my Linux box (Perl 5.8.8), this produces the following (the first line shows the match count per variant):
42598 1157019 1349175 990717 1135313
                    Rate word_altn list_util_index list_util_regex block_grep expr_grep
word_altn        10213/s        --            -96%            -97%       -97%      -97%
list_util_index 274237/s     2585%              --             -8%       -16%      -20%
list_util_regex 299705/s     2835%              9%              --        -8%      -13%
block_grep      325087/s     3083%             19%              8%         --       -5%
expr_grep       342754/s     3256%             25%             14%         5%        --
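The word_altn variant interpolates @arr into the pattern on every call and does not escape the words. Below is a minimal sketch of a variant that escapes each word with quotemeta and compiles the alternation once with qr//. The names $alternation and $re are mine, and I have not benchmarked this, so no claim is made about where it would land in the table.

```perl
use strict;
use warnings;

my @arr = qw(cool guy here);
my $str = '100 WORDS ' x 50 . 'I am cool';

# Escape each word so that regex metacharacters match literally,
# then compile the alternation once, outside of any hot loop.
my $alternation = join '|', map { quotemeta($_) } @arr;
my $re          = qr/$alternation/;

print "Matched\n" if $str =~ $re;
```

The compiled $re can then be reused for every string that has to be checked against the same word list.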
On larger strings *or* larger comparison word arrays,
the List::Util based solutions will tend to win
by a margin.
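To illustrate why the list size matters, here is a small sketch (with a made-up word list, not the benchmark data) that counts how many words each approach actually tests. grep has to look at every element, whereas first stops as soon as its block returns true. It only counts tests and says nothing about timing.

```perl
use strict;
use warnings;
use List::Util qw(first);

# Hypothetical data: the matching word comes first in a longer list.
my @words = ('cool', map { "word$_" } 1 .. 999);
my $str   = 'I am cool';

my ($grep_tests, $first_tests) = (0, 0);

# grep in scalar context returns the number of matching elements,
# so it has to test every element of the list.
my $grep_hits = grep { ++$grep_tests; index($str, $_) != -1 } @words;

# first returns the first element for which the block is true
# and does not look at the remaining elements.
my $first_hit = first { ++$first_tests; index($str, $_) != -1 } @words;

print "grep tested $grep_tests words, first tested $first_tests\n";
```

With only three words, as in the benchmark above, there is little for first to skip, which fits the grep variants coming out slightly ahead there.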
Regards
mwa