Hi Monks, as mentioned before, I am working on a project that checks many thousands of links. One of the tests I run tries to match a very simple pattern against the HTML page fetched from each link. A very simple example (taken from the Alpaca book) to demonstrate:
$_ = "yabba dabba doo";
if (/abba/) {
    print "It matched!\n";
}
My question is, if the pattern is indeed only letters like the example above (no wildcards, character classes etc.), is regex matching the fastest way to go? Maybe grep matching or substring matching would be quicker?
Or is the regex engine smart enough to do faster matching for simple patterns?
Since I am checking thousands of HTML pages, and in some cases I want to match quite a few patterns, anything that could make this run a bit faster would really help me out.
Also, I'm quite curious about the answer to this question.
Any ideas?
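To make the comparison concrete, here is a minimal sketch of the substring alternative asked about above, next to the regex form. It assumes the pattern is a plain literal string; the helper names contains_literal and matches_literal are made up for illustration and are not from any module.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $needle occurs anywhere in $haystack.
# index() returns the zero-based position of the first occurrence, or -1.
sub contains_literal {
    my ($haystack, $needle) = @_;
    return index($haystack, $needle) >= 0 ? 1 : 0;
}

# Hypothetical helper: the same test via the regex engine.
# \Q...\E quotes any metacharacters so $needle is matched literally.
sub matches_literal {
    my ($haystack, $needle) = @_;
    return $haystack =~ /\Q$needle\E/ ? 1 : 0;
}

my $page = "yabba dabba doo";
print "It matched!\n" if contains_literal($page, "abba");
print "It matched!\n" if matches_literal($page, "abba");
```

Both helpers should agree for literal patterns, so the choice between them comes down to speed, which is best measured on real pages rather than guessed.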
UPDATE:
After trying kcott's helpful suggestion, I get these results (code below in one of my answers), so it seems regex is still faster:
        Rate index regex
index  997/s    --  -67%
regex 3004/s  201%    --
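A table like the one above is the kind of output produced by cmpthese from the core Benchmark module. The following is only a sketch of how such a comparison could be set up, not the code that produced these numbers: the test string, the pattern, and the sub names by_index and by_regex are all assumptions made for illustration.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Assumed test data: a stand-in for a fetched HTML page, with the
# literal we are looking for placed at the very end.
my $page   = ("<p>filler text</p>\n" x 1000) . "yabba dabba doo";
my $needle = "abba";

# Substring search with index(); -1 means "not found".
sub by_index { return index($page, $needle) >= 0 ? 1 : 0 }

# The same literal search through the regex engine.
sub by_regex { return $page =~ /abba/ ? 1 : 0 }

# A negative count tells Benchmark to run each sub for at least
# that many CPU seconds, then print a rate comparison table.
cmpthese(-1, {
    index => \&by_index,
    regex => \&by_regex,
});
```

The rates will vary with the Perl version, the page size, and where (or whether) the pattern occurs, so it is worth running this against a sample of real pages before drawing conclusions.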
Everybody seems to think I'm lazy
I don't mind, I think they're crazy