This seems like an ideal application for Filter::Simple. Here's what I came up with.

The filter package

package My::Filter; use Filter::Simple; use Benchmark::Timer; our $t = Benchmark::Timer->new(); FILTER_ONLY regex => sub { $_ = "(?{{ \$My::Filter::t->start('$_') }})" . "(?:$_)" . "(?{{ \$My::Filter::t->stop('$_') }})"; }, ; sub report{ return $t->report() } 1;

This requires that you add one line at the top of your program, and one at the bottom/end to generate the report. It also requires the installation of Filter::Simple which has a dependancy on Text::Balanced, but they are both pure-Perl modules and install easily. It also uses benchmark::Timer, but I think this is a component of most standard distributions.

The test program

#! perl -slw use strict; use My::Filter; #<< Include the module my $stuff = 'abcdefghijklmnopqrstuvwxyz'; for (1..1000) { if ( $stuff =~ m[pqr] ) { $stuff =~ s/(\G(?:.{3})+?)(?<=...)(.)(.)/$1$3$2/g; } $_ = $stuff; my $OK = 1 if m[pqr]; } print $stuff; print '=' x 20, 'Timing of regexs in ', $0, '=' x 20; print My::Filter::report(); # << Generate report

Some sample output

C:\test>test abcdefghijklmnopqrstuvwxyz ====================Timing of regexs in C:\test\test.pl=============== +===== 2000 trials of pqr (290.000ms total), 145us/trial 5000 trials of (\G(?:.{3})+?)(?<=...)(.)(.) (820.000ms total), 164us/t +rial

Note: This is tested as far as you see it above. I intend to do much more testing and maybe package it up for CPAN if it proves usable and useful, but that may take some time.

A couple of caveats.

Filter::Simple seems to have trouble with s/// if you use two set of identical, balanced delimiters eg.  s[....][...]. If your using this style you may have to change your source slightly. Other limitations like this are bound to exists.

The filter actually embeds the timer code at the front and back of the regex itself. Even though the code is embedded using zero-width code assertions, it's quite possible--even likely--that their presence may change the meaning of some regexes. I haven't encountered one yet, but it could. If the output of your code changes, wrap the line suspected in no My::Filter; use My::Filter;. I haven't had occasion to tested this work-around yet.

It's worth pointing out that the code profiled is the regex itself. Not the statement it is a part of, nor even the whole s///. Only the left-hand side of these statements will be profiled. Hopefully, this is the most useful information anyway. It does seem to count and profile each iteration of those regexes using the /g modifier successfully.

Relating the regex back to the source code is currently a manual effort. Unfortunately, when the code is evaluated, the __LINE__ macro is not set:(. I haven't thought of a work-around for this yet. This has the unfortunate side-effect that Lexically identical regexes in different lines of the source get counted and timed as the same thing. This is usually fairly easy to work around, wrapping one of the m[pqr]'s in non-capturing brackets for instance m[(?:pqr)] will in most if not all cases, make no difference to the function of the regex, but allow them to be distinguished in the timings.

I haven't tested this with qr[...] style regexes yet.

If anyone has any suggestions for determining the line numbers at which the regexes appear, I'd be pleased to hear of them. Or anything else for that matter.


Examine what is said, not who speaks.

The 7th Rule of perl club is -- pearl clubs are easily damaged. Use a diamond club instead.


In reply to Re: Profiling regular expressions by BrowserUk
in thread Profiling regular expressions by Mur

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.