Here's the program I wrote to characterize regular expressions. I dumped the output of -Mre=debug after compiling everything in perl and ran this program on that output. It turns out that it decides only three regexps *require* an NFA, 750 require at least a DFA, and 325 could be done with just a plain Boyer-Moore search.

Generate your data:

find src/bleadperl -name '*.pm' -o -name '*.pl' -print0 | xargs -n1 -0 + perl -Mre=debug > some-file.txt perl this-program some-file.txt

The program:

use strict; use warnings; my %sim; @sim{qw(BOL EXACT END NOTHING SBOL EOL SEOL)} = (); my %dfa; @dfa{qw(ALNUM ANYOF BOL BRANCH BOUND CLOSE1 CLOSE2 CLOSE3 CLOSE4 CLOSE +5 CLOSE6 CLOSE7 CURLY CURLYM CURLYN CURLYX DIGIT END EOL EOS EXACT EXACTF IFMATCH MBOL MEOL MINMOD NALNUM NDIGIT NOTHING NSPACE OPEN1 OPEN2 OPEN3 OPEN4 OPEN5 OPEN6 OPEN7 PLUS REG_ANY SANY SBOL SEOL SPACE STAR SUCCEED TAIL TRIE TRIEC UNLESSM WHILEM)}=(); sub any (&@) { my $predicate = shift @_; $predicate->() and return 1 for @_; return; } local $/; my $data = <>; my ( $bm_ct, $dfa_ct, $nfa_ct ); while ($data =~ /Final program:\n((?:\s[^\n]+\n)+)/mg) { my @ops = $1 =~ / ([A-Z]\w+)/g; if ( not any { not exists $sim{$_} } @ops ) { ++ $bm_ct; } elsif ( not any { not exists $dfa{$_} } @ops ) { ++ $dfa_ct; } else { ++ $nfa_ct; } } print "NFA $nfa_ct / DFA $dfa_ct / BM $bm_ct\n";

⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊


In reply to Re^2: Perl regexp matching is slow?? by diotalevi
in thread Perl regexp matching is slow?? by smahesh

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.