Here's the program I wrote to characterize regular expressions. I dumped the output of -Mre=debug after compiling everything in perl and ran this program on that output. It turns out that it decides only three regexps *require* an NFA, 750 require at least a DFA, and 325 could be done with just a plain Boyer-Moore search.
Generate your data:
find src/bleadperl -name '*.pm' -o -name '*.pl' -print0 | xargs -n1 -0 + perl -Mre=debug > some-file.txt perl this-program some-file.txt
The program:
use strict; use warnings; my %sim; @sim{qw(BOL EXACT END NOTHING SBOL EOL SEOL)} = (); my %dfa; @dfa{qw(ALNUM ANYOF BOL BRANCH BOUND CLOSE1 CLOSE2 CLOSE3 CLOSE4 CLOSE +5 CLOSE6 CLOSE7 CURLY CURLYM CURLYN CURLYX DIGIT END EOL EOS EXACT EXACTF IFMATCH MBOL MEOL MINMOD NALNUM NDIGIT NOTHING NSPACE OPEN1 OPEN2 OPEN3 OPEN4 OPEN5 OPEN6 OPEN7 PLUS REG_ANY SANY SBOL SEOL SPACE STAR SUCCEED TAIL TRIE TRIEC UNLESSM WHILEM)}=(); sub any (&@) { my $predicate = shift @_; $predicate->() and return 1 for @_; return; } local $/; my $data = <>; my ( $bm_ct, $dfa_ct, $nfa_ct ); while ($data =~ /Final program:\n((?:\s[^\n]+\n)+)/mg) { my @ops = $1 =~ / ([A-Z]\w+)/g; if ( not any { not exists $sim{$_} } @ops ) { ++ $bm_ct; } elsif ( not any { not exists $dfa{$_} } @ops ) { ++ $dfa_ct; } else { ++ $nfa_ct; } } print "NFA $nfa_ct / DFA $dfa_ct / BM $bm_ct\n";
⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊
In reply to Re^2: Perl regexp matching is slow??
by diotalevi
in thread Perl regexp matching is slow??
by smahesh
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |