in reply to Re: Perl regexp matching is slow??
in thread Perl regexp matching is slow??
Here's the program I wrote to characterize regular expressions. I dumped the output of -Mre=debug after compiling everything in perl and ran this program on that output. It turns out that it decides only three regexps *require* an NFA, 750 require at least a DFA, and 325 could be done with just a plain Boyer-Moore search.
Generate your data:
find src/bleadperl -name '*.pm' -o -name '*.pl' -print0 | xargs -n1 -0 + perl -Mre=debug > some-file.txt perl this-program some-file.txt
The program:
use strict; use warnings; my %sim; @sim{qw(BOL EXACT END NOTHING SBOL EOL SEOL)} = (); my %dfa; @dfa{qw(ALNUM ANYOF BOL BRANCH BOUND CLOSE1 CLOSE2 CLOSE3 CLOSE4 CLOSE +5 CLOSE6 CLOSE7 CURLY CURLYM CURLYN CURLYX DIGIT END EOL EOS EXACT EXACTF IFMATCH MBOL MEOL MINMOD NALNUM NDIGIT NOTHING NSPACE OPEN1 OPEN2 OPEN3 OPEN4 OPEN5 OPEN6 OPEN7 PLUS REG_ANY SANY SBOL SEOL SPACE STAR SUCCEED TAIL TRIE TRIEC UNLESSM WHILEM)}=(); sub any (&@) { my $predicate = shift @_; $predicate->() and return 1 for @_; return; } local $/; my $data = <>; my ( $bm_ct, $dfa_ct, $nfa_ct ); while ($data =~ /Final program:\n((?:\s[^\n]+\n)+)/mg) { my @ops = $1 =~ / ([A-Z]\w+)/g; if ( not any { not exists $sim{$_} } @ops ) { ++ $bm_ct; } elsif ( not any { not exists $dfa{$_} } @ops ) { ++ $dfa_ct; } else { ++ $nfa_ct; } } print "NFA $nfa_ct / DFA $dfa_ct / BM $bm_ct\n";
⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re^3: Perl regexp matching is slow??
by demerphq (Chancellor) on Feb 27, 2007 at 22:02 UTC | |
by diotalevi (Canon) on Mar 01, 2007 at 02:28 UTC |