Re^3: how to speed up that dynamic regex?

It seems that using (?PARNO) in my example speeds up coping against long random line of parentheses at least 10 times on average.
But still there are some frequent random data, that regex works slow. I changed the length of random line to 50000, and then sometimes regex cope in 0.1 sec, and sometimes it takes >5 sec.
I only changed regex a little bit:

my $regex =
qr{
                           # start of full pattern
        \(                  # match an opening parenthesis
            (?:
                0        # everytime false, because of such DATA
                |           # OR
                (?0)        # recurse to capture full pattern
            )*
        \)                  # match a closing parenthesis
                           # end of  full pattern
}x;
[download]

One more question: how to find consecutive balanced parentheses (and their max length with frequency of them)? I used "+" quantifier in: m/$regex+/g (/$m+/g). Now if I use this quantifier, program works with really bad asymptotics, but if I use possesive m/$regex++/g, it run faster but still slow. Why?

Comment on Re^3: how to speed up that dynamic regex? Download Code

Replies are listed 'Best First'.
Re^4: how to speed up that dynamic regex? by rsFalse (Chaplain) on Nov 07, 2014 at 11:28 UTC
Just solved that way: `my @groups = (); my $pred = -1; my $j = 0; while ($string =~ m/$regex/g){ $pred == $-[0] or $j++; $groups[$j] += $+[0] - $-[0]; $pred = $+[0]; }` [download] And reduced regex to: `qr{ # start of full pattern \( # match an opening parenthesis (?0)* # recurse to capture full pattern \) # match a closing parenthesis # end of full pattern }x;` [download] UPDATE: seems that using recursive regex is dangerous, because in rare random "unlucky cases" it takes very much time, although in often "lucky cases" it takes little amount of time. I found seemingly better and faster (or say more stable) way - just glue long regex and use+modify through iterations against the target string. Code: #use strict; use warnings; use Time::HiRes; $\|++; srand; my @lines = <>; my $n = 50000; my $long = ''; $long .= qw{( )}[ int(rand() * 2) ] for 1 .. $n; push @lines, $long; my @lines2 = @lines; my $start = Time::HiRes::time(); my $depth = 10; # some depth parameter my $pre_regex2 = '(?:\((?:\d+,)(?:\)\|' x $depth . '(?!)' . '\)))+' x $depth; # if $depth == 1 then looks: '(?:\((?:\d+,)(?:\)\|x\)))+'; # (?: # \( # (?:\d+,)* (?: # \) # \| # (?!)\) # ) # )+ my $regex2 = qr/$pre_regex2/; my $regex3 = qr/(?:\d+,){2,}/; for my $string (@lines2){ while ( $string =~ s/$regex2/ $a=$&, $b=()=$a=~m{[)(]}g, $a=~s{\d+,}{$b+=$&,''}eg, "$b," /eg or $string =~ s/$regex3/ $a=$&, $b=0, $a=~s{\d+,}{$b+=$&,''}eg, "$b," /eg ) {} my @groups = $string =~ m/\d+/g; my $max=0; for my $group (@groups){ $max < $group and $max = $group } my $occ = 0; for my $group (@groups){ $occ += $max == $group } $max or $occ=1; print "$max $occ\n"; } print Time::HiRes::time() - $start, $/; $start = Time::HiRes::time(); my $regex = qr{ \( # match an opening parenthesis (?0)* # recurse to capture full pattern \) # match a closing parenthesis }x; for my $string (@lines) { my $max = 0; my $occ = 1; my @groups = (); my $pred = -1; my $i = 0; while ($string =~ m/$regex/g){ $pred == $-[0] or $i++; $groups[$i] += $+[0] - $-[0]; $pred = $+[0]; } if (@groups) { @groups = sort { $a <=> $b } @groups; $max = $groups[-1]; for my $i (reverse 0 .. $#groups - 1) { if ( $groups[$i] == $groups[-1]) { ++$occ; } else { last; } } } print "$max $occ\n"; } print Time::HiRes::time() - $start, $/; [download] ( http://ideone.com/K12225 ) Usually first block takes ~0,5 sec, when second block (recursive regex) takes from 0.01 sec to more than 5 sec.	[reply] [d/l] [select]

Replies are listed 'Best First'.

Re^4: how to speed up that dynamic regex?
by rsFalse (Chaplain) on Nov 07, 2014 at 11:28 UTC

my @groups = ();
my $pred = -1;
my $j = 0;
while ($string =~ m/$regex/g){
        $pred == $-[0] or $j++;
        $groups[$j] += $+[0] - $-[0];
        $pred = $+[0];
        }
[download]

qr{
                           # start of full pattern
        \(                  # match an opening parenthesis
                (?0)*        # recurse to capture full pattern
        \)                  # match a closing parenthesis
                           # end of  full pattern
}x;
[download]

UPDATE

#use strict;
use warnings;
use Time::HiRes;
$|++;

srand;
my @lines = <>;
my $n     = 50000;
my $long  = '';
   $long .= qw{( )}[ int(rand() * 2) ] for 1 .. $n;
   
push @lines, $long;

my @lines2 = @lines;

my $start = Time::HiRes::time();

my $depth = 10; # some depth parameter
my $pre_regex2 = 
'(?:\((?:\d+,)*(?:\)|' x $depth . '(?!)' . '\)))+' x $depth;
# if $depth == 1 then looks: '(?:\((?:\d+,)*(?:\)|x\)))+';
#    (?:
#        \(
#            (?:\d+,)*    (?:
#                            \)
#                            |
#                            (?!)\)
#                        )
#    )+
my $regex2 = qr/$pre_regex2/;
my $regex3 = qr/(?:\d+,){2,}/;

for my $string (@lines2){
    
    while ( 
        $string =~ s/$regex2/
            $a=$&,
            $b=()=$a=~m{[)(]}g,
            $a=~s{\d+,}{$b+=$&,''}eg,
            "$b,"
            /eg 
        or
        $string =~ s/$regex3/
            $a=$&,
            $b=0,
            $a=~s{\d+,}{$b+=$&,''}eg,
            "$b,"
            /eg 
        )
        {}
    my @groups = $string =~ m/\d+/g;
    my $max=0;
    for my $group (@groups){
        $max < $group and $max = $group
        }
    my $occ = 0;
    for my $group (@groups){
        $occ += $max == $group
        }
    $max or $occ=1;
    print "$max $occ\n";
    }
    
print Time::HiRes::time() - $start, $/;

$start = Time::HiRes::time();

my $regex =
qr{
    \(                  # match an opening parenthesis
         (?0)*          # recurse to capture full pattern
    \)                  # match a closing parenthesis
}x;

for my $string (@lines)
{
    my $max    = 0;
    my $occ    = 1;
     my @groups = ();
    my $pred = -1;
    my $i = 0;
    while ($string =~ m/$regex/g){
        $pred == $-[0] or $i++;
        $groups[$i] += $+[0] - $-[0];
        $pred = $+[0];
        }

    if (@groups)
    {
        @groups = sort {    $a <=>    $b } @groups;
        $max =    $groups[-1];

        for my $i (reverse 0 .. $#groups - 1)
        {
            if (    $groups[$i] ==     $groups[-1])
            {
                ++$occ;
            }
            else
            {
                last;
            }
        }
    }

    print "$max $occ\n";
}

print Time::HiRes::time() - $start, $/;
[download]

http://ideone.com/K12225

[reply]
[d/l]
[select]