in reply to regexp causes segfault

Do it C style using pos and substr. This is laser fast and has been tested on a 40MB string.

    # proof it behaves right, uncomment to see
    # $_ = " '==\\'==' '==5==' '\\'\\'' '\\'3' '\\'' '1' '' " x 2;
    $n = 40000000;
    $_ = "'" . "=" x $n . "'";
    my @pos;
    while ( /(?<!\\)'/gc ) {
        push @pos, pos;
    }
    for ( my $i = 0; $i < @pos; $i += 2 ) {
        my $begin = $pos[$i];
        my $end   = $pos[$i+1] - 1;
        my $str   = substr $_, $begin, ($end - $begin);
        # check what we have found using test string commented out
        #print "$begin $end '$str'\n";
        print length($str), "\n";
    }
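To make the result easy to eyeball on a small input, here is a minimal sketch of the same pos/substr idea wrapped in a sub. The name `quoted_fields` and the sample string are made up for illustration and are not part of the code above; the sketch also assumes quotes arrive in open/close pairs and simply ignores a trailing unpaired quote.

```perl
use strict;
use warnings;

# Hypothetical helper (illustration only): returns the contents of every
# single-quoted field in $text, treating \' as an escaped quote.
sub quoted_fields {
    my ($text) = @_;
    my @pos;
    # pos() reports the offset just past each unescaped quote
    while ( $text =~ /(?<!\\)'/g ) {
        push @pos, pos($text);
    }
    my @fields;
    # Offsets come in open/close pairs; skip an unpaired trailing quote
    for ( my $i = 0; $i + 1 < @pos; $i += 2 ) {
        my $begin = $pos[$i];            # first char after the opening quote
        my $end   = $pos[ $i + 1 ] - 1;  # offset of the closing quote itself
        push @fields, substr( $text, $begin, $end - $begin );
    }
    return @fields;
}

# q{} keeps the backslash, so the second field contains an escaped quote
my $sample = q{ 'abc' 'd\'e' '' };
print "|$_|\n" for quoted_fields($sample);
```

The sample has three fields: a plain one, one with an escaped quote inside, and an empty one.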

cheers

tachyon

s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

Re: Re: regexp causes segfault
by Anonymous Monk on Mar 06, 2003 at 01:29 UTC
    tachyon,
    Should a beast of a regular expression like this be put into an eval block?
    Just wondering.


    Shirkdog
Re: Re: regexp causes segfault
by shirkdog_perl (Beadle) on Mar 06, 2003 at 01:30 UTC
    That was me, forgot I was not logged in:-)

      I have no idea what you mean. All this regex does is walk the string char by char with a one-char buffer (the previous char). When the current char is ' and the previous char is not a backslash, it matches and we record the position. This is hardly a regex at all!

      Here it is completely C-ified - no regexes in sight. Possibly faster than the original post to boot but I can't be bothered to test.

          $str = " '==\\'==' '==5==' '\\'\\'' '\\'3' '\\'' '1' '' ";
          my $pos  = 0;
          my $len  = length $str;
          my $last = '';
          my $char;
          while ( $pos < $len ) {
              $char = substr $str, $pos, 1;
              push @pos, $pos if $char eq "'" and $last ne "\\";
              $pos++;
              $last = $char;
          }
          for ( my $i = 0; $i < @pos; $i += 2 ) {
              my $begin = $pos[$i] + 1;
              my $end   = $pos[$i+1];
              my $str   = substr $str, $begin, ($end - $begin);
              print "$begin $end |$str|\n";
          }
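For anyone who does want to test the speed claim, a rough sketch using the core Benchmark module could look like the following. The sub names `quotes_by_regex` and `quotes_by_substr`, the test string, and its size are all made up for illustration. Both subs return the offset of each unescaped quote so the results can be compared directly. No timings are claimed here; run it on your own perl and data.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Regex walk: offsets of every quote not preceded by a backslash
sub quotes_by_regex {
    my ($s) = @_;
    my @at;
    # pos() is one past the matched quote, so subtract 1 for its offset
    push @at, pos($s) - 1 while $s =~ /(?<!\\)'/g;
    return @at;
}

# Character walk with a one-char buffer holding the previous char
sub quotes_by_substr {
    my ($s) = @_;
    my ( @at, $char );
    my $last = '';
    my $len  = length $s;
    for ( my $p = 0; $p < $len; $p++ ) {
        $char = substr $s, $p, 1;
        push @at, $p if $char eq "'" and $last ne "\\";
        $last = $char;
    }
    return @at;
}

# Arbitrary test data; the size is an assumption, adjust to taste
my $data = q{ 'ab\'cd' 'e' } x 1000;

# -1 asks Benchmark to run each sub for at least one CPU second
cmpthese( -1, {
    regex  => sub { my @a = quotes_by_regex($data) },
    substr => sub { my @a = quotes_by_substr($data) },
} );
```

cmpthese prints a small table of rates and relative percentages, which is enough to see which walk wins on a given build.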

      cheers

      tachyon

      s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print