Re: regexp causes segfault
by tachyon (Chancellor) on Mar 06, 2003 at 00:46 UTC
|
# To check that it behaves correctly, uncomment this test string
# (and comment out the two assignments below it)
# $_ = " '==\\'==' '==5==' '\\'\\'' '\\'3' '\\'' '1' '' " x 2;
my $n = 40000000;
$_ = "'" . "=" x $n . "'";
my @pos;
# record the offset just past every quote not preceded by a backslash
while ( /(?<!\\)'/gc ) {
    push @pos, pos;
}
# quotes pair up: even entries open a string, odd entries close it
for ( my $i = 0; $i < @pos; $i += 2 ) {
    my $begin = $pos[$i];
    my $end   = $pos[$i+1] - 1;
    my $str   = substr $_, $begin, ($end - $begin);
    # to inspect the matches on the test string, uncomment:
    # print "$begin $end '$str'\n";
    print length($str), "\n";
}
cheers
tachyon
s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print
|
tachyon,
Should a beast of a regular expression like this be put into an eval block?
Just wondering.
Shirkdog
|
That was me, forgot I was not logged in:-)
|
I have no idea what you mean. All this regex does is walk the string char by char with a one-char buffer (the previous char). When the current char is ' and the previous char is not a backslash, we have a match and record the position. This is hardly a regex at all!
Here it is completely C-ified - no regexes in sight. Possibly faster than the original post to boot but I can't be bothered to test.
my $str = " '==\\'==' '==5==' '\\'\\'' '\\'3' '\\'' '1' '' ";
my @pos;
my $pos  = 0;
my $len  = length $str;
my $last = '';
my $char;
# record the index of every quote not preceded by a backslash
while ( $pos < $len ) {
    $char = substr $str, $pos, 1;
    push @pos, $pos if $char eq "'" and $last ne "\\";
    $pos++;
    $last = $char;
}
for ( my $i = 0; $i < @pos; $i += 2 ) {
    my $begin  = $pos[$i] + 1;
    my $end    = $pos[$i+1];
    my $quoted = substr $str, $begin, ($end - $begin);
    print "$begin $end |$quoted|\n";
}
cheers
tachyon
s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print
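A quick way to check the claim that the lookbehind regex and the character loop do the same job is to run both over the test string and compare the quote positions. This is only a sketch: the helper names quotes_by_regex and quotes_by_loop are made up for illustration, and the regex version subtracts 1 because pos reports the offset just past the matched quote rather than the index of the quote itself.

```perl
use strict;
use warnings;

# Hypothetical helper: indexes of unescaped quotes, found with the lookbehind regex.
# pos() is the offset just past each match, so subtract 1 to get the quote's index.
sub quotes_by_regex {
    my ($s) = @_;
    my @found;
    push @found, pos($s) - 1 while $s =~ /(?<!\\)'/g;
    return @found;
}

# Hypothetical helper: the same scan as a plain character loop with a one-char buffer.
sub quotes_by_loop {
    my ($s) = @_;
    my @found;
    my $last = '';
    for my $i ( 0 .. length($s) - 1 ) {
        my $char = substr $s, $i, 1;
        push @found, $i if $char eq "'" and $last ne "\\";
        $last = $char;
    }
    return @found;
}

# The test string from the posts above.
my $sample   = " '==\\'==' '==5==' '\\'\\'' '\\'3' '\\'' '1' '' ";
my @by_regex = quotes_by_regex($sample);
my @by_loop  = quotes_by_loop($sample);

print "both scans agree\n" if "@by_regex" eq "@by_loop";
```

Neither version recurses per character, which is why this approach should sidestep the stack problem that the quantified-group regexes run into on very long strings.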
Re: regexp causes segfault
by hv (Prior) on Mar 05, 2003 at 21:37 UTC
|
This will do it, but may be slow when failing:
/'(?:[^\\']*(?:\\.|'' )?)*'/
I think the slow failure mode can be overcome with a cut operator (the independent subexpression, (?>...)):
/'(?:(?>[^\\']*)(?:\\.|'' )?)*'/
Hugo
|
I should clarify that this won't match exactly the same strings as the code in the original post, since it always treats a backslash as escaping the following character, so that "'\\'" would be treated as a valid quoted string of a single escaped backslash, whereas the original code would ignore the first backslash and then treat the second backslash as escaping the quote. (And then fail, and backtrack, and do the right thing anyway: "'\\''" would probably be a better example.)
Hugo
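The difference in backslash handling can be seen on a four-character string: quote, backslash, backslash, quote. The sketch below compares the pattern above (copied verbatim, with the cut operator) against the lookbehind scan posted elsewhere in this thread, not against the original poster's regex. The variable names are only illustrative.

```perl
use strict;
use warnings;

# Raw characters: quote, backslash, backslash, quote.
my $s = "'" . "\\" x 2 . "'";

# Lookbehind scan: a quote counts only if the previous character is not a backslash,
# so the final quote here is seen as escaped and only the opening quote is recorded.
my @unescaped;
push @unescaped, pos($s) - 1 while $s =~ /(?<!\\)'/g;

# Pattern with the cut operator: a backslash always consumes the next character,
# so the two backslashes pair up and the final quote closes the string.
my ($matched) = $s =~ /('(?:(?>[^\\']*)(?:\\.|'' )?)*')/;

printf "lookbehind scan found %d unescaped quote(s)\n", scalar @unescaped;
print "pattern matched the whole string\n"
    if defined $matched and $matched eq $s;
```

So the lookbehind scan sees an unterminated string here, while the pattern sees a complete quoted string containing one escaped backslash.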
Re: regexp causes segfault
by pg (Canon) on Mar 06, 2003 at 02:31 UTC
|
Just to add one point.
When I tested your regexp with AS5.8.0, it didn't core dump, so I guess you used an older version.
It didn't work with AS5.8.0 either, but it failed more robustly, giving a different message: "Complex regular subexpression recursion limit (32766) exceeded".
This makes sense, as they need a way to:
- avoid an endless loop (better to call it endless recursion)
- avoid memory allocation problems
|
This varies depending on platform: perl's configuration script tries to determine the right limit, but doesn't always get it right, and when the limit is too high you'll get the coredump when the real limit is hit.
It is currently the intention to remove this limitation altogether for perl-5.10.0, by rewiring the regular expression engine to use a new internal stack (which can be grown as needed) rather than the system stack, but it isn't clear yet whether we can do that without slowing down the engine.
Hugo
Re: regexp causes segfault
by shirkdog_perl (Beadle) on Mar 06, 2003 at 05:44 UTC
|
That takes care of it, Tach.
Cheers
Re: regexp causes segfault
by Weathros (Novice) on Mar 06, 2003 at 09:18 UTC
|
I see the escaping now...
Regex probably isn't the best way to solve 4MB matches ;) If you're not careful, your computer can spend a looong time trying to match ;)
Re: regexp causes segfault
by bart (Canon) on Mar 06, 2003 at 12:16 UTC
|
Try non-capturing parentheses.
/'(?:\\'|''|[^'])+'/
It should offer some improvement.
Update: Well, if it doesn't, it won't be much. I still get the same error messages as pg, on Windows. (Indigoperl 5.6.1, so it's not just a 5.8.0 thing.)