Hi, I've managed to write a regex which crashes perl, and while diagnosing the issue I found another which returns the wrong data in the capture groups. I wrote the code below to test a regex and print the capture groups it returns.
#! /usr/bin/perl
use warnings;
use strict;

# First argument: the string to test.
my $String_to_test = $ARGV[0];
if ( !defined($String_to_test) || $String_to_test eq "" ) {
    die "please supply a string to test qr code against as first argument\n";
}

# Second argument: a qr// expression, evaluated as Perl code.
my $qrtest = eval ( $ARGV[1] );
if ( $@ ) {
    die "invalid qr supplied at arg2. caused the following error\n$@\n";
}

print "qr=$qrtest\n";
print "string=$String_to_test\n";
if ( $String_to_test =~ $qrtest ) {
    print "$qrtest present\n";
    # Print each capture group that took part in the match.
    print "1=$1\n" if defined($1);
    print "2=$2\n" if defined($2);
    print "3=$3\n" if defined($3);
    print "4=$4\n" if defined($4);
    print "5=$5\n" if defined($5);
    print "6=$6\n" if defined($6);
    print "7=$7\n" if defined($7);
    print "8=$8\n" if defined($8);
}
else {
    print "$qrtest NOT present\n";
}
To break the problem down, I started with the branch reset example from perlre. qr_test.pl contains the code above.
Case 3 is the broken one. Cases 1 and 2 try to show that the fundamental structure of the regex is OK, and Case 4 is a workaround.

Case 1: Using example regex from perlre branch reset
./qr_test.pl atuvz 'qr/ ( a ) (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x'
qr=(?x-ism: ( a ) (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) )
string=atuvz
(?x-ism: ( a ) (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) ) present
1=a
2=t
3=v
4=z
This does what is expected.

Case 2: Added an additional term "(w)" to the 3rd alternative
./qr_test.pl atuvwz 'qr/ ( a ) (?| x ( y ) z | (p (q) r) | (t) u (v) (w) ) ( z ) /x'
qr=(?x-ism: ( a ) (?| x ( y ) z | (p (q) r) | (t) u (v) (w) ) ( z ) )
string=atuvwz
(?x-ism: ( a ) (?| x ( y ) z | (p (q) r) | (t) u (v) (w) ) ( z ) ) present
1=a
2=t
3=v
4=z
This does not do what is expected: the 4th capture group should contain w, not z, and the 5th should contain z.
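For reference, here is a minimal sketch of the numbering I expect, based on my reading of the branch reset section of perlre: every alternative restarts at the same group number, and the group after the branch reset is numbered one higher than the largest count in any alternative. The helper capture_groups is a hypothetical name, not part of qr_test.pl, and the sketch assumes a perl without the bug described here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of qr_test.pl): return the capture
# groups of a match as a list, or an empty list if there is no match.
sub capture_groups {
    my ( $string, $re ) = @_;
    my @groups = $string =~ $re;
    return @groups;
}

# The Case 2 pattern: the third alternative holds three groups,
# so ( z ) should become group 5.
my $case2 = qr/ ( a ) (?| x ( y ) z | (p (q) r) | (t) u (v) (w) ) ( z ) /x;

my @groups = capture_groups( 'atuvwz', $case2 );
for my $i ( 0 .. $#groups ) {
    my $value = defined $groups[$i] ? $groups[$i] : '(undef)';
    printf "%d=%s\n", $i + 1, $value;
}
```

On a perl that numbers the groups this way, the loop should print 1=a, 2=t, 3=v, 4=w and 5=z, which is the result I was expecting from Case 2.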

Case 3: Added a fourth term "(x)" to the 3rd alternative
./qr_test.pl atuvwxz 'qr/ ( a ) (?| x ( y ) z | (p (q) r) | (t) u (v) (w) (x) ) ( z ) /x'
qr=(?x-ism: ( a ) (?| x ( y ) z | (p (q) r) | (t) u (v) (w) (x) ) ( z ) )
string=atuvwxz
(?x-ism: ( a ) (?| x ( y ) z | (p (q) r) | (t) u (v) (w) (x) ) ( z ) ) present
1=a
2=t
./qr_test.sh: line 11: 40158 Segmentation fault
This crashes perl with a segmentation fault.

Case 4: Having seen the odd behaviour in Case 2, I tried adding dummy groups to the first alternative.
./qr_test.pl 'atuvwxz' 'qr/ ( a ) (?| x ( y ) z | (p (q) r) | (t) u (v) (w) (x) ) ( z ) /x'
+ ./qr_test.pl atuvwxz 'qr/ ( a ) (?| x ( y ) z () () ()| (p (q) r) | (t) u (v) (w) (x) ) ( z ) /x'
qr=(?x-ism: ( a ) (?| x ( y ) z () () ()| (p (q) r) | (t) u (v) (w) (x) ) ( z ) )
string=atuvwxz
(?x-ism: ( a ) (?| x ( y ) z () () ()| (p (q) r) | (t) u (v) (w) (x) ) ( z ) ) present
1=a
2=t
3=v
4=w
5=x
6=z
This works, so I have a workaround.

So my questions are these:
Is this a bug?
Or did I miss something in a manual that says the first alternative must have the most capture groups?
If I did miss something in a manual, should perl have warned me rather than crashing?
Version of perl I'm using is:
This is perl, v5.10.1 (*) built for x86_64-linux-thread-multi
Fix: Upgrade to perl 5.022 or later.
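For anyone who cannot upgrade straight away, the Case 4 workaround and the version fix could be combined roughly as follows. This is only a sketch: the 5.022 threshold is taken from the fix noted above and has not been checked against other releases, and pick_pattern and $FIXED_IN are hypothetical names introduced for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption taken from the post: perls older than 5.022 mishandle a
# branch reset whose later alternatives have more groups than the first.
my $FIXED_IN = 5.022;

# Case 3 pattern as written.
my $plain = qr/ ( a ) (?| x ( y ) z | (p (q) r) | (t) u (v) (w) (x) ) ( z ) /x;

# Case 4 pattern: the first alternative is padded with empty groups so
# that it has as many groups as the longest alternative.
my $padded = qr/ ( a ) (?| x ( y ) z () () () | (p (q) r) | (t) u (v) (w) (x) ) ( z ) /x;

# Hypothetical helper: choose the padded pattern on older perls.
sub pick_pattern {
    my ($perl_version) = @_;
    return $perl_version < $FIXED_IN ? $padded : $plain;
}

# $] holds the running perl's version as a number, e.g. 5.010001.
my $re     = pick_pattern($]);
my @groups = 'atuvwxz' =~ $re;
print join( ' ', map { defined $_ ? $_ : '(undef)' } @groups ), "\n";
```

The empty groups match the empty string, so padding should not change which strings match. It only changes how many groups the first alternative has.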

In reply to regex with capture groups and branch reset crashes perl by Allinav
