in reply to negative lookbehind and VERY strange capture

The $2 is filled repeatedly by ((?<=\\)"|[^"])+, and the last thing it matched was the o at the end of toto.
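A minimal sketch of that behaviour, assuming a sample input of "toto" and an outer capture group wrapped around the repeated one (both are assumptions for illustration; the original pattern is not quoted here). A capture group under a quantifier keeps only the text from its last iteration:

```perl
use strict;
use warnings;

my $input = '"toto"';    # hypothetical sample input

my ($whole, $last);
# Group 1 wraps the whole repetition; group 2 is the repeated group itself.
if ($input =~ /^"(((?<=\\)"|[^"])+)"$/) {
    ($whole, $last) = ($1, $2);
}

print "\$1 = $whole\n";    # everything between the quotes
print "\$2 = $last\n";     # only the final iteration of the repeated group
```

Here $1 holds toto while $2 holds just the final o, because each pass through the repeated group overwrites the previous capture.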

Also, it looks as if you are trying to parse quoted constructs. Have you considered what should happen for the following strings:

"Toto\"ro" "Toto\\Africa" "Toto\\"

Personally, I prefer the following approach for quoted constructs with backslash escaping, instead of dealing with lookbehind:

^"((?:[^"\\]+|\\["\\])+)"$

that is, "anything that is not a quote or a backslash", or "a backslash, followed by another backslash, or a quote"
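As a rough check of that idea against the three strings above, here is a small sketch. The pattern is written with the backslash escaped inside the character class and a + on the group; the variable names are only illustrative.

```perl
use strict;
use warnings;

# The three sample strings, taken literally. A single-quoted heredoc does
# no backslash processing, so \\ really is two backslashes here.
my @samples = split /\n/, <<'END';
"Toto\"ro"
"Toto\\Africa"
"Toto\\"
END

# Either a run of characters that are neither quote nor backslash,
# or a backslash followed by a quote or another backslash.
my $quoted = qr/^"((?:[^"\\]+|\\["\\])+)"$/;

my @contents;
for my $s (@samples) {
    if ($s =~ $quoted) {
        push @contents, $1;
        print "$s => $1\n";
    }
    else {
        push @contents, undef;
        print "$s => no match\n";
    }
}
```

All three strings match, and $1 holds the text between the outer quotes with the escapes still in place.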

Re^2: negative lookbehind and VERY strange capture
by Denis.Beurive (Initiate) on Sep 18, 2016 at 12:15 UTC

    Hello Corion

    Thank you very much for your suggestion! It helps a lot!

    my @tests = (
        '"abcd\\\\efgh"',
        '"abcd\\""',
        '"abcd\\"efgh"',
        '"abcd\\\\\\"efgh"',
        '"abcd\\\\i\\"efgh"',
        '"abcd\\\\"',
    );

    foreach my $test (@tests) {
        print "Try for \"$test\":\n";
        if ($test =~ /^"((?:[^"\\]|\\["\\])+)"$/) {
            print "It matches!\n";
            print '$1: ' . $1 . "\n";
            print '$2: ' . (defined($2) ? "\$2 is defined\n" : "\$2 is NOT defined\n");
        } else {
            print "It does not match!\n";
        }
        print "\n";
    }

    Result:

    Try for ""abcd\\efgh"":
    It matches!
    $1: abcd\\efgh
    $2: $2 is NOT defined

    Try for ""abcd\""":
    It matches!
    $1: abcd\"
    $2: $2 is NOT defined

    Try for ""abcd\"efgh"":
    It matches!
    $1: abcd\"efgh
    $2: $2 is NOT defined

    Try for ""abcd\\\"efgh"":
    It matches!
    $1: abcd\\\"efgh
    $2: $2 is NOT defined

    Try for ""abcd\\i\"efgh"":
    It matches!
    $1: abcd\\i\"efgh
    $2: $2 is NOT defined

    Try for ""abcd\\"":
    It matches!
    $1: abcd\\
    $2: $2 is NOT defined

    Best regards

    Denis

      Glad to hear you have a working solution. Might I also suggest that for test scripts like this you consider using one of the Test::* frameworks? They will help to highlight where your matches fail. Here's an example using the ubiquitous Test::More to show how simple it would be to integrate.

      #!/usr/bin/perl
      use strict;
      use warnings;
      use Test::More;

      # Set up source strings (keys) and expected results (values)
      my %tests = (
          '"abcd\\\\efgh"'      => 'abcd\\\\efgh',
          '"abcd\\""'           => 'abcd\\"',
          '"abcd\\"efgh"'       => 'abcd\\"efgh',
          '"abcd\\\\\\"efgh"'   => 'abcd\\\\\\"efgh',
          '"abcd\\\\i\\"efgh"'  => 'abcd\\\\i\\"efgh',
          '"abcd\\\\"'          => 'abcd\\\\'
      );

      # Set the total number of tests to perform
      plan tests => 3 * keys %tests;

      while ( my ($test, $exp) = each %tests) {
          # scalar() forces scalar context: in list context a failed match
          # returns an empty list, and ok() would then see only the test name
          ok (scalar($test =~ /^"((?:[^"\\]|\\["\\])+)"$/), "$test matches");
          is ($1, $exp, "\$1 is $exp");
          is ($2, undef, '$2 is undefined');
      }

      If any of the tests fail it is easier to spot than having to visually parse the script output. You can also run prove on the script to get just a summary which is even clearer.

      If you write a lot of scripts like this, it is well worth becoming familiar with the wealth of testing modules available.

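      FWIW, please note that /^"((?:[^"\\]|\\["\\])+)"$/ does not match an empty quoted string (""), because the + quantifier requires at least one character between the quotes; the * quantifier allows it: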

      c:\@Work\Perl\monks>perl -wMstrict -le
       "my $emptystring = '\"\"';
        print '+ match' if $emptystring =~ /^\"((?:[^^\"\\]|\\[\"\\])+)\"$/;
        print '* match' if $emptystring =~ /^\"((?:[^^\"\\]|\\[\"\\])*)\"$/;
       "
      * match
      (Please forgive all the extra backslashes and ignore the extra  ^ in [^^\"\\]. These are Windoze command line artifacts.)
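For readers not on a Windows command line, the same comparison can be sketched as a plain script without the extra escaping. The variable names are only illustrative.

```perl
use strict;
use warnings;

my $empty = '""';    # an empty quoted string: just two double quotes

my $plus = qr/^"((?:[^"\\]|\\["\\])+)"$/;    # needs at least one character
my $star = qr/^"((?:[^"\\]|\\["\\])*)"$/;    # allows zero characters

my $plus_matches = $empty =~ $plus ? 1 : 0;
my $star_matches = $empty =~ $star ? 1 : 0;

print "+ version: ", ($plus_matches ? "match" : "no match"), "\n";
print "* version: ", ($star_matches ? "match" : "no match"), "\n";
```

Only the * version matches. Which quantifier is right depends on whether an empty quoted string should count as valid input.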

