negative lookbehind and VERY strange capture

Denis.Beurive has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: negative lookbehind and VERY strange capture by Corion (Patriarch) on Sep 18, 2016 at 11:17 UTC
The `$2` is filled repeatedly by `((?<=\\)"|[^"])+`, and the last thing it matched was the `o` at the end of `toto`.

Also, it looks as if you are trying to parse quoted constructs. Have you considered what should happen for the following strings: `"Toto\"ro"`, `"Toto\\Africa"` and `"Toto\\"`?

Personally, I prefer the following approach for quoted constructs with backslash escaping, instead of dealing with lookbehind: `^"((?:[^"\\]+|\\["\\])*)"$`

That is, "anything that is not a quote or a backslash", or "a backslash followed by another backslash or a quote".
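A minimal sketch of how an escape-pair pattern of this kind treats the three strings in question, plus one unterminated string for contrast. It assumes the group is repeated with `*` and that the character class excludes both the quote and the backslash; the helper name `unquote` is hypothetical and not part of the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text between the quotes, or undef
# when the whole string is not one well-formed quoted construct.
sub unquote {
    my ($s) = @_;
    return $s =~ /^"((?:[^"\\]+|\\["\\])*)"$/ ? $1 : undef;
}

# Single-quoted Perl literals: \\ yields one backslash, \" stays as written.
# The last string ends in an escaped quote, so it has no closing quote.
for my $s ('"Toto\"ro"', '"Toto\\\\Africa"', '"Toto\\\\"', '"Toto\\"') {
    my $inner = unquote($s);
    printf "%-16s => %s\n", $s,
        defined $inner ? "matches, \$1 = $inner" : 'no match';
}
```

Because a backslash can only be consumed together with the character it escapes, the pattern never has to look backwards to decide whether a quote is escaped.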
Re^2: negative lookbehind and VERY strange capture by Denis.Beurive (Initiate) on Sep 18, 2016 at 12:15 UTC
Hello Corion,

Thank you very much for your suggestion! It helps a lot!

    my @tests = (
        '"abcd\\\\efgh"',
        '"abcd\\""',
        '"abcd\\"efgh"',
        '"abcd\\\\\\"efgh"',
        '"abcd\\\\i\\"efgh"',
        '"abcd\\\\"',
    );

    foreach my $test (@tests) {
        print "Try for \"$test\":\n";
        if ($test =~ /^"((?:[^"\\]|\\["\\])+)"$/) {
            print "It matches!\n";
            print '$1: ' . $1 . "\n";
            print '$2: ' . (defined($2) ? "\$2 is defined\n" : "\$2 is NOT defined\n");
        } else {
            print "It does not match!\n";
        }
        print "\n";
    }

Result:

    Try for ""abcd\\efgh"":
    It matches!
    $1: abcd\\efgh
    $2: $2 is NOT defined

    Try for ""abcd\""":
    It matches!
    $1: abcd\"
    $2: $2 is NOT defined

    Try for ""abcd\"efgh"":
    It matches!
    $1: abcd\"efgh
    $2: $2 is NOT defined

    Try for ""abcd\\\"efgh"":
    It matches!
    $1: abcd\\\"efgh
    $2: $2 is NOT defined

    Try for ""abcd\\i\"efgh"":
    It matches!
    $1: abcd\\i\"efgh
    $2: $2 is NOT defined

    Try for ""abcd\\"":
    It matches!
    $1: abcd\\
    $2: $2 is NOT defined

Best regards,

Denis
Re^3: negative lookbehind and VERY strange capture by hippo (Archbishop) on Sep 18, 2016 at 16:12 UTC
Glad to hear you have a working solution. Might I also suggest that for test scripts like this you consider using one of the Test::* frameworks? They will help to highlight where your matches fail. Here's an example using the ubiquitous Test::More to show how simple it would be to integrate.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Test::More;

    # Set up source strings (keys) and expected results (values)
    my %tests = (
        '"abcd\\\\efgh"'       => 'abcd\\\\efgh',
        '"abcd\\""'            => 'abcd\\"',
        '"abcd\\"efgh"'        => 'abcd\\"efgh',
        '"abcd\\\\\\"efgh"'    => 'abcd\\\\\\"efgh',
        '"abcd\\\\i\\"efgh"'   => 'abcd\\\\i\\"efgh',
        '"abcd\\\\"'           => 'abcd\\\\'
    );

    # Set the total number of tests to perform
    plan tests => 3 * keys %tests;

    while (my ($test, $exp) = each %tests) {
        ok($test =~ /^"((?:[^"\\]|\\["\\])+)"$/, "$test matches");
        is($1, $exp, "\$1 is $exp");
        is($2, undef, '$2 is undefined');
    }

If any of the tests fail, it is easier to spot than having to visually parse the script output. You can also run `prove` on the script to get just a summary, which is even clearer. If you write a lot of scripts like this, it is well worth becoming familiar with the wealth of testing modules available.
Re^3: negative lookbehind and VERY strange capture by AnomalousMonk (Archbishop) on Sep 18, 2016 at 18:31 UTC
FWIW, please note that `/^"((?:[^"\\]|\\["\\])+)"$/` does not match an empty quoted string (`""`), because `+` requires at least one character between the quotes. With `*` in place of `+` it does match:

    c:\@Work\Perl\monks>perl -wMstrict -le
    "my $emptystring = '\"\"';
     print '+ match' if $emptystring =~ /^\"((?:[^^\"\\]|\\[\"\\])+)\"$/;
     print '* match' if $emptystring =~ /^\"((?:[^^\"\\]|\\[\"\\])*)\"$/;
    "
    * match

(Please forgive all the extra backslashes and ignore the extra `^` in `[^^\"\\]`. These are Windoze command line artifacts.)

Give a man a fish: `<%-{-{-{-<`
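The same comparison can be written as a small script, which avoids the command-line escaping. This is only a sketch: the variable names are illustrative, and the second pattern assumes `*` as the alternative quantifier.

```perl
use strict;
use warnings;

my $empty = '""';    # an empty quoted string: just the two quote characters

# With + the group must match at least one character or escape pair.
my $plus = $empty =~ /^"((?:[^"\\]|\\["\\])+)"$/ ? 1 : 0;

# With * the group may match nothing, so $1 would be the empty string.
my $star = $empty =~ /^"((?:[^"\\]|\\["\\])*)"$/ ? 1 : 0;

print "+ quantifier: ", ($plus ? 'match' : 'no match'), "\n";
print "* quantifier: ", ($star ? 'match' : 'no match'), "\n";
```

Whether an empty quoted string should be accepted depends on the data being parsed, so the choice between `+` and `*` is a design decision rather than a bug fix.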
Re: negative lookbehind and VERY strange capture by BrowserUk (Patriarch) on Sep 18, 2016 at 11:22 UTC
Expanding out your regex, it becomes obvious:

    use strict;
    use warnings;

    my $s = '"toto"';

    if ($s =~ m[^
        (?<!\\)"
        (
            (
                (?<=\\)"|[^"]
            )+                  # $2
        )                       # $1
        (?<!\\)"
        $
    ]x ) {
        print "It matches!\n";
        print $1 . "\n";
        print $2 . "\n";
    }

You have two nested pairs of capturing parens, the inner pair with a repeat. So the outer pair captures everything matched by the inner repetition, and the inner pair captures only the last thing it matched. Hence `$1` is `toto` and `$2` is the last character of that, `o`.

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". I knew I was on the right track :) In the absence of evidence, opinion is indistinguishable from prejudice.
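The behaviour of a repeated capturing group can be seen in isolation with a deliberately simplified pattern. This sketch uses `\w` rather than the original alternation, and the variable names are illustrative.

```perl
use strict;
use warnings;

my $s = 'toto';

# The inner group is $2 and is overwritten on every iteration of +;
# the outer group is $1 and spans everything the repetition consumed.
if ($s =~ /^((\w)+)$/) {
    print "\$1 = $1\n";    # the whole run
    print "\$2 = $2\n";    # only the final iteration
}

# Making the inner group non-capturing leaves a single capture.
my @captures = $s =~ /^((?:\w)+)$/;
print scalar(@captures), " capture group(s): @captures\n";
```

Using `(?:...)` for groups that exist only for repetition or alternation keeps the capture numbering tied to the parts that are actually wanted.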
Re^2: negative lookbehind and VERY strange capture by Denis.Beurive (Initiate) on Sep 18, 2016 at 11:53 UTC
Hello BrowserUK,

Thank you very much for your answer. I just learned something. I used "non-capturing parentheses" and it did the trick:

    my $s = '"abcd"';

    print "Try for \"$s\":\n";
    if ($s =~ /^(?<!\\)"((?:(?<=\\)"|[^"])*)(?<!\\)"$/) {
        print "It matches!\n";
        print '$1: ' . $1 . "\n";
        print '$2: ' . (defined($2) ? "\$2 is defined\n" : "\$2 is NOT defined\n");
    } else {
        print "It does not match!\n";
    }

However, as Corion pointed out, my regular expression does not work for this string: `abcd\\`. I’ll try some new approaches.

Best regards,

Denis
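A short sketch of why the lookbehind version has trouble with a string ending in an escaped backslash. It assumes the string is quoted, that is, `"abcd\\"`, and compares the pattern above with an escape-pair pattern; the variable names are illustrative.

```perl
use strict;
use warnings;

# Quote, abcd, two backslashes (one escaped backslash), closing quote.
my $s = '"abcd\\\\"';

# Lookbehind version: the closing quote is preceded by a backslash, so
# (?<!\\)" rejects it even though that backslash is itself escaped.
my $lookbehind = $s =~ /^(?<!\\)"((?:(?<=\\)"|[^"])*)(?<!\\)"$/
    ? 'match' : 'no match';

# Escape-pair version: the two backslashes are consumed as one unit,
# so the following quote is free to close the string.
my $pairs = $s =~ /^"((?:[^"\\]|\\["\\])*)"$/
    ? 'match' : 'no match';

print "lookbehind:  $lookbehind\n";
print "escape pair: $pairs\n";
```

A single-character lookbehind cannot tell an escaping backslash from an escaped one, which is why consuming escapes in pairs is the more robust choice here.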
Re: negative lookbehind and VERY strange capture by BillKSmith (Monsignor) on Sep 18, 2016 at 12:26 UTC
Consider using a module: Regexp::Common::balanced.

Bill
Re: negative lookbehind and VERY strange capture by Denis.Beurive (Initiate) on Sep 18, 2016 at 11:31 UTC
Thank you very much for your answers. I’ll take some time to fully understand.

Best regards,

Denis