nuance has asked for the wisdom of the Perl Monks concerning the following question:

I was browsing through the snippets section and found a node called split unless escaped. This intrigued me and I tried to work out a solution without looking at the one given.

I have now got stuck with a regular expression: it does not do what I expected and I can't figure out why. The expression is

/[^\\]\\((\\)+\1)*$/
I thought it should match anything that is not a backslash, followed by a backslash, followed by zero or an even number of backslashes, all at the end of a string. It doesn't seem to work. I've included the script it was part of; maybe someone can tell me why it doesn't work.

Thanks

#!/usr/bin/perl -w
use strict;

my @split;
my $var;

# the original string that I want to split
my $tosplit = q(a=1&b=2\&3&c=4\\\\\&d=5);

# print out the string to confirm how many backslashes have been left
# by the quote statement
print $tosplit . "\n\n";

# Split the string on the ampersand
my @temp = split /&/, $tosplit;

# I thought this should have joined any two strings that were
# previously separated by an odd number of backslashes at the
# end of a string.
while ($_ = shift @temp) {
    $_ .= ("&" . shift @temp) and redo if /[^\\]\\((\\)+\1)*$/;
    push @split, $_;
};

foreach $var (@split) {print "$var" . "\n"};

Replies are listed 'Best First'.
Re: Backslashes in regular expressions
by perlmonkey (Hermit) on May 08, 2000 at 00:19 UTC
    So for output you were looking for this?
    a=1
    b=2\&3
    c=4\\\&d=5
    If that is what you are looking for, the problem was with your grouping '()' and the \1. The \1 was not matching the (\\); it was trying to match the *first* grouping, which is ((\\)+\1), and I don't think that is what you want.
    So change the regex to
    /[^\\]\\(?:(\\)+\1)*$/
    But of course you could save a lot of time and just do:
    my @split = split /(?<!\\)&/, $tosplit;
    This uses the negative lookbehind assertion, which is described in the perlre documentation.
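The numbering rule behind this advice can be checked with a small sketch. It assumes only that Perl numbers capture groups by the position of their opening parenthesis; the test string and variable names are illustrative and not part of the original script.

```perl
use strict;
use warnings;

# Illustrative input: 'x' followed by two backslashes
# (in single quotes, each \\ is one literal backslash).
my $str = 'x\\\\';

# Groups are numbered by their opening parenthesis, left to right:
# the outer group is $1 and the inner (\\) is $2.
my @nested = $str =~ /((\\)+)$/;

# With (?:...) the outer parentheses only cluster, so (\\) becomes $1.
my @clustered = $str =~ /(?:(\\)+)$/;

printf "nested:    %d groups, lengths %d and %d\n",
    scalar(@nested), length($nested[0]), length($nested[1]);
printf "clustered: %d group, length %d\n",
    scalar(@clustered), length($clustered[0]);
```

In the nested form the outer group holds the whole run of backslashes and the inner group holds only the last one matched, which is why a \1 written inside the outer parentheses does not refer to (\\).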
      But of course you could save a lot of time and just do:

      my @split = split /(?<!\\)&/, $tosplit;

      OK, I can see that it will split on an ampersand that has no backslash before it, but how does it deal with a "backslashed" backslash at the end of one of the strings, i.e. when the string to be split ends with two backslashes, neither of which is intended to "backslash" the ampersand? I don't think your split works in all situations, which is what I was attempting.
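A minimal sketch of that concern follows. The second input string is hypothetical, chosen so that an escaped backslash sits directly before a real separator; the first is the string from the original script.

```perl
use strict;
use warnings;

# The string from the original script: every ampersand that follows
# a backslash is meant to be escaped, so the lookbehind split copes.
my $from_thread = q(a=1&b=2\&3&c=4\\\\\&d=5);
my @ok = split /(?<!\\)&/, $from_thread;

# Hypothetical input: the first value ends with an escaped backslash
# (two backslashes), so the ampersand after it is a real separator.
my $tricky = 'a=1\\\\&b=2';
my @parts  = split /(?<!\\)&/, $tricky;

print scalar(@ok),    " fields from the thread's string\n";
print scalar(@parts), " field(s) from the tricky string\n";
```

The lookbehind only inspects the single character before the ampersand, so it cannot tell an escaping backslash from one that is itself escaped, and the second string comes back as one unsplit field.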

      Actually I've had another look and come up with this:

      /[^\\]\\(\\\\)*$/
      Which seems to do what I wanted. I still don't know why
      ((\\)+\1)
      doesn't work. Why is
      ((\\)+\1)* not equivalent to (\\\\)*
      Oh yes I also tried
      ((\\)+\2)*
      but that didn't seem to work either.

      Am I missing something fundamental and blindingly obvious?

      Thanks for taking the time to read my ramblings

        ((\\)+\1)* is not equivalent to (\\\\)* because of the grouping (and the multiplier, but that part is obvious). Because you group the entire term as ((\\)+\1), I think \1 is not referring to the (\\). I believe \1 would be undefined at that moment, because you are still inside the first group, which is ((\\)+\1).

        I am sure you want to use the ?: construct, which "is for clustering, not capturing" according to the perlre perldoc. With ?:, the outer group is not assigned \1 or $1, so the (\\) becomes \1. Then (?:(\\)+\1)* should be equivalent to (\\+\\)*, which matches any run of two or more backslashes, not only even-length runs.

        I am sure this is all vague and confusing to most, but I hope it helped a little.
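To compare the three patterns discussed in this thread, here is a rough sketch that tries each one against 'x' followed by 0 to 6 backslashes and reports which counts it accepts. The hash keys and the helper accepted_counts are names invented for this illustration.

```perl
use strict;
use warnings;

# The three patterns from the thread, under illustrative names.
my %pattern = (
    original  => qr/[^\\]\\((\\)+\1)*$/,     # \1 refers to its own outer group
    clustered => qr/[^\\]\\(?:(\\)+\1)*$/,   # \1 refers to the inner (\\)
    pairs     => qr/[^\\]\\(\\\\)*$/,        # one backslash, then whole pairs
);

# Return the trailing-backslash counts (0..6) that a pattern accepts.
sub accepted_counts {
    my ($re) = @_;
    return grep { my $s = 'x' . ('\\' x $_); $s =~ $re } 0 .. 6;
}

for my $name (qw(original clustered pairs)) {
    printf "%-9s accepts counts: %s\n", $name,
        join ' ', accepted_counts($pattern{$name});
}
```

Only the pairs pattern should accept exactly the odd counts. The clustered pattern also accepts even counts from four upward, and the original should accept only a single trailing backslash, since a \1 inside its own still-open group has nothing to match.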