in reply to Re: Backslashes in regular expressions
in thread Backslashes in regular expressions

But of course you could save a lot of time and just do:

my @split = split /(?<!\\)&/, $tosplit;

OK, I can see that this will split on an ampersand that has no backslash before it. But how does it deal with a "backslashed" backslash at the end of one of the strings, i.e. a field that ends with two backslashes, neither of which is intended to "backslash" the ampersand? I don't think your split works in all situations, which is what I was attempting.
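
To make the concern concrete, here is a minimal sketch. The sample string is made up for illustration: its first field ends in an escaped backslash (two backslashes), so the ampersand after it is meant to be a real separator and three fields are intended.

```perl
use strict;
use warnings;

# Hypothetical input, shown unquoted:  a\\&b\&c&d
# Intended fields:  a\\   b\&c   d
my $tosplit = 'a\\\\&b\\&c&d';

# The lookbehind only inspects the single character before each ampersand.
my @split = split /(?<!\\)&/, $tosplit;

print scalar(@split), " fields\n";    # 2 fields
print "$_\n" for @split;
```

The first ampersand is preceded by a backslash, so the lookbehind rejects it even though that backslash is itself escaped, and the first two intended fields come back joined together.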

Actually I've had another look and come up with this:

/[^\\]\\(\\\\)*$/
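
As a rough check of what this pattern accepts, the sketch below runs it against a few made-up strings. Reading it as "ends in an odd number of backslashes" is an interpretation of the pattern, not something stated in the thread.

```perl
use strict;
use warnings;

# The pattern from the post: a non-backslash, one backslash,
# then any number of backslash pairs, then end of string.
my $odd_tail = qr/[^\\]\\(\\\\)*$/;

# Illustrative strings with 1, 2, 3 and 0 trailing backslashes.
my @pieces = ('foo\\', 'foo\\\\', 'foo\\\\\\', 'foo');

my @flags = map { $_ =~ $odd_tail ? 1 : 0 } @pieces;
print "@flags\n";    # 1 0 1 0
```

Note that the leading [^\\] requires some non-backslash character before the run, so a string made up only of backslashes would not match.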
Which seems to do what I wanted. I still don't know why
((\\)+\1)
doesn't work. Why is
((\\)+\1)* not equivalent to (\\\\)*?
Oh yes, I also tried
((\\)+\2)*
but that didn't seem to work either.

Am I missing something fundamental and blindingly obvious?

Thanks for taking the time to read my ramblings

RE: RE: Re: Backslashes in regular expressions
by perlmonkey (Hermit) on May 08, 2000 at 06:43 UTC
    ((\\)+\1)* is not equivalent to (\\\\)* because of the grouping (and the multiplier, but that part is obvious). I think that because you group the entire term as ((\\)+\1), \1 is not referring to the (\\). Groups are numbered by their opening parenthesis, so \1 is the outer group, and I believe \1 would be undefined at that moment because you are still inside the first group, which is ((\\)+\1).

    I am sure you want to use the (?:...) construct, which "is for clustering, not capturing" according to the perlre perldoc. Using ?:, the outer group is not assigned \1 or $1, so the (\\) becomes \1. Then (?:(\\)+\1)* should be equivalent to (\\+\\)*
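
The numbering can be checked with a short sketch. The test strings are made up for illustration, and the patterns are anchored with ^ and $ (not part of the original expressions) so that the whole string has to match.

```perl
use strict;
use warnings;

my $four = '\\' x 4;    # a run of four backslashes
my $two  = '\\' x 2;    # a run of two backslashes

# Groups are numbered by opening parenthesis: $1 is the outer group,
# $2 the inner one (which keeps only its last repetition).
my ($outer, $inner) = $four =~ /^((\\)+)$/;
print length($outer), " ", length($inner), "\n";    # 4 1

# With a capturing outer group, \1 points at the group it sits inside.
my $capturing  = $two =~ /^((\\)+\1)*$/   ? 1 : 0;
# With a clustering outer group, \1 is the inner (\\).
my $clustering = $two =~ /^(?:(\\)+\1)*$/ ? 1 : 0;
print "$capturing $clustering\n";    # 0 1
```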

    I am sure this is all vague and confusing to most, but I hope it helped a little.
      Thanks. Yes, I can see that the grouping affects the backreference. I had tried various combinations and it definitely helps to use the (?:) grouping construct.

      If anyone's interested, I finally got it figured out to my own satisfaction. The problem stems from the placement of the + multiplier. If you place it as I did originally (reduced to its simplest form):

      /(\\)+\1/
      then the original (\\) matches a single \. This is repeated one or more times and followed by a single \ (since that's what the () stored). The effect is to match any number of \s greater than one. What you actually need is:
      /(\\+)\1/
      This matches at least one, but possibly more, \. This group of \s is then stored as the backreference, so the text matched always contains an even number of \s. Unanchored, it can still match part of an odd-length run, so anchor the expression if the whole run must be even. :-)
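
The difference between the two placements of + can be seen by trying both against runs of one to six backslashes. This is a minimal sketch; the ^ and $ anchors are an addition so that each pattern has to account for the whole run.

```perl
use strict;
use warnings;

my (@plus_outside, @plus_inside);
for my $n (1 .. 6) {
    my $run = '\\' x $n;    # a run of $n backslashes

    # + outside the group: \1 is a single backslash, so any run of 2+ matches.
    push @plus_outside, ($run =~ /^(\\)+\1$/ ? 1 : 0);

    # + inside the group: \1 repeats the whole captured run, so only
    # even lengths match.
    push @plus_inside, ($run =~ /^(\\+)\1$/ ? 1 : 0);
}

print "@plus_outside\n";    # 0 1 1 1 1 1
print "@plus_inside\n";     # 0 1 0 1 0 1
```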

      It all seems a lot simpler now that I've got it figured out, it just didn't seem that way when I was working through it.

      Thanks for Reading

      Nuance badly copying Simplicus