I wanted to split a string "a=1&b=2\&3&c=4" on '&' _unless_ that & is escaped. No one seemed able to help, not the literature, not Usenet, nothing.

But finally I got it. I really hope it will save someone at least a small part of the time it took me to figure out.

If anyone can do it in one line I would like to know. This is a kind of hack, though a good one.

The regexp says: split on an & preceded by an even number of backslashes (possibly zero), where that run is not itself preceded by a single backslash. The idea is to detect an odd number of backslashes, which means the & is escaped. Unfortunately split also returns whatever the capturing parentheses matched as extra fields, which is why the second statement is needed. It cleans up those extra fields by appending each one to the field before it.

think perl for perl is the most elevated

/dh

@a = split /(?<!\\)(\\\\)*&/, "a=1&b=2\\&3&c=4";
$a[$_] .= splice @a, $_ + 1, 1 for (0..($#a/2));
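For anyone wondering where the extra fields come from, here is a minimal sketch that prints what split returns before the clean-up runs. The `<undef>` marker and the variable names are mine, added only for illustration.

```perl
use strict;
use warnings;

# The sample string from the post; single quotes keep \& as-is.
# Literal contents: a=1&b=2\&3&c=4
my $str = 'a=1&b=2\&3&c=4';

# Capturing parentheses in the pattern make split return one extra
# field per separator, holding whatever the group matched (undef when
# the group matched nothing).
my @raw = split /(?<!\\)(\\\\)*&/, $str;

# Show undef explicitly so the extra fields are visible.
my $shown = join '|', map { defined $_ ? $_ : '<undef>' } @raw;
print "$shown\n";    # a=1|<undef>|b=2\&3|<undef>|c=4
```

Every second element is a capture, so the clean-up has to fold element 1 into element 0, element 3 into element 2, and so on.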

RE: split unless escaped
by Anonymous Monk on Apr 25, 2000 at 20:02 UTC
    If you use (?: ) (a non-capturing group, so no backreference is made) you can do without the second line.
    @a = split /(?<!\\)(?:\\\\)*&/, "a=1&b=2\\&3&c=4";
    print join("|",@a)."\n";
    @b = split /(?<!\\)(\\\\)*&/, "a=1&b=2\\&3&c=4";
    $b[$_] .= splice @b, $_ + 1, 1 for (0..($#b/2));
    print join("|",@b)."\n";
RE: split unless escaped
by Anonymous Monk on Apr 25, 2000 at 20:10 UTC
    If you use (?: ) (a non-capturing group, so no backreference is made) you can do without the second line.
    @a = split /(?<!\\)(?:\\\\)*&/, "a=1&b=2\\&3&c=4";
    print join("|",@a)."\n";
    @b = split /(?<!\\)(\\\\)*&/, "a=1&b=2\\&3&c=4";
    $b[$_] .= splice @b, $_ + 1, 1 for (0..($#b/2));
    print join("|",@b)."\n";
      The problem is that then you lose any trailing double-backslashes, which are correctly kept in the OP method. For example, try it with the string 'a=1&b=2\&3\\&c=4'.
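A minimal sketch of that difference, assuming the literal string a=1&b=2\&3\\&c=4 (inside Perl single quotes every backslash pair has to be doubled to get it). The variable names are hypothetical.

```perl
use strict;
use warnings;

# Field "b" ends in an escaped backslash (\\) followed by an
# unescaped &.  Literal contents: a=1&b=2\&3\\&c=4
my $str = 'a=1&b=2\\&3\\\\&c=4';

# Non-capturing version: the backslash pair is part of the separator,
# so split throws it away.
my @plain = split /(?<!\\)(?:\\\\)*&/, $str;

# Capturing version followed by the OP's fix-up statement.
my @kept = split /(?<!\\)(\\\\)*&/, $str;
{
    no warnings;    # the undef captures would warn when appended
    $kept[$_] .= splice @kept, $_ + 1, 1 for 0 .. $#kept / 2;
}

print join('|', @plain), "\n";    # a=1|b=2\&3|c=4
print join('|', @kept),  "\n";    # a=1|b=2\&3\\|c=4
```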
RE: split unless escaped
by nuance (Hermit) on May 08, 2000 at 03:35 UTC
    I've looked at your solution. It works for a single pair of backslashes at the end of a line, but it doesn't work in the most general case, i.e. any number of backslashes at the end of one of the strings to be split. An odd number of backslashes means that the ampersand is backslashed, and this is accounted for. However, when there is an even number of backslashes, they will be reduced to a single pair, because the repeated group (\\\\)* captures only its last repetition.
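A small sketch of that effect, using a made-up input with four backslashes before the &. The second pattern is my own adjustment, not something from the original post: it moves the repetition inside the parentheses so that the whole run is captured.

```perl
use strict;
use warnings;

# Hypothetical input: four backslashes (two escaped backslashes)
# followed by an unescaped &.  Literal contents: x\\\\&y
my $str = 'x\\\\\\\\&y';

# (\\\\)* repeats the group, and a repeated group remembers only its
# last iteration, so just one pair comes back from split.
my @last_only = split /(?<!\\)(\\\\)*&/, $str;

# With the repetition inside the parentheses the whole run is captured.
my @whole_run = split /(?<!\\)((?:\\\\)*)&/, $str;

print length($last_only[1]), "\n";    # 2
print length($whole_run[1]), "\n";    # 4
```

With the adjusted pattern the capture is an empty string rather than undef when there are no backslashes, so the fix-up statement should also run without "uninitialized" warnings.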

    As far as I can see the following works in all cases

    #!/usr/bin/perl -w
    use strict;

    my @split;
    my $var;

    # the original string that I want to split
    my $tosplit = q(a=1&b=2\&3\&c=4\\\\\\\\\\\&d=5);

    # print out the string to confirm how many backslashes have been left
    # by the quote statement
    print $tosplit . "\n\n";

    # Split the string on the ampersand
    my @temp = split /&/, $tosplit;

    # Rejoin any strings that should not have been split
    while ($_ = shift @temp) {
        $_ .= ("&" . shift @temp) and redo if /[^\\]\\(\\\\)*$/;
        push @split, $_;
    };

    # print the array so we can see the results
    foreach $var (@split) {print "$var" . "\n"};
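The same split-then-rejoin idea can be sketched as a sub. This is only a variant under my own assumptions, not nuance's code: the name `split_unescaped` is hypothetical, the loop tests `@pieces` instead of the truth of the shifted value so that empty fields and a field of "0" do not end it early, and a lookbehind replaces `[^\\]` so a field consisting only of a backslash is still recognised.

```perl
use strict;
use warnings;

# Hypothetical helper: split on every &, then glue a piece back onto
# its successor while it ends in an odd number of backslashes.
sub split_unescaped {
    my ($str) = @_;
    my @pieces = split /&/, $str, -1;    # -1 keeps trailing empty fields
    my @out;
    while (@pieces) {
        my $field = shift @pieces;
        # An odd-length run of backslashes at the end means the & that
        # followed it was escaped, so put it back and keep going.
        while (@pieces && $field =~ /(?<!\\)\\(?:\\\\)*\z/) {
            $field .= '&' . shift @pieces;
        }
        push @out, $field;
    }
    return @out;
}

print join('|', split_unescaped('a=1&b=2\&3&c=4')), "\n";    # a=1|b=2\&3|c=4
```

Nothing is stripped or shortened here: the backslashes stay in the fields exactly as they were in the input, so unescaping is left to the caller.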
    Baldrick, you wouldn't see a subtle plan if it painted itself purple and danced naked on top of a harpsichord, singing "Subtle plans are here again!"
RE: split unless escaped
by anders (Initiate) on May 26, 2000 at 19:57 UTC
    Hmm, besides the clever ?:, you can also avoid the splice:
    perl -e '$_ and push @v, $_ for (split /(?<!\\)(\\\\)*&/, "a=1&b=2\\&3&c=4"); print join "\n", @v'
    -anders