Re: Splitting on escapable delimiter

I’d try reversing the string, splitting it, then reversing all the pieces. That way you can use a variable-width look-ahead assertion instead of an (unsupported) variable-width look-behind assertion.

$_ = "#@##@###@####@#####@";
$_ = reverse;                                      # reverse the whole string
my @pieces = reverse (split /\@(?=(?:##)*(?!#))/); # split $_, then restore piece order
for (@pieces) {
    $_ = reverse;                                  # reverse each piece back
}
print "@pieces\n";

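The regex is a little hairy; it has a negative look-ahead assertion inside the positive look-ahead assertion. In the reversed string, (?:##)* matches pairs of #s and (?!#) insists that no further # follows, so an @ counts as a delimiter only if it was preceded by an even number of #s (possibly zero) in the original string, i.e. only if it was not escaped.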
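To make the pattern less hairy, here is a sketch of the same technique with the split regex spelled out under the /x modifier, so each piece can carry a comment. Note that under /x a bare # starts a comment, so the literal hash marks have to be escaped as \#. The sub name split_escaped is only for illustration, and like the code above it uses split's default LIMIT.

```perl
use strict;
use warnings;

# Hypothetical helper: split a string on '@' delimiters, where '#'
# escapes the next character ('##' is a literal '#', '#@' a literal '@').
sub split_escaped {
    my ($str) = @_;
    my $rev = reverse $str;    # scalar context: reverses the characters
    my @pieces = split /
        \@              # a delimiter candidate ...
        (?=             # ... counts only if, in the reversed string,
            (?:\#\#)*   #     it is followed by zero or more pairs of hash marks
            (?!\#)      #     and then by anything except another hash mark
        )
    /x, $rev;
    # Restore the original field order, then un-reverse each field.
    return map { scalar reverse($_) } reverse(@pieces);
}

print join('|', split_escaped('#@##@###@####@#####@')), "\n";
```

This prints #@##|###@####|#####@: three fields, still in their escaped form.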

Re^2: Splitting on escapable delimiter by mobiusinversion (Beadle) on Mar 28, 2008 at 22:25 UTC
I have to say, that is clever! I smacked my forehead: after 2 years of daily Perl programming I had never thought, "Duh! Variable-width lookbehind is just variable-width lookahead on the reversed string!" Bravo! Unfortunately, this solution does not recover empty fields. For example, try the example string above with two '@'s prepended (which is what you get when the first two fields are empty). See my post below for a way to handle this correctly using loop unrolling (in one regex and with no lookaround!).
Re^3: Splitting on escapable delimiter by Anonymous Monk on Mar 28, 2008 at 22:57 UTC
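*reads documentation for split* Ah, I need to add -1 as a third argument (LIMIT) to split. A negative LIMIT stops split from discarding trailing empty fields, and since the string is reversed before splitting, those trailing fields are exactly the leading empty fields of the original. Good spot.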
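A minimal round-trip sketch of the fix, assuming the escaping scheme used in this thread (## for a literal #, #@ for a literal @). The helpers encode_fields and decode_fields are hypothetical names, and the one-pass unescape s/#([#\@])/$1/g is a swapped-in alternative to the two separate substitutions used later in the thread. A LIMIT of 0 (equivalent to omitting it) shows the lossy behaviour; -1 keeps the leading empty fields.

```perl
use strict;
use warnings;

# Hypothetical helper: escape each field, then join with bare '@'.
sub encode_fields {
    my @fields = @_;
    for (@fields) {
        s/#/##/g;      # escape the escape character first
        s/\@/#\@/g;    # then escape the delimiter
    }
    return join '@', @fields;
}

# Hypothetical helper: reverse, split with the look-ahead, reverse back.
# $limit defaults to -1 so that empty fields are not discarded.
sub decode_fields {
    my ($str, $limit) = @_;
    $limit = -1 unless defined $limit;
    my $rev = reverse $str;
    my @fields = reverse(split(/\@(?=(?:##)*(?!#))/, $rev, $limit));
    for (@fields) {
        $_ = scalar reverse $_;    # un-reverse each field
        s/#([#\@])/$1/g;           # '##' -> '#', '#@' -> '@' in one pass
    }
    return @fields;
}

my @orig    = ('', '', 'a#b', 'c@d', '#@');
my $encoded = encode_fields(@orig);
my @lossy   = decode_fields($encoded, 0);    # LIMIT 0: same as omitting it
my @kept    = decode_fields($encoded);       # LIMIT -1
print "encoded: $encoded\n";
print scalar(@lossy), " fields without -1, ", scalar(@kept), " with -1\n";
```

The encoded string is @@a##b@c#@d@###@. With LIMIT 0 the two leading empty fields disappear and only 3 fields come back; with -1 all 5 fields round-trip.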
Re^4: Splitting on escapable delimiter by mobiusinversion (Beadle) on Mar 28, 2008 at 23:40 UTC
Wow, I totally should have seen that! Okay, so now that mutual correctness has been established, it is time for optimality checking. It turns out that your method is about 15% faster. I called my method unroll and your method rollahead. Here are the benchmark results:

Benchmark: timing 100000 iterations of rollahead, unroll...
 rollahead: 18.3956 wallclock secs (17.94 usr +  0.00 sys = 17.94 CPU) @ 5575.07/s (n=100000)
    unroll: 22.1357 wallclock secs (20.58 usr +  0.00 sys = 20.58 CPU) @ 4859.56/s (n=100000)
            Rate    unroll rollahead
unroll    4860/s        --      -13%
rollahead 5575/s       15%        --

and the code:

use strict;
use Benchmark ':all', ':hireswallclock';

my $x = "#@##@###@####@#####@";
my $y = reverse $x;
my $z = "$x$x$x$y$y$x$x$y$y$y$y$y$x$x$x$x";

my $r = timethese( 100000, {
    unroll    => sub { my @x = ([unroll($x)],[unroll($y)],[unroll($z)]) },
    rollahead => sub { my @x = ([rollahead($x)],[rollahead($y)],[rollahead($z)]) },
} );
cmpthese($r);

sub unroll {
    my @x = $_[0] =~ /(?:^|@)((?:##|#@|[^#@])*)/g;
    for (@x) {
        $_ =~ s/##/#/g;
        $_ =~ s/#@/@/g;
    }
    @x
}

sub rollahead {
    my $x = shift;
    $x = reverse $x;
    my @x = reverse(split /\@(?=(?:##)*(?!#))/, $x, -1);
    for (@x) {
        $_ = reverse;
        $_ =~ s/##/#/g;
        $_ =~ s/#@/@/g;
    }
    @x
}

Is there a monk who could explain why rollahead is faster? I was surprised, considering the number of calls to reverse. I can only guess that somewhere deep inside the guts of the Perl-Regex-Beasty, the optimizer droids are nasty hardcore with lookaround automata but can't be bothered with alternation.
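Before reading too much into the timings, it can help to confirm that the two subs really split identically, including on empty fields. A small sketch: unroll and rollahead are restated from the post above so the snippet runs on its own, and the test strings are made-up edge cases, not data from the thread.

```perl
use strict;
use warnings;

# unroll: match each field directly, starting at ^ or just after a delimiter.
sub unroll {
    my @x = $_[0] =~ /(?:^|\@)((?:##|#\@|[^#\@])*)/g;
    for (@x) {
        s/##/#/g;
        s/#\@/\@/g;
    }
    @x;
}

# rollahead: reverse, split on '@' followed by an even run of '#', reverse back.
sub rollahead {
    my $x = shift;
    $x = reverse $x;
    my @x = reverse(split /\@(?=(?:##)*(?!#))/, $x, -1);
    for (@x) {
        $_ = reverse;
        s/##/#/g;
        s/#\@/\@/g;
    }
    @x;
}

my @inputs = ('#@##@###@####@#####@', '@@a', 'a@', '@@', 'a#@b@c##@d');
my @mismatches;
for my $in (@inputs) {
    my $u = join '|', unroll($in);
    my $r = join '|', rollahead($in);
    push @mismatches, $in if $u ne $r;
    printf "%-22s unroll=%-16s rollahead=%s\n", $in, $u, $r;
}
print(@mismatches ? "MISMATCH: @mismatches\n" : "all inputs agree\n");
```

On these inputs both subs return the same fields, so the benchmark compares like with like. The empty string is deliberately left out: split on an empty string returns no fields at all, whereas unroll returns one empty field.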
Re^5: Splitting on escapable delimiter by Anonymous Monk on Mar 29, 2008 at 01:30 UTC
Re^6: Splitting on escapable delimiter by mobiusinversion (Beadle) on Mar 29, 2008 at 07:14 UTC