in reply to Curios use of regular expressions in split

As you can see, the spaces in the string are preserved when using 'abc' as the separator. This may or may not be the behaviour you want.

You could actually target two arrays, one for words and one for separators, by combining the split with a push. If you don't want the leading and trailing spaces preserved, you could instead split on white space and direct the results to separate arrays as before.

knoppix@Microknoppix:~$ perl -E '
> $str = q{one abc two abc three abc four};
> $sep = q{abc};
> say q{-} x 25;
>
> @arr = split m{($sep)}, $str;
> say qq{Split on m{($sep)} into one array};
> say qq{ ->$_<-} for @arr;
> say q{-} x 25;
>
> push @{ $_ eq $sep ? \ @seps : \ @nums }, $_
>     for split m{($sep)}, $str;
> say qq{Split on m{($sep)} into two arrays};
> say q{ Nums:};
> say qq{ ->$_<-} for @nums;
> say q{ Seps:};
> say qq{ ->$_<-} for @seps;
> say q{-} x 25;
>
> @seps = (); @nums = ();
> push @{ $_ eq $sep ? \ @seps : \ @nums }, $_
>     for split m{\s+}, $str;
> say qq{Split on m{\\s+} into two arrays};
> say q{ Nums:};
> say qq{ ->$_<-} for @nums;
> say q{ Seps:};
> say qq{ ->$_<-} for @seps;
> say q{-} x 25;'
-------------------------
Split on m{(abc)} into one array
 ->one <-
 ->abc<-
 -> two <-
 ->abc<-
 -> three <-
 ->abc<-
 -> four<-
-------------------------
Split on m{(abc)} into two arrays
 Nums:
 ->one <-
 -> two <-
 -> three <-
 -> four<-
 Seps:
 ->abc<-
 ->abc<-
 ->abc<-
-------------------------
Split on m{\s+} into two arrays
 Nums:
 ->one<-
 ->two<-
 ->three<-
 ->four<-
 Seps:
 ->abc<-
 ->abc<-
 ->abc<-
-------------------------
knoppix@Microknoppix:~$
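
A possible variation on the last case, offered only as a sketch: rather than splitting on white space, the pattern can swallow the optional white space around the captured separator. The \s* wrappers and the \Q...\E quoting are additions for illustration and do not appear in the session above; the variable names are reused from it.

```perl
use strict;
use warnings;

my $str = q{one abc two abc three abc four};
my $sep = q{abc};

# \s* on either side consumes the spaces around each separator, so only
# the captured separator itself is returned alongside the fields.
# \Q...\E quotes any regex metacharacters in $sep (an added precaution).
my ( @nums, @seps );
push @{ $_ eq $sep ? \@seps : \@nums }, $_
    for split m{\s*(\Q$sep\E)\s*}, $str;

print "Nums: @nums\n";    # Nums: one two three four
print "Seps: @seps\n";    # Seps: abc abc abc
```

Unlike a plain split on \s+, this should leave a field with internal spaces in one piece, since only white space adjacent to a separator is consumed.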

I hope this is of interest.

Cheers,

JohnGG

Re^2: Curios use of regular expressions in split
by juliosergio (Sexton) on Feb 16, 2012 at 00:57 UTC

    Great! I finally understand what's behind the split function!

    When you enclose the pattern in parentheses, the string is split according to the pattern, but the matched separators are also stored in the resulting array. That's really interesting, because you can easily manipulate both the separated pieces and the separators in the same array. See my example below:

    #! /usr/bin/perl
    use Data::Dumper;
    $re = "(ab+)";
    $input = "uno abb dos ab tres abbb cuatro";
    my @splits = split /$re/, $input;
    print "splits: ", Dumper \@splits;
    __END__
    splits: $VAR1 = [
              'uno ',
              'abb',
              ' dos ',
              'ab',
              ' tres ',
              'abbb',
              ' cuatro'
            ];
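
When the separator pattern can match different strings, as (ab+) does here, a test like $_ eq $sep no longer tells fields from separators. One way around that, sketched under the assumption that the pattern has exactly one capture group, is to rely on position: split then returns field, separator, field, and so on, so the separators sit at the odd indices. The array names @fields and @seps are illustrative only.

```perl
use strict;
use warnings;

my $input  = "uno abb dos ab tres abbb cuatro";
my @splits = split /(ab+)/, $input;

# With a single capture group the list alternates field, separator, field,
# so even indices hold the fields and odd indices hold the separators.
my @fields = @splits[ grep { $_ % 2 == 0 } 0 .. $#splits ];
my @seps   = @splits[ grep { $_ % 2 == 1 } 0 .. $#splits ];

print "fields: ", join( '|', @fields ), "\n";    # fields: uno | dos | tres | cuatro
print "seps:   ", join( '|', @seps ),   "\n";    # seps:   abb|ab|abbb
```

With more than one capture group each match contributes several elements, so the parity test would need adjusting.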