Re: Re: expanding the functionality of split
by tigervamp (Friar) on Dec 10, 2002 at 01:08 UTC
|
Although regular expressions could work, split would be much easier and cleaner in certain circumstances if it had this functionality. When you have a lot (thousands) of "one-time delimiters" in a file that you are working with, you could just stick them in an array and pass it to split. With regexes you would have to write a mechanism to build the entire regular expression for you (not particularly clean) or write a loop that captures up to the next delimiter on each iteration (probably less efficient, although not really difficult). I just think this would be useful (at least to me), and adding it to split seemed natural given split's power and usefulness.
tigervamp
|
|
If you want to keep an array with the delimiters around,
just do it. And when it's time to pass it on to split,
just do:
@chunks = do {local $" = "|"; split "@array" => $str};
assuming that @array is the array with your
delimiters. You can of course put that in a sub:
sub mysplit (\@;@) {
    local $" = "|";
    split "@{+shift}" => @_;
}
@chunks = mysplit @array => $str;
Abigail
|
|
Idiomatic as always. Is there still a fat comma in Perl 6? However, the key point for me is that if you are going to split on an array of delimiters you often need to sort them by length first (longest first) to get the behaviour you want. The ':' '::' example is a good case in point. If the order is ':', '::' you will never split on '::', as Perl will always do the ':' split and as a result return a number of (probably) unwanted null fields if there are any instances of '::' in the string being split. This also holds true in the more usual case where you are doing a match or substitution with an alternation such as (on|a|range|of|odds|and|ends). In that order we would never match 'and', because the 'a' alternative always matches first, unless we applied boundary conditions, etc. cheers
tachyon
s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print
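The length-ordering point can be sketched as follows. This is only an illustration, assuming the delimiters are literal strings rather than patterns (hence the quotemeta); split_on_any is a hypothetical name, not an existing function.

```perl
use strict;
use warnings;

# Hypothetical helper: split $str on any of the literal delimiters in @$delims.
sub split_on_any {
    my ($delims, $str) = @_;
    # Longest first, so '::' is tried before ':' in the alternation.
    my $alt = join '|',
              map  { quotemeta($_) }
              sort { length($b) <=> length($a) } @$delims;
    return split /$alt/, $str;
}

my @sorted   = split_on_any([ ':', '::' ], "a:b::c");
# For comparison, the unsorted alternation leaves an empty field.
my @unsorted = split /:|::/, "a:b::c";

print join(',', @sorted), "\n";      # a,b,c
print join(',', @unsorted), "\n";    # a,b,,c
```

Sorting by descending length is the simplest rule that keeps a short delimiter from shadowing a longer one that begins with it.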
|
|
#! perl -slw
use strict;
sub mysplit (\@;@) {
    local $" = "|"; #"
    split "@{+shift}" => @_;
}
my @array = (':','::','\s+');
my $string = "a:b::c d";
my @chunks = mysplit @array => $string;
print @chunks;
__END__
C:\test>218759.pl
1
Okay you lot, get your wings on the left, halos on the right. It's one size fits all, and "No!", you can't have a different color.
Pick up your cloud down the end and "Yes" if you get allocated a grey one they are a bit damp under foot, but someone has to get them.
Get used to the wings fast cos its an 8 hour day...unless the Govenor calls for a cyclone or hurricane, in which case 16 hour shifts are mandatory.
Just be grateful that you arrived just as the tornado season finished. Them buggers are real work.
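The 1 printed above is consistent with how split treats its second argument: it is evaluated in scalar context, so a bare @_ there yields the number of remaining arguments (1), and that number is the string that gets split. A minimal sketch of one possible fix, which unpacks the arguments before calling split; mysplit_fixed is a hypothetical name. Note that ':' still precedes '::' in the alternation, so an empty field remains.

```perl
use strict;
use warnings;

# Hypothetical fixed variant: take the string out of @_ explicitly,
# because split would evaluate a bare @_ in scalar context.
sub mysplit_fixed (\@$) {
    my ($delims, $str) = @_;
    local $" = "|";                  # join the delimiters into an alternation
    return split "@$delims", $str;
}

my @array  = (':', '::', '\s+');
my $string = "a:b::c d";
my @chunks = mysplit_fixed @array => $string;
print join(',', @chunks), "\n";      # a,b,,c,d
```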
|
|
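my @delim_seq = qw/: :: \s+ = \n/;   # delimiters, in the order they occur
my $string = "a:b::c d=e\n";
my @fields = ();
foreach my $delim ( @delim_seq ) {
    # Remove everything up to and including the next delimiter,
    # keeping the text in front of it as the next field.
    $string =~ s/(.*?)$delim// or last;   # stop if a delimiter is missing
    push @fields, $1;
}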
(Note that the final delimiter in the sequence is assumed to
be the final pattern on the line -- i.e. the line terminator.)
So you have two statements inside a loop, instead of a single
statement using "split(...)" -- I could live with that easily
enough (whereas I'd worry about adding complexity to a basic
function like "split").
|
|
Can you further describe the nature of these delimiters to me? Are you saying that they vary line-by-line, file-by-file, or some other way (arbitrary)?
In any of these cases, are you also attempting to determine the delimiters dynamically, rather than coding their nature into the script ahead of time? If so, then building a regexp and building an array to pass to a modified split seem to be about the same amount of effort. (You don't need eval to build a compound regexp, by the way; compiled qr// patterns can be combined into one.)
Anyway -- I'm shooting in the dark here. I'm really just curious about the nature of your delimiters and the effort, ultimately, that you are trying to spare yourself.
Matt
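To illustrate the qr remark: a small sketch, assuming the delimiters are already regex fragments held in an array, of building one compound pattern without eval. The variable names are illustrative only.

```perl
use strict;
use warnings;

my @patterns = ('::', ':', '\s+', '=');    # regex fragments, '::' listed before ':'
my @compiled = map { qr/$_/ } @patterns;   # compile each fragment separately
my $any      = join '|', @compiled;        # each qr object stringifies to its own group
my $delim_re = qr/$any/;                   # one compound pattern, no eval needed

my @fields = split $delim_re, "a:b::c d=e";
print join(',', @fields), "\n";            # a,b,c,d,e
```

Because each compiled fragment stringifies as a self-contained group, joining them with '|' does not let one fragment's syntax leak into its neighbours.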