Re: Re: expanding the functionality of split

Re: expanding the functionality of split by Abigail-II (Bishop) on Dec 10, 2002 at 10:31 UTC
If you want to keep an array with the delimiters around, just do it. And when it's time to pass it on to split, just do: `@chunks = do {local $" = "|"; split "@array" => $str};` assuming that `@array` is the array with your delimiters. You can of course put that in a sub: `sub mysplit (\@;@) { local $" = "|"; split "@{+shift}" => @_; } @chunks = mysplit @array => $str;` Abigail
Re: Re: expanding the functionality of split by tachyon (Chancellor) on Dec 10, 2002 at 15:15 UTC
Idiomatic as always. Is there still a fat comma in Perl 6? However, the key point for me is that if you are going to split on an array of delimiters, you often need to sort them by length first to get the behaviour you want. The ':' '::' example is a good case in point. If the order is ':', '::' you will never split on '::', as Perl will always do the ':' split first and, as a result, return a number of (probably) unwanted null fields if there are any instances of '::' in the split string. This also holds true in the more usual case where you are doing a match or sub on something like (on|a|range|of|odds|and|ends). If we used that order we would never match 'and', as the 'a' alternative always matches first at that position, unless we applied boundary conditions, etc. cheers tachyon s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print
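A small sketch of the ordering effect described above. The sample string is made up for illustration. The point is only that Perl tries alternatives left to right at each position, so `:` shadows `::` unless the longer literal comes first.

```perl
use strict;
use warnings;

my $str = 'a::b:c';

# ':' listed first: each colon of '::' is split on separately,
# which leaves an empty field between 'a' and 'b'.
my @short_first = split /:|::/, $str;

# Literal delimiters sorted longest first: '::' now wins where it occurs.
my @delims     = sort { length($b) <=> length($a) } (':', '::');
my $pattern    = join '|', map { quotemeta($_) } @delims;
my @long_first = split /$pattern/, $str;

print join(',', @short_first), "\n";   # a,,b,c
print join(',', @long_first),  "\n";   # a,b,c
```

Sorting by length is only safe here because both delimiters are literal strings; it says nothing about general regexes.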
Re: expanding the functionality of split by Abigail-II (Bishop) on Dec 10, 2002 at 15:29 UTC
But sorting on length will not do. A trivial and silly example, when the delimiters are `abc` and `abc`, if you try `abc` first, you'll never succeed on `abc`. You could make an ordering if you can decide whether one regex will match everything another does. I doubt this is a decidable question for Perl regular expressions. It is for "normal" regular expressions, and, IIRC, undecidable for context-free grammars. Perl regular expressions are hard to classify in this sense, but even if it's theoretically possible, it's not going to be cheap. It's going to be the responsibility of the programmer to pass in the options in a logical order, just as is already required for alternations in regular expressions. Abigail
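To illustrate why the length of a pattern's source text says little about what it matches, here is a hypothetical pair of delimiters (chosen for this sketch, not taken from the post above). The character class `[:;]` is longer as text than `::`, so a longest-first sort puts it in front, where it shadows `::`.

```perl
use strict;
use warnings;

# Hypothetical delimiters: a literal '::' and a regex character class.
my @delims = ('::', '[:;]');

# Longest-first by source length puts '[:;]' (4 chars) before '::' (2 chars).
my @by_length = sort { length($b) <=> length($a) } @delims;
my $sorted_re = join '|', @by_length;        # '[:;]|::'

# The class matches each colon on its own, so '::' never gets a chance
# and an empty field appears.
my @sorted_fields = split /$sorted_re/, 'a::b;c';

# The order a programmer would choose by hand: '::' first.
my @manual_fields = split /::|[:;]/, 'a::b;c';

print join(',', @sorted_fields), "\n";   # a,,b,c
print join(',', @manual_fields), "\n";   # a,b,c
```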
Re^2: expanding the functionality of split by Aristotle (Chancellor) on Dec 10, 2002 at 15:37 UTC
Re: Re: expanding the functionality of split by tachyon (Chancellor) on Dec 10, 2002 at 15:57 UTC
Re: Re: expanding the functionality of split by BrowserUk (Patriarch) on Dec 10, 2002 at 15:07 UTC
Isn't there a problem with this? What happens if one of the delimiter @array elements contains a literal '|'? Won't that screw things up? Okay you lot, get your wings on the left, halos on the right. It's one size fits all, and "No!", you can't have a different color. Pick up your cloud down the end and "Yes" if you get allocated a grey one they *are* a bit damp under foot, but someone has to get them. Get used to the wings fast cos its an 8 hour day...unless the Govenor calls for a cyclone or hurricane, in which case 16 hour shifts are mandatory. Just be grateful that you arrived just as the tornado season finished. Them buggers are real work.
Re: expanding the functionality of split by Abigail-II (Bishop) on Dec 10, 2002 at 15:19 UTC
It will screw things up in the same way as the other suggestions in the thread will. It will also screw up in the same way as using a literal `'|'` in the current `split` will. It will also screw things up if you put in a literal `(` or a `[`. Don't do it then! Backwhack it! (Twice if you use double quotes.) Abigail
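For delimiters meant as literal strings rather than regexes, one way to do the backwhacking automatically is `quotemeta`. A minimal sketch, assuming every element of the (made-up) `@delims` list is to be taken literally:

```perl
use strict;
use warnings;

# Made-up literal delimiters, all of them regex metacharacters.
my @delims = ('|', '(', '[');

# quotemeta backslash-escapes every non-word character, so each
# element matches only itself once joined into an alternation.
my $pattern = join '|', map { quotemeta($_) } @delims;

my @fields = split /$pattern/, 'a|b(c[d';

print join(',', @fields), "\n";   # a,b,c,d
```

This is not suitable when some delimiters are intended as regexes (such as `\s+`), since `quotemeta` would escape those too.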
Re: Re: expanding the functionality of split by BrowserUk (Patriarch) on Dec 10, 2002 at 17:03 UTC
Sorry Abigail, I can't get your neat solution to work for me, could you explain what I am doing wrong? `#! perl -slw use strict; sub mysplit (\@;@) { local $" = "|"; #" split "@{+shift}" => @_; } my @array = (':','::','\s+'); my $string = "a:b::c d"; my @chunks = mysplit @array => $string; print @chunks; __END__ C:\test>218759.pl 1`
Re: expanding the functionality of split by Abigail-II (Bishop) on Dec 10, 2002 at 17:20 UTC
Blasted prototyping in Perl. The second argument to `split` is taken to be in scalar context, hence `@_` evaluates to its element count, which here is `1`. Abigail
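One possible repair, sketched under the assumption that the sub only ever needs to split a single string: change the prototype to `(\@$)` and pass the string explicitly, so nothing depends on how `split` treats `@_`. The `mysplit` name and the `$"` trick come from the posts above; the rest is an illustrative variant, not Abigail's own fix.

```perl
use strict;
use warnings;

# (\@$): the first argument arrives as an array reference,
# the second is the string to split.
sub mysplit (\@$) {
    my ($delims, $str) = @_;
    local $" = '|';                 # separator used when interpolating "@$delims"
    return split "@$delims", $str;  # the joined string is used as the split pattern
}

my @array  = (':', '::', '\s+');
my $string = 'a:b::c d';

my @chunks = mysplit @array => $string;

print join(',', @chunks), "\n";   # a,b,,c,d
```

The empty field in the output is the ordering issue discussed elsewhere in the thread: `:` comes before `::` in `@array`, so each colon of `::` is split on separately.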
Re: Re: Re: expanding the functionality of split by graff (Chancellor) on Dec 10, 2002 at 04:47 UTC
"using split can be much easier and cleaner in certain circumstances if it had this functionality" I'm not convinced that an array-based split would be so easy to work with. I think that looping a simple regex match over an array of delimiters would be easiest for the average programmer to grasp (and maintain); something like: `my @delim_seq = qw/: :: \s+ = \n/; my $string = "a:b::c d=e\n"; my @fields = (); foreach my $delim ( @delim_seq ) { $string =~ s/(.*?)$delim//; push @fields, $1; }` (Note that the final delimiter in the sequence is assumed to be the final pattern on the line -- i.e. the line terminator.) So you have two statements inside a loop, instead of a single statement using "split(...)" -- I could live with that easily enough (whereas I'd worry about adding complexity to a basic function like "split").
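If a delimiter fails to match, the loop above still pushes `$1`, which then does not hold a field from that step. A hedged variant that stops at the first missing delimiter is sketched below. The `split_sequence` name is hypothetical, and anchoring with `\A` plus the `/s` flag are choices made for this sketch rather than part of the original suggestion.

```perl
use strict;
use warnings;

# Hypothetical helper: peel one field off the front of $string per delimiter.
sub split_sequence {
    my ($string, @delims) = @_;
    my @fields;
    for my $delim (@delims) {
        # Remove the shortest prefix that ends in this delimiter;
        # stop as soon as an expected delimiter is missing.
        $string =~ s/\A(.*?)$delim//s or last;
        push @fields, $1;
    }
    return @fields;
}

my @delim_seq = (':', '::', '\s+', '=', '\n');
my @fields    = split_sequence("a:b::c d=e\n", @delim_seq);

print join(',', @fields), "\n";   # a,b,c,d,e
```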
Re: Re: Re: expanding the functionality of split by mojotoad (Monsignor) on Dec 10, 2002 at 01:13 UTC
Can you further describe the nature of these delimiters to me? Are you saying that they vary line-by-line, file-by-file, or in some other (arbitrary) way? In any of these cases, are you also attempting to dynamically determine the delimiters, vs coding their nature into the script ahead of time? If so, then building a regexp vs building an array to pass to a modified `split` seem to be about the same amount of effort. (You don't have to use eval to build a compound regexp, btw, thanks to qr references.) Anyway -- I'm shooting in the dark here. I'm really just curious about the nature of your delimiters and the effort, ultimately, that you are trying to spare yourself. Matt
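A minimal sketch of the `qr` approach mentioned above: each delimiter is compiled on its own and the pieces are then combined into one alternation, with no string `eval` involved. The delimiter list is illustrative.

```perl
use strict;
use warnings;

# Illustrative delimiters, written as regex source, most specific first.
my @delims = ('::', ':', '\s+');

# Compile each piece separately, then join the compiled pieces.
# A qr// object stringifies to a self-contained group, so the
# pieces can be joined with '|' without affecting one another.
my @compiled = map { qr/$_/ } @delims;
my $joined   = join '|', @compiled;
my $splitter = qr/$joined/;

my @fields = split $splitter, 'a:b::c d';

print join(',', @fields), "\n";   # a,b,c,d
```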