expanding the functionality of split

Re: expanding the functionality of split by tachyon (Chancellor) on Dec 09, 2002 at 23:45 UTC
Are there easy ways to achieve the general ideas here without modifying split? Sure is. Just use the old join with '|' trick, which is very useful for matching an array of patterns. You may or may not want to do a `map { quotemeta } @patterns`. The sort on length is so that we match '::' before ':'.

```perl
$string = "a:b::c d";
@patterns = (':','::','\s+');
my $re = join '|', sort { length $b <=> length $a } @patterns;
@fields = split /$re/, $string;
print "Got '$_'\n" for @fields;
__DATA__
Got 'a'
Got 'b'
Got 'c'
Got 'd'
```

cheers

tachyon

s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print
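For what it's worth, here is a minimal sketch of the `quotemeta` variant mentioned above, assuming the delimiters are literal strings rather than regex fragments. The delimiter list and the names `@literal_delims` and `$alternation` are made up for illustration and are not from the original post.

```perl
use strict;
use warnings;

# Assumption: these delimiters are literal text, so '|' and '.' must not
# be interpreted as regex metacharacters.
my @literal_delims = ('|', '||', '.');

# Longest first, so that '||' is tried before '|'; quotemeta escapes
# each delimiter before it goes into the alternation.
my $alternation = join '|',
    map  { quotemeta }
    sort { length $b <=> length $a } @literal_delims;

my $re     = qr/$alternation/;
my @fields = split $re, 'a|b||c.d';

print "Got '$_'\n" for @fields;   # prints a, b, c and d, one per line
```

Without the `quotemeta`, the bare '|' and '.' would be treated as regex syntax and the split would behave very differently.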
Re: expanding the functionality of split by BrowserUk (Patriarch) on Dec 10, 2002 at 01:37 UTC
This may or may not achieve your goal. It works for your test data and a few others I have tried, but it is by no means fully tested. Whether you would want to retain the standard split functionality as part of this sub is doubtful, but it's there should you prefer to use a single function in all cases.

Update: A slightly cleaner version, removed unused var and standard split functionality.

```perl
#! perl -slw
use strict;

sub mySplit {
    my ($pattern, $expr) = @_;
    push @$pattern, '$', '';
    my ($n, @fields) = (0);
    push @fields, $1 while $expr =~ /(.*?)$pattern->[$n]/gc, ++$n < @$pattern;
    return @fields;
}

my $string = "a:b::c d|e";
my @fields = mySplit [':','::','\s+','\|'], $string;
print do{local $"='~'; "@fields";} #"
__DATA__
C:\test>218685
a~b~c~d~e
```

Okay you lot, get your wings on the left, halos on the right. It's one size fits all, and "No!", you can't have a different color. Pick up your cloud down the end and "Yes" if you get allocated a grey one they *are* a bit damp under foot, but someone has to get them. Get used to the wings fast cos its an 8 hour day...unless the Govenor calls for a cyclone or hurricane, in which case 16 hour shifts are mandatory. Just be grateful that you arrived just as the tornado season finished. Them buggers are real work.
Re^2: expanding the functionality of split by Aristotle (Chancellor) on Dec 10, 2002 at 14:11 UTC
Not sure all the extra effort for the limit parameter is worth it. <update>Sorry, brainblock. Must get some caffeine.</update>

```perl
my @delim = qw(: :: / // \| \|\| \s+);
mySplit \@delim, $str1;
mySplit \@delim, $str2, 4;
```

is the same as

```perl
my @delim = qw(: :: / / \| \|\| \s+);
mySplit \@delim, $str1;
mySplit [ @delim[0..2] ], $str2;
```

Makeshifts last the longest.
Re: Re^2: expanding the functionality of split by BrowserUk (Patriarch) on Dec 10, 2002 at 15:05 UTC
Sorry? I'm not quite sure what you mean. The only reference I made to the limit parameter, and the only effort I spent on it, was to pass it on to the standard split command if the first parameter wasn't an array. Beyond that I ignore it.
Re: expanding the functionality of split by VSarkiss (Monsignor) on Dec 09, 2002 at 23:44 UTC
Maybe I'm not following you, but can't you just do: `@fields = split(/:|::|\s+/, $string);` In other words, just use alternation in the pattern?
Re: Re: expanding the functionality of split by tachyon (Chancellor) on Dec 09, 2002 at 23:49 UTC
This will actually fail because it will split on ':' before '::' and thus return unwanted null fields. You need to do `/::|:|\s+/`. See below....

```perl
$string = "a:b::c d";
@fields = split(/:|::|\s+/, $string);
print "Got '$_'\n" for @fields;
__DATA__
Got 'a'
Got 'b'
Got ''
Got 'c'
Got 'd'
```

cheers

tachyon
Re: Re: Re: expanding the functionality of split by runrig (Abbot) on Dec 10, 2002 at 00:20 UTC
"You need to do `/::|:|\s+/`." And you could shorten that to just `/::?|\s+/`.
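To make the ordering point in this sub-thread concrete: Perl's alternation tries the alternatives left to right at each position and takes the first one that matches, not the longest. The following is only an illustrative sketch comparing the three patterns discussed; the variable names are invented for the example.

```perl
use strict;
use warnings;

my $string = "a:b::c d";

# ':' is listed first, so it wins at the first colon of '::' and the
# second colon then produces an empty field.
my @short_first = split /:|::|\s+/, $string;

# '::' is listed first, so the double colon is consumed as one delimiter.
my @long_first  = split /::|:|\s+/, $string;

# One colon plus an optional (greedy) second colon does the same job.
my @optional    = split /::?|\s+/,  $string;

for my $result (\@short_first, \@long_first, \@optional) {
    print join(',', map { "'$_'" } @$result), "\n";
}
```

The first line of output has five fields (with an empty one in the middle); the other two have the four expected fields.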
Re: Re: Re: Re: expanding the functionality of split by tigervamp (Friar) on Dec 10, 2002 at 00:51 UTC
Re: Re: Re: Re: Re: expanding the functionality of split by tachyon (Chancellor) on Dec 10, 2002 at 01:33 UTC
Re: expanding the functionality of split by elusion (Curate) on Dec 09, 2002 at 23:47 UTC
You can do this pretty easily with a regex.

```perl
@fields = split /([^:]):([^:])::([^\s])\s+([^:])/, $string;
```

split keeps any fields that are captured by parentheses in the regex.

elusion : http://matt.diephouse.com

Update: Oops, sorry 'bout the pattern var. I changed the way I was doing things, then had to run out the door and forgot to remove it before submitting.
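As a small illustration of the capture behaviour mentioned above (a sketch, not the poster's code): when the split pattern contains capturing parentheses, the captured text is returned in the list alongside the fields. Here the delimiter alternation itself is captured; the name `@with_delims` is just for the example.

```perl
use strict;
use warnings;

my $string = "a:b::c d";

# Capturing the delimiter alternation returns the delimiters
# interleaved with the fields.
my @with_delims = split /(::|:|\s+)/, $string;

print join(',', map { "'$_'" } @with_delims), "\n";
# prints: 'a',':','b','::','c',' ','d'
```

This can be handy when you need to know which delimiter separated which pair of fields.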
Re: Re: expanding the functionality of split by BrowserUk (Patriarch) on Dec 10, 2002 at 00:59 UTC
What does your var $pattern do in this?
Re: expanding the functionality of split by mojotoad (Monsignor) on Dec 10, 2002 at 00:21 UTC
Why are you so determined to use `split` in this instance? It looks to me like a regular extraction would work:

```perl
$string = "a:b::c d";
@fields = $string =~ /([^:]):+([^:]+):+(\S+)\s(\S+)/;
```

Matt
Re: Re: expanding the functionality of split by tigervamp (Friar) on Dec 10, 2002 at 01:08 UTC
Although regular expressions could work, using split would be much easier and cleaner in certain circumstances if it had this functionality. When you have a lot (thousands) of "one-time delimiters" in a file that you are working with, you could just stick them in an array and pass it to split. With regexes you would have to write a mechanism to create the entire regular expression for you (not particularly clean) or create a loop capturing up to the next delimiter per iteration (probably less efficient, although not really difficult). I just think this would be useful (at least to me), and adding it to split seemed natural given split's power and usefulness.

tigervamp
Re: expanding the functionality of split by Abigail-II (Bishop) on Dec 10, 2002 at 10:31 UTC
If you want to keep an array with the delimiters around, just do it. And when it's time to pass it on to split, just do:

```perl
@chunks = do {local $" = "|"; split "@array" => $str};
```

assuming that `@array` is the array with your delimiters. You can of course put that in a sub:

```perl
sub mysplit (\@;@) {
    local $" = "|";
    split "@{+shift}" => @_;
}

@chunks = mysplit @array => $str;
```

Abigail
Re: Re: expanding the functionality of split by tachyon (Chancellor) on Dec 10, 2002 at 15:15 UTC
Re: expanding the functionality of split by Abigail-II (Bishop) on Dec 10, 2002 at 15:29 UTC
Re: Re: expanding the functionality of split by BrowserUk (Patriarch) on Dec 10, 2002 at 15:07 UTC
Re: expanding the functionality of split by Abigail-II (Bishop) on Dec 10, 2002 at 15:19 UTC
Re: Re: expanding the functionality of split by BrowserUk (Patriarch) on Dec 10, 2002 at 17:03 UTC
Re: expanding the functionality of split by Abigail-II (Bishop) on Dec 10, 2002 at 17:20 UTC
Re: Re: Re: expanding the functionality of split by graff (Chancellor) on Dec 10, 2002 at 04:47 UTC
"using split can be much easier and cleaner in certain circumstances if it had this functionality"

I'm not convinced that an array-based split would be so easy to work with. I think that looping a simple regex match over an array of delimiters would be easiest for the average programmer to grasp (and maintain); something like:

```perl
my @delim_seq = qw/: :: \s+ = \n/;
my $string = "a:b::c d=e\n";
my @fields = ();
foreach my $delim ( @delim_seq ) {
    $string =~ s/(.*?)$delim//;
    push @fields, $1;
}
```

(Note that the final delimiter in the sequence is assumed to be the final pattern on the line -- i.e. the line terminator.)

So you have two statements inside a loop, instead of a single statement using "split(...)" -- I could live with that easily enough (whereas I'd worry about adding complexity to a basic function like "split").
Re: Re: Re: expanding the functionality of split by mojotoad (Monsignor) on Dec 10, 2002 at 01:13 UTC
Can you further describe the nature of these delimiters to me? Are you saying that they vary line-by-line, file-by-file, or in some other (arbitrary) way? In any of these cases, are you also attempting to dynamically determine the delimiters, vs coding their nature into the script ahead of time? If so, then building a regexp vs building an array to pass to a modified `split` seems to be about the same amount of effort. (You don't have to use eval to build a compound regexp, btw, thanks to qr references.)

Anyway -- I'm shooting in the dark here. I'm really just curious about the nature of your delimiters and the effort, ultimately, that you are trying to spare yourself.

Matt
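A rough sketch of what building a compound regexp from `qr` pieces, without eval, might look like for a sequence of one-time delimiters. The helper name `build_sequence_regex` is hypothetical, and this version assumes every delimiter actually occurs in the string, in order; it does not match at all otherwise.

```perl
use strict;
use warnings;

# Hypothetical helper: turn an ordered list of one-time delimiters into a
# single anchored regex with one capture group per field.
sub build_sequence_regex {
    my @delims = map { qr/$_/ } @_;                  # compile each delimiter
    my $body   = join '', map { "(.*?)$_" } @delims; # field, then its delimiter
    return qr/^$body(.*)$/s;                         # last field takes the rest
}

my $re     = build_sequence_regex(':', '::', '\s+');
my @fields = "a:b::c d" =~ $re;

print "Got '$_'\n" for @fields;   # prints a, b, c and d, one per line
```

Each `qr` object interpolates as a non-capturing group, so the only captures in the final pattern are the fields themselves.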
Re: expanding the functionality of split by Aristotle (Chancellor) on Dec 10, 2002 at 14:36 UTC
Everyone else used a loop for the "sequence of one-time delimiters" spec - but I think a different approach has more merit in this case:

```perl
#!/usr/bin/perl -wl
use strict;

sub msplit {
    my ($delim, $str) = @_;
    my $pat = '';
    $pat = "(?>(.*?)$_$pat)?" for reverse @$delim;
    grep defined, $str =~ /^$pat(.+)/;
}

print for map "'$_'", msplit [qw(: :: \s+)], "a:b::c d";
__END__
'a'
'b'
'c'
'd'
```

Update: fixed - needed nested brackets to abort looking for another field after a delimiter has failed to match.
Re: Re: expanding the functionality of split by BrowserUk (Patriarch) on Dec 10, 2002 at 19:03 UTC
Didn't you just trade one loop for two? One to build the regex, one to remove the null captures made by the regex? Why does this have "more merit"?
Re^3: expanding the functionality of split by Aristotle (Chancellor) on Dec 11, 2002 at 09:48 UTC
You're right. I prefer this approach as it lets `perl` do the job of building and keeping track of the list internally - that's what "more merit" was referring to. The original version of this approach did not have the grep, and I didn't think about the fact I was introducing a second loop when I added it. I tried to get rid of it as follows:

```perl
#!/usr/bin/perl -wl
use strict;

sub msplit {
    my ($delim, $str) = @_;
    my $pat;
    ### $pat = q/ (?> (.*?) (??{ shift @$delim }) (??{ $pat }) )? /; ### UPDATE: WRONG
    $pat = q/ (?> (.*?) $$delim[0] (??{ shift @$delim; $pat }) )? /;
    $str =~ /^ $pat (.+) /x;
}

print for map "'$_'", msplit [qw(: :: \s+)], "a:b::c d";
```

I like this version even better as it should be even more economical: the first failed match bails out of the pattern, so it does not do any more work on building the regex than necessary. Unfortunately Perl complains about the lack of `use re 'eval';` even if I add it. I'm not quite sure what's missing. Maybe one of our resident regex spell casters can enlighten me?
Re: Re^3: expanding the functionality of split by BrowserUk (Patriarch) on Dec 11, 2002 at 14:42 UTC
Re^5: expanding the functionality of split by Aristotle (Chancellor) on Dec 11, 2002 at 15:06 UTC