Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:
Does perl optimize the split and only grab the first field (the 'a') and not continue breaking up the line after that? (I know this may not be the best place to ask this question, but I couldn't find a better one.)
$x = "a b c d e f g";
($y) = (split(/\s+/,$x))[0];
RE: is split optimized?
by Russ (Deacon) on Jul 14, 2000 at 06:09 UTC
Does split only find the first chunk when you only ask for the first chunk? No. Since you are calling split in list context, it generates a list of all elements and assigns that to your list.
You were very careful to call split in list context, BTW. In your example, $y does not have to be in parentheses, because your right-side construct puts split in list context. Both ($y) = split() and $y = (split())[0] call split in list context. You don't need both, but it certainly works as you have it.
BTW, Seekers of Perl Wisdom is the right place to ask this kind of question.
Russ
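To make the point about list context concrete, here is a minimal sketch using the string from the question. The variable names are illustrative only, not from the original post.

```perl
use strict;
use warnings;

my $x = "a b c d e f g";

# List assignment: the parentheses on the left put split in list context.
my ($first_by_list) = split /\s+/, $x;

# List slice: (...)[0] also puts split in list context, so the
# left-hand side can be a plain scalar.
my $first_by_slice = (split /\s+/, $x)[0];

print "$first_by_list $first_by_slice\n";
```

Both forms leave 'a' in the variable; they differ only in how the list context is supplied.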
by gryng (Hermit) on Jul 14, 2000 at 06:27 UTC
(relevant code:) While a tad faster, using something besides split (such as a regex) shows that split doesn't optimize away the other entries. See my other post on this thread for more benchmark results.
Ciao,
Re: is split optimized?
by btrott (Parson) on Jul 14, 2000 at 06:25 UTC
In this case split should only split your string once, and after it's seen the first \s+ it should stop. Now, I can't say that it will actually *do that*, but that's what it *seems* should happen. From the split docs:
by gryng (Hermit) on Jul 14, 2000 at 06:34 UTC
The code for this is: Note that using an equivalent regex is slightly faster, but the split done properly (using the third argument) performs at the "correct" level. Thanks for submitting the correct answer :) hehe.
Chow,
by Abigail (Deacon) on Jul 14, 2000 at 10:09 UTC
But you used it in an incorrect way. If the third argument is 1, it's effectively a noop. The third argument does not mean to discard everything after the first field.
my ($y) = split " ", "a b", 1;
print $y;
will print a b, and not a.
If you want to use only the first field, and use a third argument, just use:
my ($y) = split " ", $string, 2;
That's right. No indexing required. But even the limit isn't
required. Just the simple:
my ($y) = split " ", $string;
will do. And because it is so simple, Perl can optimize that.
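As a small sketch of the three forms just described (the sample string and variable names are made up for illustration), this shows what each LIMIT leaves in the first variable, and where the unsplit remainder goes when the limit is 2:

```perl
use strict;
use warnings;

my $string = "a b c d";

# LIMIT 1: at most one field, so the string comes back unsplit.
my ($limit_one) = split " ", $string, 1;

# LIMIT 2: split once; the second field holds the unsplit remainder.
my ($limit_two, $rest) = split " ", $string, 2;

# No LIMIT: a one-variable list assignment still yields the first field.
my ($plain) = split " ", $string;

print "limit 1: '$limit_one'\n";
print "limit 2: '$limit_two' (rest: '$rest')\n";
print "plain:   '$plain'\n";
```

With a limit of 1 the first variable receives the whole string; with a limit of 2, or with no limit, it receives only 'a'.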
Here's a benchmark program (there are brackets where indexing
is used - for some reason, perlmonks strip them), and the
results:
#!/opt/perl/bin/perl -w
use strict;
use Benchmark;

my $str = "a " x 6;

# A negative count runs each sub for at least that many CPU seconds.
timethese -100 => {
    index => sub { my ($y) = (split " " => $str)[0] },
    regex => sub { my ($y) = $str =~ /(\S+)/ },
    limit => sub { my ($y) = (split " " => $str, 2)[0] },
    plain => sub { my ($y) = split " " => $str },
};
__END__
Benchmark: running index, limit, plain, regex, each for at least 100 CPU seconds...
index: 125 wallclock secs (105.53 usr + 0.00 sys = 105.53 CPU) @ 34487.35/s (n=3639450)
regex: 121 wallclock secs (105.06 usr + 0.00 sys = 105.06 CPU) @ 43695.61/s (n=4590661)
limit: 123 wallclock secs (104.03 usr + 0.02 sys = 104.05 CPU) @ 48699.04/s (n=5067135)
plain: 120 wallclock secs (105.18 usr + 0.02 sys = 105.20 CPU) @ 52044.32/s (n=5475062)
The bottom line is, if you want Perl to do the optimizing, keep your code simple. -- Abigail
by DrManhattan (Chaplain) on Jul 14, 2000 at 18:20 UTC
by Abigail (Deacon) on Jul 15, 2000 at 00:18 UTC
by gryng (Hermit) on Jul 14, 2000 at 18:18 UTC
by Abigail (Deacon) on Jul 15, 2000 at 00:15 UTC
by eduardo (Curate) on Jul 14, 2000 at 06:37 UTC
the extra argument version of split kicks the living crap out of not using it... so i guess, yes ;) it does make a difference!
Re: is split optimized?
by gryng (Hermit) on Jul 14, 2000 at 06:19 UTC
And the results:
Enjoy!
by mikfire (Deacon) on Jul 14, 2000 at 06:38 UTC
As Russ said, split is incredibly well optimized. Most of the perl internals are. There have been many C coders of wondrous talent poring over the code to make it so. Your code demonstrates that the AM was not using the correct tool, which is an answer to an unasked question.
Your four pieces of code are doing radically different things. The regex is stopping after the first match, while the split must work the entire string. Until you compare apples to apples, no conclusion can be drawn.
Let us run this test and do it correctly. Note the slight changes I made to the regex code. That should result in a better comparison. When comparing apples to apples, it seems split is highly optimized. This is more an issue of choosing the right tool for the job at hand.
This rant brought to you by
by gryng (Hermit) on Jul 14, 2000 at 06:59 UTC
I answered Anonymous Monk's question of whether "perl optimize(s) the split and only grab the first field" with the line: ($y) = (split(/\s+/,$x))[0]; To which the answer is no. But as I conceded to btrott, he (and nardo) had the "correct" answer, of saying that you need to add a third argument to get split to only do the first match.
I appreciate that you probably responded because I used regexes in my example and you did not want Anonymous Monk to mistakenly think that regexes were faster than split. However, I do not think he took it that way; rather, his post seemed to convey an understanding of all of what I just mentioned above (minus the fact that we did not know/remember about the third argument in split).
Anyway, this is mainly here to clear up the comment in the chatterbox about benchmark being a hammer. I was only using it to show, fairly concretely, that split was not stopping after the first match. I didn't mean to imply that it couldn't.
Cheers,
by Anonymous Monk on Jul 14, 2000 at 06:28 UTC
by gryng (Hermit) on Jul 14, 2000 at 06:36 UTC
Re: is split optimized?
by nardo (Friar) on Jul 14, 2000 at 06:39 UTC
"When assigning to a list, if LIMIT is omitted, Perl supplies a LIMIT one larger than the number of variables in the list"
So, ($y) = split(/\s+/, $x) is equivalent to ($y) = split(/\s+/, $x, 2) which, as others have pointed out, will only split it once.
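One way to see why the implied LIMIT is one larger than the number of variables is to compare it with a limit equal to that number. This is a sketch only; the two-variable lists and their names are assumptions added for illustration.

```perl
use strict;
use warnings;

my $x = "a b c d e f g";

# Two variables, no LIMIT: behaves like an explicit LIMIT of 3.
my ($y, $z) = split /\s+/, $x;

# Explicit LIMIT of 3: the third field ("c d e f g") is simply discarded.
my ($y3, $z3) = split /\s+/, $x, 3;

# LIMIT of 2: the second variable would swallow the unsplit remainder.
my ($y2, $z2) = split /\s+/, $x, 2;

print "implicit:  '$y' '$z'\n";
print "limit 3:   '$y3' '$z3'\n";
print "limit 2:   '$y2' '$z2'\n";
```

The extra field absorbs the rest of the string, so the last named variable still gets a cleanly split field rather than the remainder.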
Re: is split optimized?
by bs (Novice) on Jul 14, 2000 at 08:02 UTC
I don't know if split is or isn't optimized; I just have a story related to split to share. In our code library, we have a function named rand_split, which looks like this:
sub rand_split {
    my ($sep, $string) = @_;
    my ($element, $char, $pos, $end, @array);
    $end = length($string);
    $element = "";
    # Walk the string one character at a time.
    for ($pos = 0; $pos < $end; $pos++) {
        $char = substr($string, $pos, 1);
        if ($char eq $sep) {
            # Separator found: store the field collected so far.
            push (@array, $element);
            $element = "";
        } else {
            $element = $element . $char;
        }
    }
    # Store whatever follows the last separator.
    push (@array, $element);
    return (@array);
}
rand_split was written by a guy named Rand a couple of programmer generations ago. We don't know why he reimplemented it; we don't know a lot of things about it. However, another programmer from around that generation claimed that "If he did it, there must have been a reason." It's not like he didn't know split existed, meaning he purposefully reinvented a wheel. As nearly as we can tell, it walks like split and talks like split; therefore, it is split. It's used in one script these days, and probably zero in the very near future, but I think we've kept it around mainly for gag value, giving every new generation of programmer something to wonder about.
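For anyone curious how closely rand_split tracks the builtin, here is a rough sketch that runs both on a couple of edge cases. The inputs are hypothetical and this is not a claim about why the routine was written; it only shows that rand_split keeps trailing empty fields, which split drops unless given a negative LIMIT.

```perl
use strict;
use warnings;

# rand_split as posted above, reproduced so this sketch runs on its own.
sub rand_split {
    my ($sep, $string) = @_;
    my ($element, $char, $pos, $end, @array);
    $end = length($string);
    $element = "";
    for ($pos = 0; $pos < $end; $pos++) {
        $char = substr($string, $pos, 1);
        if ($char eq $sep) {
            push(@array, $element);
            $element = "";
        } else {
            $element = $element . $char;
        }
    }
    push(@array, $element);
    return (@array);
}

my $input = "a,b,,";    # hypothetical input with trailing empty fields

my @by_rand       = rand_split(",", $input);
my @by_split      = split /,/, $input;        # trailing empty fields dropped
my @by_split_keep = split /,/, $input, -1;    # negative LIMIT keeps them

printf "rand_split:       %d fields\n", scalar @by_rand;
printf "split:            %d fields\n", scalar @by_split;
printf "split, LIMIT -1:  %d fields\n", scalar @by_split_keep;

# An empty input string: rand_split returns one empty field,
# while split returns no fields at all.
my @empty_rand  = rand_split(",", "");
my @empty_split = split /,/, "", -1;

printf "empty input: rand_split %d, split %d\n",
    scalar @empty_rand, scalar @empty_split;
```

So on ordinary input the two agree, but they differ on trailing separators and on an empty string.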
by athomason (Curate) on Jul 14, 2000 at 10:17 UTC
These were the disheartening results: Clearly, split is quite capable of optimizing static patterns and doing it much faster than Perl code (since split is, of course, implemented in C). Gag value is about all you'll get out of this routine ;-).
by nuance (Hermit) on Jul 14, 2000 at 14:30 UTC
Hmm... As you've clearly demonstrated, the code runs like a three-legged greyhound. I would be interested to know what version of Perl Mr Rand was using when he wrote the function, and to run the benchmark with it. You may find the answer is the same. Then again, you may find that the split function has been optimised since rand_split was written.
Nuance
RE: is split optimized?
by Anonymous Monk on Jul 14, 2000 at 15:37 UTC