in reply to Using the third arguement for split
in thread is split optimized?

Thanks for submitting the correct answer :) hehe.

But you used it in an incorrect way. If the third argument is 1, it's effectively a noop. The third argument does not mean to discard everything after the first field.

    my ($y) = split " ", "a b", 1;
    print $y;
will print a b, and not a.
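To make the meaning of the limit concrete, here is a minimal sketch comparing a limit of 1, a limit of 2, and no limit at all. The sample string and the variable names are just for illustration.

```perl
use strict;
use warnings;

my $string = "a b c";

# LIMIT 1: at most one field, so the string comes back unsplit.
my @one = split " ", $string, 1;

# LIMIT 2: the first field, then the remainder as a single field.
my @two = split " ", $string, 2;

# No LIMIT: every field.
my @all = split " ", $string;

print join("|", @one), "\n";    # a b c
print join("|", @two), "\n";    # a|b c
print join("|", @all), "\n";    # a|b|c
```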

If you want to use only the first field, and use a third argument, just use:

    my ($y) = split " ", $string, 2;
That's right. No indexing required. But even the limit isn't required. Just the simple:
    my ($y) = split " ", $string;
will do. And because it is so simple, Perl can optimize that. Here's a benchmark program (there are brackets where indexing is used; for some reason, PerlMonks strips them), and the results:
#!/opt/perl/bin/perl -w

use strict;
use Benchmark;

my $str = "a " x 6;

timethese -100 => {           
    index   =>  sub {my ($y) = (split " " => $str) [0]},
    regex   =>  sub {my ($y) = $str =~ /(\S+)/},
    limit   =>  sub {my ($y) = (split " " => $str, 2) [0]},
    plain   =>  sub {my ($y) =  split " " => $str},
}

__END__
Benchmark: running index, limit, plain, regex, each for at least 100 CPU seconds...
index: 125 wallclock secs (105.53 usr +  0.00 sys = 105.53 CPU) @ 34487.35/s (n=3639450)
regex: 121 wallclock secs (105.06 usr +  0.00 sys = 105.06 CPU) @ 43695.61/s (n=4590661)
limit: 123 wallclock secs (104.03 usr +  0.02 sys = 104.05 CPU) @ 48699.04/s (n=5067135)
plain: 120 wallclock secs (105.18 usr +  0.02 sys = 105.20 CPU) @ 52044.32/s (n=5475062)

The bottom line is, if you want Perl to do the optimizing, keep your code simple.
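The optimization in question is described in the split entry of perlfunc: when split is assigned directly to a list of variables and no limit is given, Perl treats the limit as one more than the number of variables. The sketch below only shows that the three forms return the same value; the comments describe my understanding of why the indexed form misses out, and the sample string is made up.

```perl
use strict;
use warnings;

my $string = "alpha beta gamma delta";

# Plain form: one variable on the left, so split acts as if LIMIT were 2.
my ($plain) = split " ", $string;

# The explicit spelling of what Perl supplies implicitly above.
my ($limit) = split " ", $string, 2;

# Indexed form: split feeds a list slice rather than the assignment,
# so (as far as I understand) it produces every field before [0] picks one.
my ($index) = (split " ", $string)[0];

print "$plain $limit $index\n";    # alpha alpha alpha
```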

-- Abigail

Brackets
by DrManhattan (Chaplain) on Jul 14, 2000 at 18:20 UTC

    You can post code with brackets by enclosing it in a <CODE> </CODE> block. See the Site How To for more information.

    - Matt

      Urg, that's truly twisted. Not only does <CODE> already have a meaning in HTML, picking brackets (the indexing syntax) as a shortcut for links on a site devoted to programming isn't the most convenient choice.

      -- Abigail

More benchmarks and stuff
by gryng (Hermit) on Jul 14, 2000 at 18:18 UTC
    Thanks, I had mistyped my numbers (I was on an old 15" monitor where the font is set to like 3 pixels high and even 640x480 looked fuzzy, yeck).

    Anyway, I noticed that there was some discrepancy between the relative speeds of regex versus split (that is, split used properly). I wanted to see why, so first I added a few more tests:

        Four  => sub { $testlarge =~ /([^\s]*)\s+/; my $y = $1; },
        Eight => sub { my $y = $testlarge =~ /(\S+)/; },
        Six   => sub { my ($y) = split(/\s+/, $testlarge, 2) },
        Seven => sub { my ($y) = split(/\s+/, $testlarge) }
    I noticed that your regex was different, so I wanted to see if that was why things were slower (however, I didn't think so, since yours was simpler).
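One thing worth checking when comparing test Eight with the regex test in the parent post: Eight assigns to `my $y` without parentheses, which puts the match in scalar context. A small sketch of the difference, using a made-up string in place of the benchmark data:

```perl
use strict;
use warnings;

my $testlarge = "foo bar baz";

# Scalar context: the match returns true (1) on success, not the capture.
my $scalar = $testlarge =~ /(\S+)/;

# List context: the match returns the captured text.
my ($list) = $testlarge =~ /(\S+)/;

print "scalar: $scalar\n";    # scalar: 1
print "list:   $list\n";      # list:   foo
```

So Eight and the parent post's regex test run the same pattern but do not store the same thing in `$y`.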

    Running with the string equal to "a " x 100_000, I got these numbers:

        Eight: 11 wallclock secs ( 4.68 usr + 5.46 sys = 10.14 CPU) @ 450.69/s (n=4570)
        Four:  10 wallclock secs ( 4.79 usr + 5.21 sys = 10.00 CPU) @ 449.60/s (n=4496)
        Seven: 10 wallclock secs ( 7.28 usr + 2.84 sys = 10.12 CPU) @ 293.97/s (n=2975)
        Six:   10 wallclock secs ( 7.13 usr + 2.88 sys = 10.01 CPU) @ 293.81/s (n=2941)
    So the regexes (despite one of them using the same regex as yours) were still 50% faster than split, rather than being 40% slower.
    Next I reduced the size of my string to: "a " x 100 . Here I got these numbers:
        Eight: 12 wallclock secs (10.00 usr + 0.00 sys = 10.00 CPU) @ 130970.90/s (n=1309709)
        Four:  11 wallclock secs (10.39 usr + 0.00 sys = 10.39 CPU) @ 88839.36/s (n=923041)
        Seven: 10 wallclock secs (10.49 usr + 0.00 sys = 10.49 CPU) @ 122499.33/s (n=1285018)
        Six:   10 wallclock secs (10.54 usr + 0.00 sys = 10.54 CPU) @ 121918.22/s (n=1285018)
    Now the regex code (yours) leads by less than 10%, and my regex trails by a good 30%. So I guess the conclusion is that a regex performs better than split on large scalars? I don't feel like mucking in the perl source code right now, so my guess is that this has nothing to do with the way regexes or splits actually process the data, but rather that split is probably receiving a copy of the data, whereas the regex is receiving a reference.

    Cheers,
    Gryn

      I don't feel like mucking in the perl source code right now, so my guess is that this has nothing to do with the way regexes or splits actually process the data, but rather that split is probably receiving a copy of the data, whereas the regex is receiving a reference.

      No, it has to do with what split and the regex produce, not with what they get (behind the scenes, everything is a reference anyway). The regex only has to create a short new string, while the split (even if it's a split into 2 fields) has to create a large new string for the remainder. And that takes time.

      So, if you have a gigantic string and you only want the first, short field, a regex is the way to go. But usually you encounter short strings, and that's when split works better, despite itself using a regex. (It's a much simpler regex, though, and the " " case may well be optimized specially too.)
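A rough sketch of the point about what each approach produces, assuming the same kind of test string as in the benchmark above. It captures the second field explicitly (the benchmark discards it) only to show how large the string is that split has to build; the variable names are made up.

```perl
use strict;
use warnings;

my $big = "a " x 100_000;

# split with LIMIT 2 builds two strings: the short first field
# and one long string holding everything after it.
my ($first, $rest) = split " ", $big, 2;

# The regex builds only the short captured string.
my ($match) = $big =~ /(\S+)/;

printf "split: first field %d chars, remainder %d chars\n",
    length($first), length($rest);
printf "regex: match %d chars\n", length($match);
```

On a short string the remainder is small too, which fits with split coming out ahead in the first benchmark.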

      -- Abigail

        Thanks Abigail, that sounds reasonable enough to believe! :)

        Ciao,
        Gryn