More benchmarks and stuff

Thanks, I had mistyped my numbers (I was on an old 15" monitor where the font is set to about 3 pixels high and even 640x480 looked fuzzy, yuck).

Anyway, I noticed that there was some discrepancy between the relative speeds of regex versus split (that is, split used properly). I wanted to see why, so first I added a few more tests:

        Four  => sub { $testlarge =~ /([^\s]*)\s+/; my $y = $1; },
        Eight => sub { my $y = $testlarge =~ /(\S+)/; },
        Six   => sub { my ($y) =  split(/\s+/, $testlarge,2)    },
        Seven => sub { my ($y) =  split(/\s+/, $testlarge)      }

I noticed that your regex was different, so I wanted to see if that was why things were slower (however, I didn't think so, since yours was simpler).
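For anyone who wants to rerun these, a minimal self-contained sketch of the harness might look like the following. The string construction, the sanity check, and the `timethese(-1, ...)` call are assumptions (only the test subs themselves are shown above); the runs reported in this post evidently used about 10 CPU seconds per test. Note that Eight is written with a list assignment here so that `$y` receives the captured field; with the scalar assignment as posted, `$y` receives the match's true/false result instead.

```perl
use strict;
use warnings;
use Benchmark qw(timethese);

# Assumption: the test string is built like this; change the repeat
# count to 100 to try the small-string case.
my $testlarge = "a " x 100_000;

# Each sub returns $y so the sanity check below can inspect it.
my %tests = (
    Four  => sub { $testlarge =~ /([^\s]*)\s+/; my $y = $1; $y },
    Eight => sub { my ($y) = $testlarge =~ /(\S+)/;          $y },
    Six   => sub { my ($y) = split(/\s+/, $testlarge, 2);    $y },
    Seven => sub { my ($y) = split(/\s+/, $testlarge);       $y },
);

# Sanity check: every variant should extract the same first field.
for my $name (sort keys %tests) {
    my $got = $tests{$name}->();
    die "$name returned '$got'\n" unless $got eq 'a';
}

# A negative count asks Benchmark to run each sub for at least
# that many CPU seconds (1 here, to keep the run short).
timethese(-1, \%tests);
```

Raising the count to `-10` should give runs comparable in length to the ones reported in this post, though absolute rates will of course differ by machine and perl version.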

Running with the string equal to "a " x 100_000, I got these numbers:

     Eight: 11 wallclock secs ( 4.68 usr +  5.46 sys = 10.14 CPU) @ 450.69/s (n=4570)
      Four: 10 wallclock secs ( 4.79 usr +  5.21 sys = 10.00 CPU) @ 449.60/s (n=4496)
     Seven: 10 wallclock secs ( 7.28 usr +  2.84 sys = 10.12 CPU) @ 293.97/s (n=2975)
       Six: 10 wallclock secs ( 7.13 usr +  2.88 sys = 10.01 CPU) @ 293.81/s (n=2941)

So the regex versions (despite using the same regex) were still about 50% faster than split, rather than being 40% slower.
Next I reduced the size of my string to "a " x 100. Here I got these numbers:

     Eight: 12 wallclock secs (10.00 usr +  0.00 sys = 10.00 CPU) @ 130970.90/s (n=1309709)
      Four: 11 wallclock secs (10.39 usr +  0.00 sys = 10.39 CPU) @ 88839.36/s (n=923041)
     Seven: 10 wallclock secs (10.49 usr +  0.00 sys = 10.49 CPU) @ 122499.33/s (n=1285018)
       Six: 10 wallclock secs (10.54 usr +  0.00 sys = 10.54 CPU) @ 121918.22/s (n=1285018)

Now the regex code (yours) leads by less than 10%, and my regex trails by close to 30%. So, I guess the conclusion is that regex performs better than split on large scalars? I don't feel like mucking in the perl source code right now, so my guess as to why is that it has nothing to do with the way regexes or splits actually process the data, but rather that split is probably receiving a copy of the data, whereas the regex is receiving a reference.

Cheers,
Gryn


RE: More benchmarks and stuff by Abigail (Deacon) on Jul 15, 2000 at 00:15 UTC
"I don't feel like mucking in the perl source code right now, so my guess as to why is that it has nothing to do with the way regexes or splits actually process the data, but rather that split is probably receiving a copy of the data, whereas the regex is receiving a reference."

No, it has to do with what `split` and the regex produce, not with what they get (behind the scenes, everything is a reference anyway). The regex only has to create a short new string, while the split (even if it's a split into 2 fields) has to create a large new string. And that takes time. So, if you have a gigantic string and you only want the first, short field, a regex is the way to go. But usually you encounter short strings, and that's when `split` works better, despite itself using a regex. (But a much simpler regex, and it might even be that the case of `" "` is itself optimized too.)

-- Abigail
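The difference in what each approach produces can be seen in a small sketch. The variable names below are illustrative and not from the thread; the string matches the large test case.

```perl
use strict;
use warnings;

# Same shape as the large test string: 200_000 characters.
my $big = "a " x 100_000;

# The regex captures only the first short field.
my ($first_by_regex) = $big =~ /(\S+)/;

# split with a limit of 2 returns the first field plus the whole
# remainder, so a second, nearly full-length string gets built.
my @parts = split(/\s+/, $big, 2);

printf "regex capture: %d char(s)\n", length $first_by_regex;
printf "split fields:  %d, lengths %s\n",
    scalar @parts, join(", ", map { length } @parts);
# prints:
#   regex capture: 1 char(s)
#   split fields:  2, lengths 1, 199998
```

As a side note, `perldoc -f split` documents that when `split` is assigned to a list and no limit is given, perl supplies a limit one larger than the number of variables in the list. That may be why Six and Seven time almost identically: `my ($y) = split(...)` is effectively a split with a limit of 2 either way.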
The conclusion of: More benchmarks and stuff by gryng (Hermit) on Jul 15, 2000 at 02:25 UTC
Thanks Abigail, that sounds reasonable enough to believe! :) Ciao, Gryn