Re: Speed of Split

This may not be representative, but a simple test shows that a regexp can be much faster than split here:


use Benchmark ();

our @data;
my $line = '1.000000     '
. '    100.273    121.54     98.169    121.58'
. '    100.273    121.54     98.169    121.58'
. '    100.273    121.54     98.169    121.58'
. '    100.273    121.54     98.169    121.58';

Benchmark::cmpthese(0, {
   split        => sub { @data = split(/\s+/, $line) },
   fixed_length => sub { @data = $line =~ /^.{8} {6}(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})$/ },
   var_length   => sub { @data = $line =~ /^.{8}\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)$/ },
});

__END__
                 Rate        split   var_length fixed_length
split         63116/s           --         -30%         -87%
var_length    90310/s          43%           --         -81%
fixed_length 482454/s         664%         434%           --

Of course, fixed_length would assume that you do your own joining, since join would not preserve field widths.
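One way to do that joining is with sprintf. This is only a sketch: the layout (an 8-character first field, 6 spaces, then right-justified 10-character columns) is read off the fixed_length regexp above, and join_fixed is a made-up helper name, not something from the original code.

```perl
use strict;
use warnings;

# Hypothetical helper: rebuild a line using the widths the fixed_length
# regexp assumes (8-char first field, 6 spaces, 10-char columns).
sub join_fixed {
    my ($first, @fields) = @_;
    return sprintf('%-8s', $first)
         . (' ' x 6)
         . join('', map { sprintf '%10s', $_ } @fields);
}

# Illustrative values only; each one is right-justified in 10 characters.
my $line = join_fixed('1.000000', qw(100.273 121.54 98.169 121.58));
print "[$line]\n";
```

Note that '%10s' pads but never truncates, so a value longer than 10 characters would still push the following columns out of place.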

Re^2: Speed of Split by bart (Canon) on Nov 18, 2004 at 14:32 UTC
You missed a few alternatives. I've added them myself, and ran the benchmark again. I must say that they perform rather poorly.

Benchmark::cmpthese(0, {
   split        => sub { @data = split(/\s+/, $line) },
   fixed_length => sub { @data = $line =~ /^.{8} {6}(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})$/ },
   var_length   => sub { @data = $line =~ /^.{8}\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)$/ },
   g            => sub { @data = $line =~ /\S+/g; },
   unpack       => sub { @data = unpack 'A8x6A10A10A10A10A10A10A10A10A10A10A10A10', $line }
});

Result:

                 Rate     g unpack  split var_length fixed_length
g             16954/s    --   -54%   -70%       -76%         -96%
unpack        36961/s  118%     --   -35%       -47%         -91%
split         56965/s  236%    54%     --       -19%         -86%
var_length    70373/s  315%    90%    24%         --         -83%
fixed_length 408377/s 2309%  1005%   617%       480%           --

You ignore the first field, I include it... but that shouldn't matter much.
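Since the candidates differ in whether they keep the first field, it may be worth checking what each one returns before comparing rates. The sketch below assumes a test line built with sprintf to exactly the widths the patterns expect (it is not the $line from the benchmark), and the %fields and %normalised names are just for illustration.

```perl
use strict;
use warnings;

# Assumed test line: 8-char first field, 6 spaces, twelve 10-char columns.
my @values = (qw(100.273 121.54 98.169 121.58)) x 3;
my $line   = sprintf('%-8s', '1.000000') . (' ' x 6)
           . join('', map { sprintf '%10s', $_ } @values);

my $cols     = '(.{10})' x 12;
my $fixed_re = qr/^.{8} {6}$cols$/;

# Run each extraction once and keep what it returned.
my %fields = (
    split  => [ split /\s+/, $line ],
    g      => [ $line =~ /\S+/g ],
    unpack => [ unpack 'A8x6' . ('A10' x 12), $line ],
    fixed  => [ $line =~ $fixed_re ],
);

# Normalise: strip leading padding, drop the first field where captured.
my %normalised;
for my $name (sort keys %fields) {
    my @f = map { my $s = $_; $s =~ s/^\s+//; $s } @{ $fields{$name} };
    printf "%-6s returned %d fields\n", $name, scalar @f;
    shift @f if @f == 13;
    $normalised{$name} = join ',', @f;
}

my %distinct = map { $_ => 1 } values %normalised;
print keys(%distinct) == 1 ? "candidates agree\n" : "candidates differ\n";
```

The 'A' format strips trailing spaces only, so the unpack and fixed-width regexp results still carry their leading padding until it is trimmed; split and /\S+/g return bare values.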
Re^2: Speed of Split by Chady (Priest) on Nov 18, 2004 at 10:04 UTC
Here's what your benchmarks result in on one of my machines:

                 Rate        split fixed_length   var_length
split         26023/s           --         -64%         -93%
fixed_length  71906/s         176%           --         -81%
var_length   378591/s        1355%         427%           --

this is with perl, v5.6.1 built for i386-linux on a 333MHz celeron

He who asks will be a fool for five minutes, but he who doesn't ask will remain a fool for life.
Chady | http://chady.net/
Are you a Linux user in Lebanon? join the Lebanese Linux User Group.