in reply to Speed of Split

This may not be representative, but a simple benchmark suggests that a regexp could be much faster than split here:

use Benchmark ();

our @data;
my $line = '1.000000 '
    . ' 100.273 121.54 98.169 121.58'
    . ' 100.273 121.54 98.169 121.58'
    . ' 100.273 121.54 98.169 121.58'
    . ' 100.273 121.54 98.169 121.58';

Benchmark::cmpthese(0, {
    split        => sub { @data = split(/\s+/, $line) },
    fixed_length => sub { @data = $line =~ /^.{8} {6}(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})$/ },
    var_length   => sub { @data = $line =~ /^.{8}\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)$/ },
});
__END__
                 Rate split var_length fixed_length
split         63116/s    --       -30%         -87%
var_length    90310/s   43%         --         -81%
fixed_length 482454/s  664%       434%           --
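Before comparing rates, it may be worth confirming that every candidate actually returns the same fields, because a pattern anchored with ^ and $ returns an empty list when the line does not have exactly the layout it expects, and a failed match can be very quick. As posted, the test line appears to carry 16 values after the first field while both patterns capture 12, so this check seems relevant here. The sketch below is only an illustration: the test line is a hypothetical one (8-character first field, 6 spaces, 12 right-justified 10-column fields), not the original data, and the variable names are made up.

```perl
use strict;
use warnings;

# Hypothetical test line (an assumption, not the original data):
# 8-character first field, 6 spaces, 12 right-justified 10-column fields.
my @values = (qw(100.273 121.54 98.169 121.58)) x 3;
my $line   = '1.000000' . (' ' x 6)
           . join('', map { sprintf '%10s', $_ } @values);

# Same patterns as in the benchmark, built with the repetition
# operator instead of typing each capture group by hand.
my $fixed_re = '^.{8} {6}' . ('(.{10})' x 12) . '$';
my $var_re   = '^.{8}' . ('\s+(\S+)' x 12) . '$';

my @fixed = $line =~ /$fixed_re/;
my @var   = $line =~ /$var_re/;
my @split = split /\s+/, $line;

# fixed_length keeps the leading padding, so trim it before comparing.
s/^\s+// for @fixed;

# split also returns the first field; drop it to match the regexes.
shift @split;

printf "%-12s %d fields\n", 'fixed_length', scalar @fixed;
printf "%-12s %d fields\n", 'var_length',   scalar @var;
printf "%-12s %d fields\n", 'split',        scalar @split;

print "all three agree\n"
    if "@fixed" eq "@var" and "@var" eq "@split";
```

An empty list from either regexp on the real data would mean the benchmark is timing a failed match rather than a parse.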

Of course, using fixed_length means that you have to do your own joining, since a plain join would not preserve the field widths.
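One way to do that joining is with sprintf, which pads each field back to a fixed width. This is a minimal sketch, not the original poster's code: join_fixed is a hypothetical helper, and the layout (8-character first field, 6 spaces, right-justified 10-column fields) is an assumption read off the fixed_length pattern.

```perl
use strict;
use warnings;

# Hypothetical helper: rebuild a fixed-width line from its fields.
# Assumed layout: 8-character first field, 6 spaces, then 10-column
# fields, right-justified. Note that sprintf pads but does not
# truncate, so a value longer than 10 characters would break the layout.
sub join_fixed {
    my ($first, @fields) = @_;
    return sprintf('%-8s', $first) . (' ' x 6)
         . join('', map { sprintf '%10s', $_ } @fields);
}

my @fields = qw(100.273 121.54 98.169 121.58);

my $fixed = join_fixed('1.000000', @fields);
my $plain = join(' ', '1.000000', @fields);

print "[$fixed]\n";   # every field occupies 10 columns
print "[$plain]\n";   # widths depend on the values
```

A line produced this way can be parsed again by the fixed_length pattern, which is not true of the output of a plain join.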

Re^2: Speed of Split
by bart (Canon) on Nov 18, 2004 at 14:32 UTC
    You missed a few alternatives. I've added them myself, and ran the benchmark again.

    I must say that they perform rather poorly.

    Benchmark::cmpthese(0, {
        split        => sub { @data = split(/\s+/, $line) },
        fixed_length => sub { @data = $line =~ /^.{8} {6}(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})$/ },
        var_length   => sub { @data = $line =~ /^.{8}\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)$/ },
        g            => sub { @data = $line =~ /\S+/g; },
        unpack       => sub { @data = unpack 'A8x6A10A10A10A10A10A10A10A10A10A10A10A10', $line }
    });

    Result:

                     Rate     g unpack split var_length fixed_length
    g             16954/s    --   -54%  -70%       -76%         -96%
    unpack        36961/s  118%     --  -35%       -47%         -91%
    split         56965/s  236%    54%    --       -19%         -86%
    var_length    70373/s  315%    90%   24%         --         -83%
    fixed_length 408377/s 2309%  1005%  617%       480%           --

    You ignore the first field, I include it... but that shouldn't matter much.
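
    The difference in what the two added alternatives return can be seen with a small sketch. It assumes a hypothetical fixed-width line (8-character first field, 6 spaces, 12 right-justified 10-column fields) rather than the original data, and the variable names are only for illustration.

    ```perl
    use strict;
    use warnings;

    # Hypothetical fixed-width line (an assumption, not the original data).
    my @values = (qw(100.273 121.54 98.169 121.58)) x 3;
    my $line   = '1.000000' . (' ' x 6)
               . join('', map { sprintf '%10s', $_ } @values);

    # Same template as in the benchmark, built with the repetition operator:
    # A8 = first field, x6 = skip 6 bytes, A10 x 12 = the data fields.
    my $template = 'A8x6' . ('A10' x 12);

    my @unpacked = unpack $template, $line;
    my @matched  = $line =~ /\S+/g;

    # The A format strips trailing spaces only, so right-justified
    # fields keep their leading padding and need a trim.
    my @trimmed = map { (my $t = $_) =~ s/^\s+//; $t } @unpacked;

    printf "unpack: %d fields, /\\S+/g: %d fields\n",
        scalar @unpacked, scalar @matched;
    print "same fields after trimming\n" if "@trimmed" eq "@matched";
    ```

    Both return the first field as well, so they yield one more element than the capturing regexps.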

Re^2: Speed of Split
by Chady (Priest) on Nov 18, 2004 at 10:04 UTC

    Here's what your benchmarks result in on one of my machines:

                     Rate split fixed_length var_length
    split         26023/s    --         -64%       -93%
    fixed_length  71906/s  176%           --       -81%
    var_length   378591/s 1355%         427%         --

    This is with perl v5.6.1 built for i386-linux, on a 333MHz Celeron.


    He who asks will be a fool for five minutes, but he who doesn't ask will remain a fool for life.
    Chady | http://chady.net/
    Are you a Linux user in Lebanon? join the Lebanese Linux User Group.