Re: What is the fastest way to extract data from a delimited string?

Why is split not up to the task? It seems to me that it is faster than your regex.

#!/usr/bin/perl -w
use strict;
use Benchmark;

my $column = 6;
my $delim = " ";
my $s = qq(a bb ccc dddd eeee fff gggggg hh i jjjjjjjj);

timethese (1000000,{
    'regex' => sub {
        my($result) = $s =~ /(([^$delim]*)$delim?){$column}/;
    },
    'split' => sub {
        # The list slice is 0-based: index $column is column number $column + 1.
        my ($result) = (split /$delim/, $s)[$column];
    }
});

__END__
Benchmark: timing 1000000 iterations of regex, split...
     regex: 10 wallclock secs ( 8.90 usr + -0.00 sys =  8.90 CPU)
     split:  9 wallclock secs ( 8.27 usr +  0.02 sys =  8.29 CPU)

Benchmark: timing 1000000 iterations of regex, split...
     regex:  9 wallclock secs ( 8.69 usr +  0.09 sys =  8.78 CPU)
     split:  7 wallclock secs ( 7.87 usr + -0.00 sys =  7.87 CPU)

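The only difference is that the list slice on split's result numbers columns from 0, while the regex quantifier counts from 1. With $column = 6 the regex above stops at the 6th column but the split version returns the 7th, so to get the 6th column from split you should use $column = 5.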
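To make the off-by-one concrete, here is a minimal sketch using the same $s and $delim as the benchmark. The helpers nth_column_split and nth_column_regex are hypothetical names introduced only for illustration (they are not part of the original post); both count columns from 1. The sketch also shows what the original regex's first capture group holds, which appears to include the trailing delimiter.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $delim = " ";
my $s     = qq(a bb ccc dddd eeee fff gggggg hh i jjjjjjjj);

# Hypothetical helper: return column $n, counting from 1, using split.
sub nth_column_split {
    my ($str, $n) = @_;
    my $d = quotemeta $delim;
    # split returns a list indexed from 0, so column $n sits at index $n - 1.
    return (split /$d/, $str)[$n - 1];
}

# Hypothetical helper: the same lookup with an anchored regex.
sub nth_column_regex {
    my ($str, $n) = @_;
    my $d    = quotemeta $delim;
    my $skip = $n - 1;
    # Skip $n - 1 "field plus delimiter" pairs, then capture the next field.
    my ($field) = $str =~ /^(?:[^$d]*$d){$skip}([^$d]*)/;
    return $field;
}

# The original regex from the benchmark, unchanged. In list context the
# match returns ($1, $2); $1 is the outer group, delimiter included.
my $column = 6;
my ($outer, $inner) = $s =~ /(([^$delim]*)$delim?){$column}/;

print "split, index 6:    '", (split /$delim/, $s)[6], "'\n";    # 7th column
print "split, column 6:   '", nth_column_split($s, 6), "'\n";
print "regex, column 6:   '", nth_column_regex($s, 6), "'\n";
print "original regex \$1: '$outer'\n";
print "original regex \$2: '$inner'\n";
```

If the two approaches are to be benchmarked against each other, it seems fairer to time helpers like these (or the original snippets with $column adjusted) so that both return the same field.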


Re: Re: What is the fastest way to extract data from a delimited string? by thezip (Vicar) on Jan 10, 2003 at 01:10 UTC
Your results seem convincing, yet everything I have heard about split is that it is not the most efficient solution. My instincts tell me that there is a regex which will outperform both split and the regex I have submitted for criticism. Vannah, I'd like to buy a REGEX please...

Where do you want them* to go today?*
Re: Re: Re: What is the fastest way to extract data from a delimited string? by helgi (Hermit) on Jan 10, 2003 at 12:55 UTC
You're out of luck then, because split will always be much faster than any non-trivial regex. For large files, split is the only way to go. Your instincts are wrong, completely wrong.

-- Regards, Helgi Briem helgi AT decode DOT is