in reply to What is the fastest way to extract data from a delimited string?

Why is split not up to the task? It seems to me that it is faster than your regex.

#!/usr/bin/perl -w
use strict;
use Benchmark;

my $column = 6;
my $delim  = " ";
my $s      = qq(a bb ccc dddd eeee fff gggggg hh i jjjjjjjj);

timethese (1000000, {
    'regex' => sub { my($result) = $s =~ /(([^$delim]*)$delim?){$column}/; },
    'split' => sub { my ($result) = (split /$delim/, $s)[$column] }
});
__END__
Benchmark: timing 1000000 iterations of regex, split...
     regex: 10 wallclock secs ( 8.90 usr + -0.00 sys =  8.90 CPU)
     split:  9 wallclock secs ( 8.27 usr +  0.02 sys =  8.29 CPU)

Benchmark: timing 1000000 iterations of regex, split...
     regex:  9 wallclock secs ( 8.69 usr +  0.09 sys =  8.78 CPU)
     split:  7 wallclock secs ( 7.87 usr + -0.00 sys =  7.87 CPU)

The only difference is that split will start numbering your columns from 0, so to get the 6th column you should use $column = 5;
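To make the off-by-one concrete, here is a minimal sketch using the same sample string. The variable names ($sixth_by_split, $sixth_by_regex, $with_delim) are just for illustration. It also shows that, in list context, the regex hands back $1 first, which still carries the trailing delimiter; the bare field is in $2.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $delim = " ";
my $s     = "a bb ccc dddd eeee fff gggggg hh i jjjjjjjj";

# split returns a list indexed from 0, so the 6th field is element 5.
my $sixth_by_split = (split /$delim/, $s)[5];

# The regex repeats its group 6 times. After the match, $1 holds the
# last repetition including the trailing delimiter, and $2 holds the
# field on its own.
my ($with_delim, $sixth_by_regex) = $s =~ /(([^$delim]*)$delim?){6}/;

print "split:    '$sixth_by_split'\n";   # split:    'fff'
print "regex \$2: '$sixth_by_regex'\n";  # regex $2: 'fff'
print "regex \$1: '$with_delim'\n";      # regex $1: 'fff '
```

So with $column = 6 the benchmark above has split fetching the 7th field while the regex stops at the 6th, which should not matter much for timing but does matter for the result.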

Re: Re: What is the fastest way to extract data from a delimited string?
by thezip (Vicar) on Jan 10, 2003 at 01:10 UTC

    Your results seem convincing, yet everything I have heard about split is that it is not the most efficient solution.

    My instincts tell me that there is a regex which will outperform both split and the regex I have submitted for criticism.

    Vanna, I'd like to buy a *REGEX* please...

    Where do you want *them* to go today?
      You're out of luck then because split will *always* be much faster than any non-trivial regex. For large files, split is the only way to go. Your instincts are wrong, completely wrong.

      --
      Regards,
      Helgi Briem
      helgi AT decode DOT is