Re: regex or split

I am unable to confirm your results. Why are you using both map and split in the first method? Why not a simple split, which is much faster than either?

use warnings;
use strict;
use Benchmark;
my $data="
this
is
a string
example";


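# Receives the result of each method.
my @data;

my $count = 100000;
timethese($count,
{
    # Split on newlines, then append a newline to each item.
    'map_split'    => sub { @data = map { $_ .= "\n" } split(/\n/, $data); },
    # Plain split; the newlines are discarded.
    'simple_split' => sub { @data = split "\n", $data; },
    # Capture each line together with its trailing newline.
    'regex'        => sub { @data = ( $data =~ /(.*?\n)/g ); },
}
);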
__END__
Benchmark: timing 100000 iterations of map_split, regex, simple_split...
 map_split:  2 wallclock secs ( 2.04 usr +  0.00 sys =  2.04 CPU) @ 48947.63/s (n=100000)
     regex:  2 wallclock secs ( 1.24 usr +  0.00 sys =  1.24 CPU) @ 80515.30/s (n=100000)
simple_split:  1 wallclock secs ( 0.75 usr +  0.00 sys =  0.75 CPU) @ 132978.72/s (n=100000)

There is a mistake in the posted code. Your initial string is called $str when you initialise it, but $data when you split it.

Perhaps this typo influenced your results. If I run your code (with an empty $data variable) the regex indeed looks faster, but this is a completely spurious result.

$data = '';
Benchmark: timing 1000000 iterations of map_split, regex, simple_split...
 map_split:  0 wallclock secs ( 0.99 usr +  0.00 sys =  0.99 CPU) @ 1009081.74/s (n=1000000)
     regex:  1 wallclock secs ( 0.54 usr +  0.00 sys =  0.54 CPU) @ 1848428.84/s (n=1000000)
simple_split:  0 wallclock secs ( 0.94 usr +  0.00 sys =  0.94 CPU) @ 1062699.26/s (n=1000000)

Enabling warnings would have caught this mistake!
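
To illustrate the point about warnings, here is a minimal sketch of what probably happened. It is a hypothetical reconstruction, not the original poster's code: it assumes the string was stored in $str while the benchmarked code read a declared but never assigned $data. The __WARN__ handler is only there to collect the warnings for display.

```perl
use strict;
use warnings;

# Collect warnings instead of letting them go straight to STDERR.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, @_ };

# Assumed shape of the typo: the text lives in $str,
# but the code under test reads $data, which stays undef.
my $str = "\nthis\nis\na string\nexample";
my $data;

my @from_split = split /\n/, $data;
my @from_regex = ( $data =~ /(.*?\n)/g );

printf "split returned %d items, regex returned %d items\n",
    scalar @from_split, scalar @from_regex;
print "warning: $_" for @warnings;
```

With no input, both methods return nothing, so the timings only measure call overhead, and each use of the undefined $data should produce a "Use of uninitialized value" warning.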

--
Regards,
Helgi Briem
helgi AT decode DOT is

Re: Re: regex or split by helgi (Hermit) on Feb 06, 2003 at 14:03 UTC
I apologise. I missed the part about wanting to keep the newlines with the array items.

--
Regards,
Helgi Briem
helgi AT decode DOT is
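
For what it is worth, a plain split can keep the newlines if the pattern is a zero-width lookbehind, so that nothing is consumed at the split point. The following is only a sketch of that idea, using the same sample string as the benchmark above; it has not been timed against the other methods.

```perl
use strict;
use warnings;

# Same sample text as in the benchmark above.
my $data = "\nthis\nis\na string\nexample";

# Split at the empty position just after each newline, so the
# newline stays attached to the item that precedes it.
my @lines = split /(?<=\n)/, $data;

# Show each item with its newline made visible.
for my $line (@lines) {
    (my $shown = $line) =~ s/\n/\\n/g;
    print "[$shown]\n";
}
```

Note that the three approaches do not return identical lists for this string: the lookbehind split leaves the final "example" without a newline, the regex drops it because nothing after it matches \n, and map_split appends a newline to it.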