in reply to regex or split

Hello,

It should also be noted that it *really* depends on the size of your data sets. Pike++'s regex solution is efficient for small sets like the one in your example, but with a string that contains even a thousand newlines it quickly falls behind your split/map solution.

If you are using Perl 5.8.0, it may be worth looking at PerlIO's scalar layer as well. For larger datasets it is very efficient to simply use ye ol' file slurp trick on the in-memory string. Due to the overhead of the necessary open() call, though, this won't be the fastest approach for smaller datasets.
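To make the idea concrete, here is a minimal sketch (sample string is made up) of the :scalar layer: you open a filehandle on a reference to a plain scalar and read it line by line, exactly as you would a file, with each "line" keeping its trailing newline:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# an in-memory string standing in for file contents
my $str = "line one\nline two\nline three\n";

# Perl 5.8.0+: open a filehandle on a reference to the scalar
open( my $fh, '<:scalar', \$str ) or die "$!\n";

# reading in list context "slurps" it into lines, newlines intact
my @lines = <$fh>;
close $fh;

print scalar @lines, "\n";    # 3
print $lines[1];              # "line two\n"
```

Note that, unlike split /\n/, this keeps the newlines for free, so no map pass is needed afterwards.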


Here is a bit of code:
#!/usr/bin/perl
use warnings;
use strict;
$|++;

use Benchmark qw( cmpthese );

my $str;

# short example string
$str = "
this is a string example";

# longer string ( 1000 lines )
# my @chars = ( 'a' .. 'z', 'A' .. 'Z' );
# for ( 1..1000 ) {
#     $str .= $chars[ rand @chars ] for 0 .. rand @chars;
#     $str .= "\n";
# }

cmpthese( 5000, {
    perl_io => sub {
        open( my $fh, "<:scalar", \$str ) or die "$!\n";
        my @data = <$fh>;
    },
    split_map => sub {
        my @data = map { $_ .= "\n" } split( /\n/, $str );
    },
    regex_pike => sub {
        my @data = split /(?<=\n)/, $str;
    },
} );

For the shorter strings, here are the results:
                Rate    perl_io split_map regex_pike
perl_io      14085/s         --      -46%       -57%
split_map    25907/s        84%        --       -20%
regex_pike   32468/s       131%       25%         --

and for the longer strings:
               Rate regex_pike split_map perl_io
regex_pike   79.4/s         --      -40%    -56%
split_map     131/s        65%        --    -27%
perl_io       181/s       128%       38%      --

  -- dug