in reply to regex or split

Hello,

It should also be noted that it *really* depends on the size of your data sets. Pike++'s regex solution is efficient for small sets like the one in your example, but with a string that contains even a thousand newlines it quickly falls behind your split/map solution.

If you are using Perl 5.8.0, it may be worth looking at PerlIO's scalar layer as well. For larger datasets it is very efficient to simply use ye ol' file slurp trick on the in-memory string. Due to the overhead of the necessary open() call, though, this won't be the fastest approach for smaller datasets.
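To make the idea concrete, here is a minimal sketch (sample string is made up) of the :scalar layer: you open a filehandle on a reference to a plain scalar and read it line by line, exactly as you would a file, with each "line" keeping its trailing newline:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# an in-memory string standing in for file contents
my $str = "line one\nline two\nline three\n";

# Perl 5.8.0+: open a filehandle on a reference to the scalar
open( my $fh, '<:scalar', \$str ) or die "$!\n";

# reading in list context "slurps" it into lines, newlines intact
my @lines = <$fh>;
close $fh;

print scalar @lines, "\n";    # 3
print $lines[1];              # "line two\n"
```

Note that, unlike split /\n/, this keeps the newlines for free, so no map pass is needed afterwards.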


Here is a bit of code:
#!/usr/bin/perl
use warnings;
use strict;
$|++;

use Benchmark qw( cmpthese );

my $str;

# short example string
$str = "
this is a string example";

# longer string ( 1000 lines )
# my @chars = ( 'a' .. 'z', 'A' .. 'Z' );
# for ( 1..1000 ) {
#     $str .= $chars[ rand @chars ] for 0 .. rand @chars;
#     $str .= "\n";
# }

cmpthese( 5000, {
    perl_io => sub {
        open( my $fh, "<:scalar", \$str ) or die "$!\n";
        my @data = <$fh>;
    },
    split_map => sub {
        my @data = map { $_ .= "\n" } split( /\n/, $str );
    },
    regex_pike => sub {
        my @data = split /(?<=\n)/, $str;
    },
} );

For the shorter strings, here are the results:
                Rate    perl_io split_map regex_pike
perl_io      14085/s         --      -46%       -57%
split_map    25907/s        84%        --       -20%
regex_pike   32468/s       131%       25%         --

and for the longer strings:
               Rate regex_pike split_map perl_io
regex_pike   79.4/s         --      -40%    -56%
split_map     131/s        65%        --    -27%
perl_io       181/s       128%       38%      --

  -- dug