
Hello,

It is also worth noting that this *really* depends on the size of your data set. Pike++'s regex solution is fast for small strings like the one in your example, but on a string with only a thousand newlines it falls behind your split/map solution (roughly 79/s versus 131/s in the results below).

If you are using Perl 5.8.0, it may be worth looking at PerlIO's scalar layer as well, which lets you open the string as an in-memory filehandle. For larger datasets it is very efficient to simply use ye ol' file slurp trick on that handle. Because of the overhead of the necessary open() call, this won't be the most efficient choice for smaller datasets.

Here is a bit of code:


#!/usr/bin/perl

use warnings;
use strict;
$|++;

use Benchmark qw( cmpthese );

my $str;

# short example string
$str="
this
is
a string
example";

# longer string ( 1000 lines )
# my @chars = ( 'a' .. 'z', 'A' .. 'Z' );
# for ( 1..1000 ) {
#   $str .= $chars[ rand @chars ] for 0 .. rand @chars;
#   $str .= "\n";
# }


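cmpthese( 5000, {

  # read the string line by line through an in-memory filehandle
  perl_io    => sub {
                  open( my $fh, "<:scalar", \$str) or die "$!\n";
                  my @data = <$fh>;
                },
  # split on newlines, then put the newline back on each piece
  split_map  => sub {
                  my @data=map { $_.="\n" } split (/\n/, $str);
                },
  # split after each newline with a zero-width lookbehind
  regex_pike => sub {
                  my @data = split /(?<=\n)/, $str;
                },

} );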

For the shorter strings, here are the results:

              Rate    perl_io  split_map regex_pike
perl_io    14085/s         --       -46%       -57%
split_map  25907/s        84%         --       -20%
regex_pike 32468/s       131%        25%         --

and for the longer strings:

             Rate regex_pike  split_map    perl_io
regex_pike 79.4/s         --       -40%       -56%
split_map   131/s        65%         --       -27%
perl_io     181/s       128%        38%         --
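One thing the timings do not show is whether the three approaches actually return the same list. Here is a minimal sketch that checks this on the short example string. The sub names (via_perl_io, via_split_map, via_regex) are hypothetical, made up for illustration, and the sketch assumes the plain "<" mode on a scalar reference (the form perlfunc documents for in-memory files) rather than the "<:scalar" spelling used in the benchmark.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same short sample string as the benchmark.
my $str = "\nthis\nis\na string\nexample";

# Read lines through an in-memory filehandle (Perl 5.8.0 or later).
# Assumption: plain "<" on a scalar ref, not the "<:scalar" spelling.
sub via_perl_io {
    my ($s) = @_;
    open( my $fh, '<', \$s ) or die "$!\n";
    my @data = <$fh>;
    close $fh;
    return @data;
}

# Split on newlines, then append a newline to every piece.
sub via_split_map {
    my ($s) = @_;
    my @data = map { "$_\n" } split /\n/, $s;
    return @data;
}

# Split after each newline using a zero-width lookbehind.
sub via_regex {
    my ($s) = @_;
    my @data = split /(?<=\n)/, $s;
    return @data;
}

my @io    = via_perl_io($str);
my @map   = via_split_map($str);
my @regex = via_regex($str);

# Join with a separator that does not appear in the sample data.
my $same_io_regex = join( '|', @io ) eq join( '|', @regex );
my $same_io_map   = join( '|', @io ) eq join( '|', @map );

print "perl_io vs regex_pike: ", ( $same_io_regex ? "same" : "different" ), "\n";
print "perl_io vs split_map:  ", ( $same_io_map   ? "same" : "different" ), "\n";
```

With this string, perl_io and the lookbehind split agree, while split/map puts a newline on the final "example" that the original string does not have. So split/map is not an exact drop-in when the string lacks a trailing newline.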

-- dug

In reply to Re: regex or split by dug
in thread regex or split by mce
