Hello,

It should also be noted that it *really* depends upon the size of your data sets. Pike++'s regex solution is efficient for small sets like the one you used in the example, but with a string that has only a thousand newlines it quickly falls behind your split/map solution.

If you are using Perl 5.8.0, it may be worth looking at PerlIO's scalar layer as well. For larger datasets it is very efficient to simply use ye ol' file slurp trick. Due to the overhead of the open() call that is necissary, this won't be the most efficient for smaller datasets.


Here is a bit of code:
#!/usr/bin/perl use warnings; use strict; $|++; use Benchmark qw( cmpthese ); my $str; # short example string $str=" this is a string example"; # longer string ( 1000 lines ) # my @chars = ( 'a' .. 'z', 'A' .. 'Z' ); # for ( 1..1000 ) { # $str .= $chars[ rand @chars ] for 0 .. rand @chars; # $str .= "\n"; # } cmpthese( 5000, { perl_io => sub { open( my $fh, "<:scalar", \$str) or die "$!\n"; my @data = <$fh>; }, split_map => sub { my @data=map { $_.="\n" } split (/\n/, $str); }, regex_pike => sub { my @data = split /(?<=\n)/, $str; }, } );

For the shorter strings, here are the results:
Rate perl_io split_map regex_pike perl_io 14085/s -- -46% -57% split_map 25907/s 84% -- -20% regex_pike 32468/s 131% 25% --

and for the longer strings:
Rate regex_pike split_map perl_io regex_pike 79.4/s -- -40% -56% split_map 131/s 65% -- -27% perl_io 181/s 128% 38% --

  -- dug

In reply to Re: regex or split by dug
in thread regex or split by mce

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.