Limbic~Region has asked for the wisdom of the Perl Monks concerning the following question:

All,
In this node, I recommended a solution which had a first step of indexing the position of newlines in a string. I believe my overall approach to be sound, but there are many ways the first step could be accomplished. Since the Anonymous Monk appeared to have speed in mind, I decided to Benchmark a few ideas.

I briefly discussed this in the CB before deciding to post it here, as no obvious consensus was reached. Keep in mind there will be an unknown number of newlines, but the string itself will be no larger than 200K.

Do you have any other ideas? Use the following string construction for the basis of your benchmark.
    sub build_str {
        my ($str, $max) = ('', 200 * 1024);
        while ( length($str) < $max ) {
            # Assume lines are between 60-79 characters long
            $str .= ('#' x ((rand 20) + 60)) . "\n";
        }
        return $str;
    }

Cheers - L~R

Replies are listed 'Best First'.
Re: Finding the positions of a character in a string
by demerphq (Chancellor) on Nov 28, 2005 at 15:30 UTC

    (From the CB where L~R was discussing this) Using index:

    perl -le "my $str=qq(\n--\n--\n); my $p=0; do { $p=index($str,qq(\n),$p); print $p++; } while $p>0;"

    And using s///.

    perl -le "my $str=qq(\n--\n--\n); $str=~s/\n/print $-[0]; '\n'/ge;"
    ---
    $world=~s/war/peace/g

      And using s///.

      Ewww.

      print pos($str)-1 while $str =~ /\n/g;
      print $-[0] while $str =~ /\n/g;

      Update: Oops, need to subtract 1 from pos() or use $-[0] as you did in the s/// version.

      -sauoq
      "My two cents aren't worth a dime.";
      

        Actually, I deliberately didn't use that, the reason being that what you wrote is pretty much the same as the index solution, even though it doesn't look like it. The idea of the s/// was to avoid coming back to the perl runloop, and instead stay inside the regex loop.

        ---
        $world=~s/war/peace/g
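To make the three approaches discussed above easier to compare side by side, here is a minimal sketch that wraps each one in a sub returning an array ref. The sub names (offsets_by_index, offsets_by_match, offsets_by_subst) are made up for illustration and are not taken from anyone's post. The s///ge variant works on a copy and substitutes each newline with a newline, so the caller's string is left alone.

```perl
use strict;
use warnings;

# Sample string from the one-liners above: newlines at offsets 0, 3 and 6.
my $str = "\n--\n--\n";

# index(): one Perl-level loop iteration per newline.
sub offsets_by_index {
    my ($s) = @_;
    my @offsets;
    my $p = 0;
    # index() returns -1 when nothing more is found, so test for >= 0.
    while ( ( $p = index( $s, "\n", $p ) ) >= 0 ) {
        push @offsets, $p++;
    }
    return \@offsets;
}

# m//g in scalar context: also one Perl-level loop iteration per match.
sub offsets_by_match {
    my ($s) = @_;
    my @offsets;
    push @offsets, $-[0] while $s =~ /\n/g;
    return \@offsets;
}

# s///ge: the substitution walks the whole string in a single statement
# and records $-[0] (start of the current match) as a side effect.
sub offsets_by_subst {
    my ($s) = @_;    # a copy, so the caller's string is untouched
    my @offsets;
    $s =~ s{\n}{ push @offsets, $-[0]; "\n" }ge;
    return \@offsets;
}

print join( ',', @{ $_->($str) } ), "\n"
    for \&offsets_by_index, \&offsets_by_match, \&offsets_by_subst;
```

All three should report the same offsets for the sample string (0, 3 and 6). Which one is fastest is a separate question that only a benchmark on realistic input can answer.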

Re: Finding the positions of a character in a string
by Limbic~Region (Chancellor) on Nov 28, 2005 at 17:35 UTC
    All,
    Here are the benchmark results for all the methods provided so far.
                Rate by_regex1 by_regex3 by_fh by_index by_regex2
    by_regex1 49.6/s        --      -52%  -75%     -81%      -83%
    by_regex3  103/s      107%        --  -48%     -61%      -66%
    by_fh      199/s      301%       94%    --     -25%      -34%
    by_index   266/s      437%      160%   34%       --      -11%
    by_regex2  300/s      505%      192%   51%       13%        --

    Cheers - L~R
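The code behind the table above is not shown in the thread, so the following is only a sketch of how a comparison like it could be set up with Benchmark's cmpthese. The two sub bodies and their labels (by_index, by_match) are assumptions for illustration and are not L~R's actual by_* implementations; only build_str comes from the original question.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# String construction from the original question.
sub build_str {
    my ( $str, $max ) = ( '', 200 * 1024 );
    while ( length($str) < $max ) {
        # Assume lines are between 60-79 characters long
        $str .= ( '#' x ( ( rand 20 ) + 60 ) ) . "\n";
    }
    return $str;
}

my $str = build_str();

# Hypothetical candidates; each returns a ref so results can be checked.
my %methods = (
    by_index => sub {
        my @offsets;
        my $p = 0;
        while ( ( $p = index( $str, "\n", $p ) ) >= 0 ) {
            push @offsets, $p++;
        }
        return \@offsets;
    },
    by_match => sub {
        my @offsets;
        push @offsets, $-[0] while $str =~ /\n/g;
        return \@offsets;
    },
);

# A negative count means "run each sub for at least that many CPU seconds".
cmpthese( -1, \%methods );
```

The absolute rates will differ by machine and perl version, so the percentages in the table above should not be expected to reproduce exactly.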

      I found this to be about 30% faster than by_index and something over 10% faster than by_regex2
      by_index2 => sub {
          my @offsets;
          for (my $p = 0; ($p = index($str, "\n", $p)) > 0; push @offsets, $p++) { }
          \@offsets;  # I made all of them return a ref to the array so I could check results
      },

      Caution: Contents may have been coded under pressure.
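One detail that may be worth checking in the by_index2 loop: its condition is `> 0`, and index() returns 0 for a match at the very start of the string, so the loop would stop immediately if the string began with a newline. With build_str this cannot happen, since every line has at least 60 characters. The sketch below compares the `> 0` loop with a `>= 0` variant; the sub names offsets_gt0 and offsets_ge0 are made up for this illustration.

```perl
use strict;
use warnings;

# Same loop shape as by_index2, stopping when index() returns 0 or -1.
sub offsets_gt0 {
    my ($s) = @_;
    my @offsets;
    for ( my $p = 0; ( $p = index( $s, "\n", $p ) ) > 0; push @offsets, $p++ ) { }
    return \@offsets;
}

# Variant that stops only when index() returns -1 (no further match).
sub offsets_ge0 {
    my ($s) = @_;
    my @offsets;
    for ( my $p = 0; ( $p = index( $s, "\n", $p ) ) >= 0; push @offsets, $p++ ) { }
    return \@offsets;
}

# First string has no leading newline, second one does.
for my $s ( "ab\ncd\n", "\nab\ncd\n" ) {
    printf "%d vs %d offsets\n",
        scalar @{ offsets_gt0($s) },
        scalar @{ offsets_ge0($s) };
}
```

For the benchmark string the two variants should agree, so the timing comparison above is unaffected; the difference only shows up when the first character is a newline.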