Finding the positions of a character in a string

Limbic~Region has asked for the wisdom of the Perl Monks concerning the following question:

All,
In this node, I recommended a solution whose first step was indexing the positions of the newlines in a string. I believe my overall approach to be sound, but there are many ways the first step could be accomplished. Since the Anonymous Monk appeared to have speed in mind, I decided to Benchmark a few ideas.

I briefly discussed this in the CB before deciding to post it here, as no obvious consensus was reached. Keep in mind that there will be an unknown number of newlines, but the string itself will be no larger than 200K.

Use a regex
Walk the string using successive calls to index
Treat the string as a filehandle and use $/ and tell
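
For readers who want something concrete, here is a minimal sketch of the three approaches listed above. The sub names (`by_regex`, `by_index`, `by_fh`) and the short test string are assumptions chosen for illustration; they are not necessarily the code that was benchmarked later in the thread.

```perl
use strict;
use warnings;

# Small stand-in string (an assumption); the benchmark uses build_str() instead.
my $str = "ab\ncd\n\nef\n";

# 1. Regex: $-[0] holds the start offset of the most recent successful match.
sub by_regex {
    my ($s) = @_;
    my @offsets;
    push @offsets, $-[0] while $s =~ /\n/g;
    return \@offsets;
}

# 2. index: restart each search one character past the previous hit.
#    index returns -1 when nothing is found, and 0 is a valid position.
sub by_index {
    my ($s) = @_;
    my @offsets;
    my $p = 0;
    while ( ( $p = index( $s, "\n", $p ) ) >= 0 ) {
        push @offsets, $p++;
    }
    return \@offsets;
}

# 3. In-memory filehandle: after reading a line, tell() is one past its newline.
sub by_fh {
    my ($s) = @_;
    my @offsets;
    open my $fh, '<', \$s or die "Cannot open in-memory handle: $!";
    local $/ = "\n";
    while ( my $line = <$fh> ) {
        # The last line may lack a trailing newline, so check before recording.
        push @offsets, tell($fh) - 1 if substr( $line, -1 ) eq "\n";
    }
    close $fh;
    return \@offsets;
}

print "regex: @{ by_regex($str) }\n";
print "index: @{ by_index($str) }\n";
print "fh:    @{ by_fh($str) }\n";
```

All three should report the same offsets for a given string, which makes it easy to cross-check them before timing anything.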

Do you have any other ideas? Use the following string construction for the basis of your benchmark.

sub build_str {
    my ($str, $max) = ('', 200 * 1024);
    while ( length($str) < $max ) {
        # Assume lines are between 60-79 characters long
        $str .= ('#' x ((rand 20) + 60)) . "\n";
    }
    return $str;
}
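
A minimal sketch of how this string might be fed to the core Benchmark module follows. The candidate names (`regex`, `index`) and their bodies are placeholders assumed for illustration, not the subs that were actually timed in this thread.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

sub build_str {
    my ($str, $max) = ('', 200 * 1024);
    while ( length($str) < $max ) {
        # Assume lines are between 60-79 characters long
        $str .= ('#' x ((rand 20) + 60)) . "\n";
    }
    return $str;
}

my $str = build_str();

# Hypothetical candidates; each returns a reference to its list of offsets.
my %candidates = (
    regex => sub {
        my @o;
        push @o, $-[0] while $str =~ /\n/g;
        \@o;
    },
    index => sub {
        my @o;
        my $p = 0;
        while ( ( $p = index( $str, "\n", $p ) ) >= 0 ) { push @o, $p++ }
        \@o;
    },
);

# Sanity check: the candidates must agree before their timings mean anything.
my ($r, $i) = map { join ',', @{ $candidates{$_}->() } } qw(regex index);
die "candidates disagree\n" unless $r eq $i;

# A negative count means "run each candidate for at least that many CPU seconds".
cmpthese( -1, \%candidates );
```

Because `build_str` uses `rand`, the string differs from run to run, so compare rates within one run rather than across runs.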

Cheers - L~R


Replies are listed 'Best First'.
Re: Finding the positions of a character in a string by demerphq (Chancellor) on Nov 28, 2005 at 15:30 UTC
(From the CB where L~R was discussing this)

Using index:
`perl -le "my $str=qq(\n--\n--\n); my $p=0; do { $p=index($str,qq(\n),$p); print $p++; } while $p>0;"`

And using s///:
`perl -le "my $str=qq(\n--\n--\n); $str=~s/\n/print $-[0]; '\n'/ge;"`

--- $world=~s/war/peace/g
Re^2: Finding the positions of a character in a string by sauoq (Abbot) on Nov 28, 2005 at 16:17 UTC
"And using s///."

Ewww.

`print pos($str)-1 while $str =~ /\n/g;`
`print $-[0] while $str =~ /\n/g;`

Update: Oops, need to subtract 1 from `pos()` or use `$-[0]` as you did in the s/// version.

-sauoq "My two cents aren't worth a dime.";
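
To illustrate the update, here is a small sketch showing that `pos($str) - 1` and `$-[0]` agree when the pattern matches exactly one character. The array names are assumptions made for the example.

```perl
use strict;
use warnings;

my $str = "\n--\n--\n";
my ( @from_pos, @from_start );

# pos() is the offset just past the match, so subtract the match length (1 here).
push @from_pos, pos($str) - 1 while $str =~ /\n/g;

# $-[0] is the offset where the match started, whatever its length.
push @from_start, $-[0] while $str =~ /\n/g;

print "pos()-1: @from_pos\n";
print "\$-[0]:   @from_start\n";
```

For a pattern that can match more than one character, `pos() - 1` would no longer point at the start of the match, so `$-[0]` is the safer of the two.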
Re^3: Finding the positions of a character in a string by demerphq (Chancellor) on Nov 28, 2005 at 16:22 UTC
Actually, I deliberately didn't use that, because what you wrote is pretty much the same as the index solution, even though it doesn't look like it. The idea of the s/// was to avoid coming back to the perl runloop and instead stay inside the regex loop.

--- $world=~s/war/peace/g
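
A sketch of that idea, collecting the offsets instead of printing them, might look like the following. The `$copy` and `@offsets` names are assumptions, and working on a copy is only a precaution so the original string is never touched.

```perl
use strict;
use warnings;

my $str  = "\n--\n--\n";
my $copy = $str;    # s/// modifies its target, so work on a copy
my @offsets;

# With /e the replacement is evaluated as code for every match. It records
# where the match started and returns a newline, leaving the text unchanged.
$copy =~ s/\n/push @offsets, $-[0]; "\n"/ge;

print "@offsets\n";
```

Note that the replacement here is `"\n"` in double quotes; a single-quoted `'\n'` would substitute a literal backslash followed by `n`, which only matters if the string is used afterwards.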
Re: Finding the positions of a character in a string by Limbic~Region (Chancellor) on Nov 28, 2005 at 17:35 UTC
All,
Here are the benchmark results for all the methods provided so far.

            Rate by_regex1 by_regex3 by_fh by_index by_regex2
by_regex1 49.6/s        --      -52%  -75%     -81%     -83%
by_regex3  103/s      107%        --  -48%     -61%     -66%
by_fh      199/s      301%       94%    --     -25%     -34%
by_index   266/s      437%      160%   34%       --     -11%
by_regex2  300/s      505%      192%   51%      13%       --

Cheers - L~R
Re^2: Finding the positions of a character in a string by Roy Johnson (Monsignor) on Nov 29, 2005 at 19:40 UTC
I found this to be about 30% faster than `by_index` and something over 10% faster than `by_regex2`:

by_index2 => sub {
    my @offsets;
    for (my $p = 0; ($p = index($str, "\n", $p)) > 0; push @offsets, $p++) { }
    \@offsets; # I made all of them return a ref to the array so I could check results
},

Caution: Contents may have been coded under pressure.
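
One detail worth noting about the loop condition: `index` returns -1 when nothing is found, and 0 is a valid position. Testing `> 0` is fine for the benchmark string, which always starts with `#`, but it would stop immediately on a string that begins with a newline. A general-purpose variant might test `>= 0` instead; the `offsets_of` helper below is a hypothetical name used only for this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a reference to the offsets of $char in $s.
sub offsets_of {
    my ( $s, $char ) = @_;
    my @offsets;
    # Same empty-bodied for loop as above, but >= 0 keeps a hit at position 0.
    for ( my $p = 0; ( $p = index( $s, $char, $p ) ) >= 0; push @offsets, $p++ ) { }
    return \@offsets;
}

my $leading = offsets_of( "\n--\n--\n", "\n" );
my $none    = offsets_of( "abc",        "\n" );

print "leading newline: @$leading\n";
print "no newline: ", scalar(@$none), " offsets\n";
```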