Using index would look something like the following:

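sub using_index {
    our $seq; *seq = \$_[0];   # alias the argument to avoid copying the string

    my @groups;
    my $pos   = -1;   # position of the most recent 'M' found
    my $start = -1;   # start of the current run of 'M's, or -1 if none yet

    for (;;) {
        my $new_pos = index($seq, 'M', $pos+1);
        if ($new_pos < 0) {
            # No more 'M's: close the final run, if there was one.
            # ($start is always defined, so test it against -1.)
            if ($start >= 0) {
                push(@groups, [ $start, $pos ]);
            }
            last;
        }

        if ($start < 0) {
            $start = $new_pos;
        }
        elsif ($new_pos - $pos > 1) {
            # Gap since the previous 'M': close the old run, start a new one.
            push(@groups, [ $start, $pos ]);
            $start = $new_pos;
        }

        $pos = $new_pos;
    }

    return @groups;
}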

It would be simpler if there were a function that returned the position of the next character that isn't 'M'.
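To illustrate that point, here is a minimal sketch of what the loop could look like with such a function. Perl has no builtin for this, so `index_not` below is a hypothetical helper written in plain Perl, and `using_index_not` is just an illustrative name. I haven't benchmarked it, so take it as a readability sketch only.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT a Perl builtin): returns the position of the
# first character at or after $from that is not $ch, or -1 if there is none.
sub index_not {
    my ($str, $ch, $from) = @_;
    my $len = length($str);
    for (my $i = $from; $i < $len; $i++) {
        return $i if substr($str, $i, 1) ne $ch;
    }
    return -1;
}

# Same result format as using_index: a list of [ start, end ] pairs,
# one per run of 'M's, with inclusive zero-based positions.
sub using_index_not {
    my ($seq) = @_;
    my @groups;
    my $pos = 0;
    while ((my $start = index($seq, 'M', $pos)) >= 0) {
        my $end = index_not($seq, 'M', $start);
        $end = length($seq) if $end < 0;   # run extends to the end of the string
        push(@groups, [ $start, $end - 1 ]);
        $pos = $end;
    }
    return @groups;
}
```

With this, each run needs exactly two searches (one for its start, one for its end) instead of one `index` call per 'M'.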

As you can guess, it's much slower than the regexp approach. On the input you provided, the regexp approach is about 170% faster than the index method (i.e. roughly 2.7 times the speed).

Benchmark code:

use strict;
use warnings;

use Benchmark qw( cmpthese );

sub using_index {
    our $seq; *seq = \$_[0];   # alias the argument to avoid copying the string

    my @groups;
    my $pos   = -1;
    my $start = -1;

    for (;;) {
        my $new_pos = index($seq, 'M', $pos+1);
        if ($new_pos < 0) {
            # $start is always defined, so test it against -1.
            if ($start >= 0) {
                push(@groups, [ $start, $pos ]);
            }
            last;
        }

        if ($start < 0) {
            $start = $new_pos;
        }
        elsif ($new_pos - $pos > 1) {
            push(@groups, [ $start, $pos ]);
            $start = $new_pos;
        }

        $pos = $new_pos;
    }

    return @groups;
}

sub using_regexp {
    our $seq; *seq = \$_[0];

    my @groups;
    # @- and @+ hold the start and end offsets of the last successful match.
    push(@groups, [ $-[0], $+[0]-1 ]) while $seq =~ /M+/g;
    return @groups;
}

{
    # One continuous string, split across lines here only for readability.
    my $seq = "IIIIIMMMMMMMMMMMOOOOOOOOOOOOOOOOMMMMMMMMMMMMMIIIIIMMMMMM"
            . "MMMOOOOOOOOOOOOOOMMMMMMMMMMMMMIIIMMMMMMMMMMMOOOOOOOOOOOOOOOMMMMMMMMMM"
            . "MIIIIIIMMMMMMMMMMMMMOOOOOOOOOOOOOOOOOOOOOOOMMMMMMMIIIMMMMMMMMMOOOOOOO"
            . "OOOOOOOOOOOOOOOOOOOOMMMMMMMIIIIMMMMMMMMMMMOOOOOOOOOOOOOOOOOOOOOMMMMMM"
            . "MIIIMMMMMMMMMOOOOOOOOOOOOOOOOOOOOOOOOOMMMMMMMMMIIIMMMMMMMMMMMOOOOOOOO"
            . "OOOOOOOOOMMMMMMMMI";

    print("using_index\n");
    print("-----------\n");
    printf("%d to %d\n", @$_) foreach using_index($seq);
    print("\n");

    print("using_regexp\n");
    print("------------\n");
    printf("%d to %d\n", @$_) foreach using_regexp($seq);
    print("\n");

    cmpthese(-3, {
        using_index  => sub { my @groups = using_index  $seq; 1; },
        using_regexp => sub { my @groups = using_regexp $seq; 1; },
    });
}

Benchmark results:

               Rate  using_index using_regexp
using_index  2039/s           --         -63%
using_regexp 5467/s         168%           --

In reply to Re^2: 'grouping' substrings? by ikegami
in thread 'grouping' substrings? by Anonymous Monk
