Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks!
I couldn't think of a good title for my question, so I'll get straight to the point:
I have a string like the following:
-----MMMMM------IIIII----MMM---OOOO---I---MMMM-----

Is there any way I can gather and group this information to get, for example:
6-10:M 17-21:I 26-28:M 32-35:O 39:I 43-46:M
Thank you in advance.

Re: group things?
by Joost (Canon) on Apr 25, 2008 at 22:26 UTC
    Please correct me if my assumptions are wrong: basically, you want to find the start and end positions of runs of identical characters other than "-".

    A straightforward way to do that is to inspect each character in sequence and compare it to the previous one. But that would be slow and annoying. It's easier to use a regex:

    #!/usr/bin/perl -w
    use strict;

    $_ = "-----MMMMM------IIIII----MMM---OOOO---I---MMMM-----";
    while (/(\w)(\1*)/g) {
        print "$1:" . (1 + pos() - length "$1$2") . "-" . pos() . "\n";
    }
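    For comparison, here is a sketch of the same pos()/length idea adjusted to print exactly the format asked for (6-10:M, and 39:I for a single character) instead of M:6-10 and I:39-39. The helper name group_runs is made up for illustration, and [^-] replaces \w on the assumption that any character other than "-" should count as a run:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the runs of non-dash characters as
# "start-end:char" (1-based positions), or "start:char" for a run of length 1.
sub group_runs {
    my ($str) = @_;
    my @runs;
    while ($str =~ /([^-])(\1*)/g) {
        my $end   = pos($str);                    # 1-based position of the run's last char
        my $start = $end - length("$1$2") + 1;    # 1-based position of its first char
        push @runs, ($start == $end ? $start : "$start-$end") . ":$1";
    }
    return join ' ', @runs;
}

print group_runs('-----MMMMM------IIIII----MMM---OOOO---I---MMMM-----'), "\n";
# prints: 6-10:M 17-21:I 26-28:M 32-35:O 39:I 43-46:M
```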
      Joost, your code works perfectly for me... Unfortunately, I don't really understand much of what it does... but thank you for your time...
        As someone else already hinted, you would probably benefit from reading perlretut and pos.

        It may seem like overkill for this problem but learning regular expressions will make string processing problems like this a lot easier. And almost all serious programming languages today have a regular expression library very much like perl's, so chances are high you'd be able to use that knowledge if you're using some other language too.

        OK, Joost's code works; now it's time for you to go to work.

        The code uses only a regex, two built-in functions (pos and length), and string concatenation (the . operator). Time to apply Your Mother's suggestions about documentation access and do some reading. Then maybe try some variations and see what happens. After that, specific questions will usually elicit pertinent answers.
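        As a starting point for that reading, here is a sketch that runs Joost's loop unchanged but records, for each match, what $1, $2, pos() and length "$1$2" hold, so you can see how the start position is computed. The @trace array and the output format are only for illustration:

```perl
use strict;
use warnings;

# Same loop as Joost's: $1 is the first character of a run, $2 its repeats
# (matched by \1*), pos() is the offset just past the match, and
# length "$1$2" is the length of the whole run.
my @trace;
$_ = '-----MMMMM------IIIII----MMM---OOOO---I---MMMM-----';
while (/(\w)(\1*)/g) {
    my $len   = length "$1$2";
    my $start = 1 + pos() - $len;    # 1-based position of the run's first char
    push @trace, sprintf 'char=%s repeats="%s" pos=%d length=%d start=%d',
        $1, $2, pos(), $len, $start;
}
print "$_\n" for @trace;
# first line: char=M repeats="MMMM" pos=10 length=5 start=6
```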

Re: group things?
by pc88mxer (Vicar) on Apr 25, 2008 at 22:42 UTC
    This is related to run-length encoding (RLE). Here is a regular expression that performs RLE (found at http://hpaste.org/1987):
    sub encode {
        (my $string = shift) =~ s/((.)\2*)/$2 . (length $1)/eg;
        $string;
    }
    Maybe you can modify this function to capture the beginning and ending positions of each run.

    Update: Spoiler follows...
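    One way the encode() idea could be adapted to record positions (a sketch, not necessarily what pc88mxer had in mind): keep the ((.)\2*) run pattern, but use [^-] in place of . so dashes are skipped, match it in a while loop instead of s///e, and read each run's offsets from the @- and @+ arrays. The name encode_positions is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical variant of encode(): instead of "char . count", return one
# [start, end, char] triple per run (1-based, inclusive), skipping '-'.
sub encode_positions {
    my ($string) = @_;
    my @runs;
    while ($string =~ /(([^-])\2*)/g) {
        # $-[1] is the 0-based offset where group 1 starts;
        # $+[1] is the offset just past its end (= 1-based last position).
        push @runs, [ $-[1] + 1, $+[1], $2 ];
    }
    return @runs;
}

for my $run (encode_positions('-----MMMMM------IIIII----MMM---OOOO---I---MMMM-----')) {
    my ($start, $end, $char) = @$run;
    my $label = $start == $end ? $start : "$start-$end";
    print "$label:$char\n";
}
```

    Returning triples rather than a formatted string leaves the caller free to choose the output format.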

Re: group things?
by jwkrahn (Abbot) on Apr 25, 2008 at 22:48 UTC
    $ perl -le'
    my $x = q[-----MMMMM------IIIII----MMM---OOOO---I---MMMM-----];
    while ( $x =~ /([^-])/g ) {
        my $y = $1;
        pos( $x )--;
        $x =~ /\G$y+/g and print $-[0] + 1, $-[0] + 1 == $+[0] ? "" : "-$+[0]", ":$y";
    }
    '
    6-10:M
    17-21:I
    26-28:M
    32-35:O
    39:I
    43-46:M
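    The key step in the one-liner above is the pos()--/\G pair: the first match finds a single non-dash character, pos is moved back onto it, and the \G-anchored match then consumes the whole run, leaving its offsets in @- and @+. A small sketch on a made-up string, '--MMM-I', shows the intermediate values:

```perl
use strict;
use warnings;

my $x = '--MMM-I';

# Scalar-context m//g: finds the first non-dash character, the 'M' at offset 2.
$x =~ /([^-])/g or die "no match";
my $y           = $1;
my $after_first = pos($x);    # 3: just past that single 'M'

pos($x)--;                    # step back onto the 'M' itself

# \G anchors this match at pos($x), so it consumes the whole run of 'M's.
$x =~ /\G$y+/g or die "no run";
my ($start, $end) = ($-[0] + 1, $+[0]);    # 1-based first and last positions

print "pos after first match: $after_first\n";
print "run: $start-$end:$y\n";             # prints "run: 3-5:M"
```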
Re: group things?
by Anonymous Monk on Apr 25, 2008 at 22:25 UTC
    I know how to split the string and read it letter by letter, doing something like:
    $str="-----MMMMM------IIIII----MMM---OOOO---I---MMMM-----"; @my_splitted_str=split("",$str);
    My problem is how to get Perl to "understand" that, for instance, letters 6-10 are M, so that it prints 6-10:M and not 6:M, 7:M, 8:M, 9:M, 10:M.
      Joost's given you a solution using regexps. Here's one using split:
      my $str = '-----MMMMM------IIIII----MMM---OOOO---I---MMMM-----';
      my $start = 0;    # 0-based offset of the current field
      for ( split /(-+)/, $str ) {
          # split returns an empty leading field because $str starts with a separator
          next unless length;
          my $char = substr $_, 0, 1;
          if ( $char ne '-' ) {
              print $start+1, length > 1 ? ( "-", $start + length ) : "", ": $char\n";
          }
          $start += length;
      }

      Output:

      6-10: M
      17-21: I
      26-28: M
      32-35: O
      39: I
      43-46: M
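      As for the empty string at the start: when the split pattern contains a capture group, split also returns the captured separators, and because the string begins with a separator, the first field is empty (leading empty fields are kept; only trailing empty ones are dropped by default). A quick check on a shorter, made-up string:

```perl
use strict;
use warnings;

# With a capture group in the pattern, split also returns the separators.
# The string starts with a separator, so the first field is empty.
my @fields = split /(-+)/, '--MM-I';
print join('|', map { "[$_]" } @fields), "\n";
# prints: []|[--]|[MM]|[-]|[I]
```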


      Unless I state otherwise, my code all runs with strict and warnings
      One way:
      use strict;
      use warnings;

      my $str = "-----MMMMM------IIIII----MMM---OOOO---I---MMMM-----";
      my @my_splitted_str = split(//, $str);

      # add a sentinel value after the other characters, so
      # the end of the last range is properly detected
      push @my_splitted_str, "-";

      my $range = "";    # we are not initially in a range
      my $start_of_range;

      for my $index ( 0 .. $#my_splitted_str ) {
          # if we are in a range, see if it ends now
          if ( $range && $range ne $my_splitted_str[$index] ) {
              # the previous character was the last of this range
              my $end_of_range = $index - 1;
              if ( $end_of_range == $start_of_range ) {
                  print $start_of_range + 1, ":", $range, "\n";
              }
              else {
                  print $start_of_range + 1, "-", $end_of_range + 1, ":", $range, "\n";
              }
              $range = "";
          }

          # see if this is the start of a range
          if ( !$range && $my_splitted_str[$index] ne "-" ) {
              $start_of_range = $index;
              $range = $my_splitted_str[$index];
          }
      }
Re: group things?
by Your Mother (Archbishop) on Apr 25, 2008 at 22:04 UTC

    Ah, so much homework, so little time.

      I am trying to learn, so if you could give me some advice I would greatly appreciate it. I don't understand why you have to make fun of me...

        Oh, I was just horsing around. Have you tried writing any code yet? Try looking through some of the POD for Perl functions like substr, index, split, and pos, and the regex page, perlre. Like so:

        # update, adding one with an -f flag for clarity
        perldoc -f pos
        perldoc perlre

        See if you can parse the string at least a little, in any way. If you're stuck, show the code you tried. A little effort is usually rewarded generously here; a question without it tends to get less attention. And as learning experiences go, someone showing you something is drastically less effective than trying to do it yourself, even when you're having trouble. You'll be much more receptive to answers after banging your head on a problem a little. We all are.