Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks!
I have a string of characters that I want to format into a more "compact" size. I will use it as input for a string.
The initial format is:
iiMMMMMMMMMMoooooooooooooooooooooooMMMMMMMMMMiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiMMMMMMMMMMMMMooooooooooooooooooooo

and I want to transform it to:
i3-12o36-45i76-88o
My idea was to first take a substring to see what the initial character is, and then perhaps use the index function to find the limits of the MMMMM parts (there can be more than one; it is not always 2). The only information we have is that the string starts with either i or o, and that i alternates with o whenever an 'MMMM' part lies between them.
Any help?

Replies are listed 'Best First'.
Re: How to format such a string?
by rjt (Curate) on Jun 16, 2013 at 02:09 UTC

    Given your input constraints, this regexp gives the desired result:

        s/(.)\1*/$1 eq 'M' ? 1 + pos . '-' : (pos || '') . $1/eg;

    I don't believe your input strings are allowed to end with 'M', but if they are, you can use the slightly longer:

        s/((.)\2*)/$2 eq 'M' ? 1 + pos . '-' . (length($1) + pos) : $2/eg;

    Edit (and N.B.): The above are complete statements that modify $_, not RHS fragments for use with =~. Refer to the very first line of pos for the reason behind this restriction:

    Returns the offset of where the last m//g search left off for the variable in question ($_ is used when the variable is not specified).

    To use these substitutions on a variable besides $_ (as in $string =~ s/.../.../eg; ), replace occurrences of pos with pos($string), or use the slightly more line-noisy @+/@- (@LAST_MATCH_(END|START)) available since Perl 5.6:

        $s =~ s/(.)\1*/$1 eq 'M' ? $+[1] . '-' : ($-[1] || '') . $1/eg;
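
    As a sketch of the named-variable variant described above, the substitution can be wrapped in a sub that works on a copy of its argument instead of $_. The sub name compact_runs and the use of the whole-match offsets $-[0] and $+[0] are illustrative choices, not part of the reply; the test input is built with the x operator and is assumed to have the same run lengths as the string in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: collapse each run of 'M' into "start-end" (1-based,
# inclusive) and each run of any other character into that one character.
sub compact_runs {
    my ($s) = @_;
    # $-[0] is the 0-based start of the current match and $+[0] is the offset
    # just past its end; both refer to positions in the original string.
    $s =~ s/((.)\2*)/$2 eq 'M' ? ($-[0] + 1) . '-' . $+[0] : $2/eg;
    return $s;
}

# Assumed run lengths: 2 i, 10 M, 23 o, 10 M, 30 i, 13 M, 21 o.
my $input = 'i' x 2 . 'M' x 10 . 'o' x 23 . 'M' x 10
          . 'i' x 30 . 'M' x 13 . 'o' x 21;

print compact_runs($input), "\n";   # i3-12o36-45i76-88o
```

    Because both ends of the M run are taken from the same match, this form also copes with a string that ends in 'M'.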
Re: How to format such a string?
by RichardK (Parson) on Jun 16, 2013 at 00:34 UTC

    What are your rules to transform the input string to the output string?

    I don't see any obvious connection, so can't help much.

    Once you've sorted out the rules, I'd use split, length and join.

      It is actually the boundaries of the small substrings... So, in my example, you have: i3-12o36-45i76-88o, which means that the "original" string starts with i, then from 3-12 we have M, then we have o, then from 36-45 we have M again, then i etc...
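
      To make the reading of that format concrete, here is a small sketch that pulls the M regions back out of a compact string. The sub name m_regions and the [label, start, end] layout are assumptions made for illustration; positions are taken to be 1-based and inclusive, as in the example above.

```perl
use strict;
use warnings;

# Hypothetical reader: returns one [label, start, end] triple per M region,
# where label is the i/o character that precedes the region.
sub m_regions {
    my ($code) = @_;
    my @regions;
    while ($code =~ /([io])(\d+)-(\d+)/g) {
        push @regions, [$1, $2, $3];
    }
    return @regions;
}

for my $r (m_regions('i3-12o36-45i76-88o')) {
    printf "after %s: M from %d to %d (%d characters)\n",
        @$r, $r->[2] - $r->[1] + 1;
}
```

      Note that the compact form does not record the total length, so the size of the final i or o run cannot be recovered from it.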

        Ok -- is your output format fixed? I think run length encoding could be more compact and easier to work with, but maybe that's just me.

        So your string would rle as 2i10M23o...
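
        A minimal sketch of that readable run-length form, assuming each count is written as a decimal number in front of its character and that the data itself never contains digits. The sub names rle_encode and rle_decode are illustrative, and the input is built with the x operator using the run lengths implied by the question.

```perl
use strict;
use warnings;

# Encode: replace each run with "<count><char>". Assumes no digits in the input.
sub rle_encode {
    my ($s) = @_;
    $s =~ s/((.)\2*)/length($1) . $2/ges;
    return $s;
}

# Decode: expand each "<count><char>" pair back into a run.
sub rle_decode {
    my ($s) = @_;
    $s =~ s/(\d+)(\D)/$2 x $1/ges;
    return $s;
}

# Assumed run lengths: 2 i, 10 M, 23 o, 10 M, 30 i, 13 M, 21 o.
my $input = 'i' x 2 . 'M' x 10 . 'o' x 23 . 'M' x 10
          . 'i' x 30 . 'M' x 13 . 'o' x 21;

my $rle = rle_encode($input);
print "$rle\n";   # 2i10M23o10M30i13M21o
die "round trip failed" if rle_decode($rle) ne $input;
```

        Unlike the start-end format, this keeps the length of every run, so the original string can be rebuilt exactly.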

Re: How to format such a string?
by Anonymous Monk on Jun 16, 2013 at 05:51 UTC
    $ perl -le'
    my $string = "iiMMMMMMMMMMoooooooooooooooooooooooMMMMMMMMMMiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiMMMMMMMMMMMMMooooooooooooooooooooo";
    my $code;
    $code .= $1 eq "M" ? $-[0] + 1 . "-$+[0]" : $1 while $string =~ /(.)\1*/g;
    print $code;
    '
    i3-12o36-45i76-88o
Re: How to format such a string?
by AnomalousMonk (Archbishop) on Jun 16, 2013 at 17:30 UTC

    If your encoding format is still flexible, I would suggest a 'classic' (and preferable, IMHO) Run Length Encoding approach (Update: as already suggested, I now see, by RichardK here). (Caution: not tested for runs of 256 or more characters.)

    >perl -wMstrict -le
    "use constant S => 'iiMMMMMMMMMMoooooooooooooooooooooooMMMMMMMMMMiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiMMMMMMMMMMMMMooooooooooooooooooooo';
     print q{'}, S, q{'};
     ;;
     (my $rle = S) =~ s{ ((.) \2{0,254}) }{ chr(length $1) . $2 }xmsge;
     print qq{'$rle'};
     ;;
     (my $s = $rle) =~ s{ (.) (.) }{ $2 x ord $1 }xmsge;
     print qq{'$s'};
     ;;
     die 'encode/decode failed' if $s ne S;
     print 'encode/decode ok!';
    "
    'iiMMMMMMMMMMoooooooooooooooooooooooMMMMMMMMMMiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiMMMMMMMMMMMMMooooooooooooooooooooo'
    '?i M?o M§o'
    'iiMMMMMMMMMMoooooooooooooooooooooooMMMMMMMMMMiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiMMMMMMMMMMMMMooooooooooooooooooooo'
    encode/decode ok!

    (You may be able to do something a bit more efficient using unpack for the decoding step. I think  s/// is about as efficient as you will get for encoding.)
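
    One way the unpack idea could look, as a sketch only: the encoder repeats the substitution from the one-liner above, and the decoder reads the result with the template '(C a)*', which yields count, character, count, character, and so on. The sub names rle_pack and rle_unpack are hypothetical, the input is assumed to be plain single-byte text, and no claim is made here about whether this is actually faster than s///.

```perl
use strict;
use warnings;

# Encode as in the one-liner: one count character (1..255) followed by the
# repeated character. Runs longer than 255 are split into several pairs.
sub rle_pack {
    my ($s) = @_;
    $s =~ s{ ((.) \2{0,254}) }{ chr(length $1) . $2 }xmsge;
    return $s;
}

# Decode with unpack instead of a second substitution.
sub rle_unpack {
    my ($rle) = @_;
    my @pairs = unpack '(C a)*', $rle;   # (count, char, count, char, ...)
    my $out = '';
    while (my ($count, $char) = splice @pairs, 0, 2) {
        $out .= $char x $count;
    }
    return $out;
}

# Assumed run lengths: 2 i, 10 M, 23 o, 10 M, 30 i, 13 M, 21 o.
my $input = 'i' x 2 . 'M' x 10 . 'o' x 23 . 'M' x 10
          . 'i' x 30 . 'M' x 13 . 'o' x 21;

my $rle = rle_pack($input);
print length($rle), " characters encoded\n";
print rle_unpack($rle) eq $input ? "round trip ok\n" : "round trip failed\n";
```

    Since the {0,254} quantifier caps every run at 255 characters, a longer run is simply emitted as consecutive pairs for the same character, and the decoder concatenates them again.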

Re: How to format such a string?
by hdb (Monsignor) on Jun 17, 2013 at 07:47 UTC
    s/(([io])+M)M*([io]*)(?=[io])/"$2$+[1]-$-[3]"/ge;