in reply to Re: Pattern matching simultaneous substitution
in thread Pattern matching simultaneous substitution

Sorry, I missed that :)
- if no D or U is present, then every s becomes i
The following one was not properly converted:
old:ssssssDDDDDDDDDDDDDDsssssssssssssDDDDDssssssssssssssssssssssssssss
+ssssssssssssssDDDDDDDssssssssssssssssssssssssssssssssssssssssssssssss
+sssssssssssssDDDDDDssssssssssssssssssssssssssssssssssssssssssssssssss
+sssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss
+ssssssssssss
new:OOOOOOMMMMMMMMMMMMMMIIIIIIIIIIIIIMMMMMOOOOOOOOOOOOOOOOOOOOOOOOOOOO
+OOOOOOOOOOOOOOMMMMMMMIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
+IIIIIIIIIIIIIMMMMMMssssssssssssssssssssssssssssssssssssssssssssssssss
+sssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss
+ssssssssssss

Re^3: Pattern matching simultaneous substitution
by choroba (Cardinal) on Jan 05, 2022 at 20:54 UTC
    I still don't understand the rules. If the input is
    DDssDD
    what should it become? Are the "s" prior to "D" or following a "D"?

    In the new/old example, why are the final "s" not replaced? Don't they follow a "D"?

    Please, try to be more precise.

    Also, you can easily shorten the data, 2 consecutive characters of each type would do.

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
      Ok, let me try again with some examples of how data look now (old) and should look (new):
      -case 1: only 's'
      old:sssss new:iiiii

      -case 2: only 's' and 'U'
      old:sssUUss new:iiiMMoo

      -case 3: only 's' and 'D'
      old:sssDDss new:oooMMii

      -case 4: 's', 'D' and 'U' (all possible characters)
      old:sssssDDDDDDDssssUUUss new:oooooMMMMMMMiiiiMMMoo

      OR
      old:sssssUUUUUUUssssDDDss new:iiiiiMMMMMMMooooMMMii

      For some reason, the code I wrote changes many but not all cases... Not that you cannot see s+D+s+D+s+ sequence, or s+U+s+U+s+. D and U alternate.
        > Not that you cannot see s+D+s+D+s+ sequence, or s+U+s+U+s+. D and U alternate.

        If you mean "Note" by "Not", then your sample input is invalid, as both the sequences contain sDsDs.

        Also, I think the following works correctly even for the non-alternating sequences in the way you originally showed:

        my %before = (U => 'I', D => 'O');
        my %after;
        @after{ keys %before } = reverse values %before;

        sub subst {
            local ($_) = @_;
            1 while s/(s+)(?=([DU]))/$before{$2} x length $1/e
                  | s/(?<=([DU]))(s+)/$after{$1} x length $2/e;
            tr/DU/MM/ or tr/s/I/;
            return $_
        }
        It uses the bitwise or (|), which doesn't short-circuit, so both substitutions are tried on every pass for as long as there is anything to replace.
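To make the difference between the two operators concrete, here is a minimal sketch on a toy string. The helper name `passes` and the `a`/`b` substitutions are made up for illustration and are not part of the code posted above. It counts how many loop passes are needed when the two substitutions are joined by `|` and when they are joined by `||`.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only: apply two single-shot
# substitutions in a loop until neither matches, and count the passes.
# With ||, the second s/// is skipped whenever the first one succeeds;
# with |, both operands are always evaluated.
sub passes {
    my ($str, $short_circuit) = @_;
    my $n = 0;
    if ($short_circuit) {
        $n++ while $str =~ s/a/A/ || $str =~ s/b/B/;
    }
    else {
        $n++ while $str =~ s/a/A/ | $str =~ s/b/B/;
    }
    return ($str, $n);
}

my ($s_or,  $n_or)  = passes('aabb', 1);   # logical ||
my ($s_bit, $n_bit) = passes('aabb', 0);   # bitwise |
print "||: $s_or in $n_or passes\n";
print "| : $s_bit in $n_bit passes\n";
```

Both variants end with the same string here. The bitwise version simply gets there in fewer passes, because each pass attempts both substitutions.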

        Tested against:

        Might need some tweaking if the specification changes.
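As a hedged sketch of how such a check could look, the script below runs `subst` against the four cases given earlier in the thread. This is not the original test data. The expected strings are the thread's examples written in upper case, which is an assumption: the examples use lower-case `i`/`o`, while the code and the first old/new sample use upper case.

```perl
use strict;
use warnings;

# subst() as posted above, only laid out over several lines.
my %before = (U => 'I', D => 'O');
my %after;
@after{ keys %before } = reverse values %before;

sub subst {
    local ($_) = @_;
    1 while s/(s+)(?=([DU]))/$before{$2} x length $1/e
          | s/(?<=([DU]))(s+)/$after{$1} x length $2/e;
    tr/DU/MM/ or tr/s/I/;
    return $_;
}

# Cases 1-4 from the thread; expected output upper-cased (assumption).
my @cases = (
    [ 'sssss'                 => 'IIIII'                 ],
    [ 'sssUUss'               => 'IIIMMOO'               ],
    [ 'sssDDss'               => 'OOOMMII'               ],
    [ 'sssssDDDDDDDssssUUUss' => 'OOOOOMMMMMMMIIIIMMMOO' ],
    [ 'sssssUUUUUUUssssDDDss' => 'IIIIIMMMMMMMOOOOMMMII' ],
);

for my $case (@cases) {
    my ($old, $new) = @$case;
    my $got = subst($old);
    printf "%-6s %s -> %s\n", ($got eq $new ? 'ok' : 'not ok'), $old, $got;
}
```

A table of input/expected pairs keeps each rule from the specification next to its example, so a change in the rules only means editing one row.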

        map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
        Fixed it after all:
        while(<>) {
            if($_=~/^>/) {
                $id=$_;
                $seq=<>;
                print $id.$seq;
                if($seq=~/^s+$/) {
                    $seq=~s/s/I/g;
                    print $seq;
                }
                else {
                    while($seq=~/(s+)([U|D]+)(s+)/g) {
                        $part_before=$1;
                        $len_bef=length($part_before);
                        $part_TM=$2;
                        $part_after=$3;
                        $len_after=length($part_after);
                        if($part_TM=~/U/) {
                            $part_bef_new='I' x $len_bef;
                            $part_after_new = 'O' x $len_after;
                        }
                        elsif($part_TM=~/D/) {
                            $part_bef_new='O' x $len_bef;
                            $part_after_new = 'I' x $len_after;
                        }
                        $seq=~s/$part_before/$part_bef_new/;
                        $seq=~s/$part_after/$part_after_new/;
                    }
                    $seq=~s/U/M/g;
                    $seq=~s/D/M/g;
                    print $seq;
                }
            }
        }

        Thank you for your time!