
Let me clarify the input I have and the output I want.

Input: a sequence

.........RKRMMWW*VWMWRYHDWMH*HR*DRMDMWHMWYVMWVRWMVBHWKVYWSMHYWY*HWVMVSKDHMDBYKMWRSMDSD*...**Y*WD*VWDRYHHYRYKRWWDDKDDH*DV**HYW*RW*WMYMRV*BWBWWDMVSYWDBDWWYSMKW*YRVWVYYRMV*KRK*WWDMRMWR*KR**YWHHWH...DYD*MWKKKKWS

Here are the candidate subsequences I am looking for in this sequence.

Output, in order of occurrence in the sequence (left to right):

1. *VWMWRYHDWMH*HR* (3 *, 16 length)
2. *HWVMVSKDHMDBYKMWRSMDSD* (2 *, 24 length)
3. *...**Y*WD* (5 *, 11 length)
4. *HDV**HYW*RW*WMYMRV* (6 *, 20 length)
5. *YRVWVYYRMV*KRK*WWDMRMWR* (4 *, 25 length)
6. *KRK*WWDMRMWR*KR** (5 *, 18 length)
7. R*KR**YWHHWH...DYD* (4 *, 19 length)
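The star counts and lengths in the list can be checked mechanically. Here is a minimal Perl sketch, assuming the seven candidates are copied verbatim from the list (including #4 exactly as typed). It uses `tr///` with an empty replacement list, which counts the `*` characters without modifying the string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The seven candidates exactly as listed above (#4 included as typed).
my @candidates = (
    '*VWMWRYHDWMH*HR*',
    '*HWVMVSKDHMDBYKMWRSMDSD*',
    '*...**Y*WD*',
    '*HDV**HYW*RW*WMYMRV*',
    '*YRVWVYYRMV*KRK*WWDMRMWR*',
    '*KRK*WWDMRMWR*KR**',
    'R*KR**YWHHWH...DYD*',
);

my @counts;
for my $cand (@candidates) {
    # tr/// with an empty replacement list returns the number of matches
    # and leaves the string unchanged.
    my $stars = ( $cand =~ tr/*// );
    push @counts, [ $stars, length $cand ];
    printf "%-26s %d *, %d length\n", $cand, $stars, length $cand;
}
```

It prints each candidate with its star count and length, and the figures agree with the list above.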

Does this make sense?

The subsequences don't necessarily have to be non-overlapping. The number of * matters more than the length. For example, a subsequence with 6 * and length 10 is as good as one with 6 * and length 20. So a shorter or longer subsequence is acceptable as long as it has the maximum number of *, provided its length stays within the range of 10 to 25.
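Here is one possible reading of these criteria, sketched in Perl under stated assumptions. Each candidate starts and ends with `*`. For every `*`, the sketch keeps the longest stretch of at most 25 characters that ends on a later `*`, and drops anything shorter than 10. Every additional closing `*` adds one star, so from a given starting `*` the longest such stretch is also the one with the most stars. The helper name `best_per_star` is hypothetical, and this is an interpretation rather than the intended algorithm.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The input sequence from above, joined into one string.
my $seq = '.........RKRMMWW*VWMWRYHDWMH*HR*DRMDMWHMWYVMWVRWMVBHWKVYWSMHYWY*HWVMVS'
        . 'KDHMDBYKMWRSMDSD*...**Y*WD*VWDRYHHYRYKRWWDDKDDH*DV**HYW*RW*WMYMRV*BWB'
        . 'WWDMVSYWDBDWWYSMKW*YRVWVYYRMV*KRK*WWDMRMWR*KR**YWHHWH...DYD*MWKKKKWS';

# Hypothetical helper (one reading of the criteria): for every '*', take the
# longest stretch that starts at that '*', ends at a later '*', and is at most
# $max characters long. Stretches shorter than $min are dropped.
sub best_per_star {
    my ( $str, $min, $max ) = @_;
    my @pos;
    push @pos, $-[0] while $str =~ /\*/g;    # offsets of every '*'
    my @found;
    for my $i ( 0 .. $#pos ) {
        my $len;
        for my $j ( $i + 1 .. $#pos ) {
            my $try = $pos[$j] - $pos[$i] + 1;
            last if $try > $max;             # later stars are even farther away
            $len = $try;
        }
        next unless defined $len && $len >= $min;
        my $sub = substr $str, $pos[$i], $len;
        push @found, [ $sub, ( $sub =~ tr/*// ), $len ];
    }
    return @found;
}

# Results come out in order of occurrence, left to right.
my @found = best_per_star( $seq, 10, 25 );
printf "%-26s %d *, %d length\n", @$_ for @found;
```

On the input above it reports candidates 1, 2, 3, 5 and 6 unchanged. For #4 it reports the 19-character `*DV**HYW*RW*WMYMRV*`, and for #7 the 18-character `*KR**YWHHWH...DYD*`. It also reports several strings the list does not mention, such as `**HYW*RW*WMYMRV*`, because overlap is allowed.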

Re^3: How to parse string to substrings based on character occurence in the string
by BrowserUk (Patriarch) on Mar 09, 2010 at 21:59 UTC
    does this make sense?

    Not completely.

    1. *HDV**HYW*RW*WMYMRV* doesn't appear in your input.

      Did you mean *DV**HYW*RW*WMYMRV* (6 *, 19 length) or H*DV**HYW*RW*WMYMRV* (6 *, 20 length)?

      And if the latter, why?

    2. Why R*KR**YWHHWH...DYD* (4 *, 19 length) instead of *KR**YWHHWH...DYD* (4 *, 18 length)?
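Both questions can be checked mechanically. The following minimal Perl sketch is illustrative and not taken from the thread. For each string, it tests whether the string occurs verbatim in the input (`index`), whether it begins and ends with `*`, and how long it is. The list of strings to check is an assumption drawn from the two questions above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $seq = '.........RKRMMWW*VWMWRYHDWMH*HR*DRMDMWHMWYVMWVRWMVBHWKVYWSMHYWY*HWVMVS'
        . 'KDHMDBYKMWRSMDSD*...**Y*WD*VWDRYHHYRYKRWWDDKDDH*DV**HYW*RW*WMYMRV*BWB'
        . 'WWDMVSYWDBDWWYSMKW*YRVWVYYRMV*KRK*WWDMRMWR*KR**YWHHWH...DYD*MWKKKKWS';

# Illustrative check: does each string occur verbatim in the input,
# does it start and end with '*', and how long is it?
my @checks;
for my $cand ( '*HDV**HYW*RW*WMYMRV*', '*DV**HYW*RW*WMYMRV*',
               'H*DV**HYW*RW*WMYMRV*', 'R*KR**YWHHWH...DYD*',
               '*KR**YWHHWH...DYD*' ) {
    my $found  = index( $seq, $cand ) >= 0 ? 'yes' : 'no';
    my $framed = $cand =~ /\A\*.*\*\z/ ? 'yes' : 'no';
    my $len    = length $cand;
    push @checks, [ $cand, $found, $framed, $len ];
    printf "%-22s in input: %-3s  starts/ends with *: %-3s  length %d\n",
        $cand, $found, $framed, $len;
}
```

It reports that `*HDV**HYW*RW*WMYMRV*` does not occur in the input. It also shows that `H*DV**HYW*RW*WMYMRV*` and `R*KR**YWHHWH...DYD*` do occur but do not start with `*`.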

    Maybe this is something like your goal?

        #! perl -slw
        use strict;

        my $seq = 'RKRMMWW*VWMWRYHDWMH*HR*DRMDMWHMWYVMWVRWMVBHWKVYWSMHYWY*HWVMVSKD'
                . 'HMDBYKMWRSMDSD*...**Y*WD*VWDRYHHYRYKRWWDDKDDH*DV**HYW*RW*WMYMRV*BWB';

        my %uniq;
        # From every offset, look at the next 25 characters and capture the
        # leftmost '*...*' run of 10 to 25 characters in them (the greedy
        # .{8,23} makes it as long as the window allows); print each distinct
        # run only the first time it is seen.
        substr( $seq, $_, 25 ) =~ m[(\*.{8,23}\*)]
            and ++$uniq{ $1 } == 1
            and print "'$1'"
                for 0 .. length( $seq ) - 1;

        __END__
        C:\test>827470
        '*VWMWRYHDWMH*HR*'
        '*HWVMVSKDHMDBYKMWRSMDSD*'
        '*...**Y*WD*'
        '*WD*VWDRYHHYRYKRWWDDKDDH*'
        '*VWDRYHHYRYKRWWDDKDDH*'
        '*VWDRYHHYRYKRWWDDKDDH*DV*'
        '*DV**HYW*RW*'
        '*DV**HYW*RW*WMYMRV*'
        '**HYW*RW*WMYMRV*'
        '*HYW*RW*WMYMRV*'
        '*RW*WMYMRV*'
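Here is a variant of the same idea that is not from the original post. Instead of sliding a 25-character `substr` window from every offset, a zero-width lookahead lets `m//g` attempt a match at every position and report overlapping hits. Each hit must start at a `*`. The greedy `.{8,23}` then picks the farthest closing `*` that keeps the total length within 10 to 25, so each starting `*` yields at most one string. The truncated variants produced by the window approach therefore do not appear. This is a sketch under those assumptions, not a drop-in replacement.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $seq = 'RKRMMWW*VWMWRYHDWMH*HR*DRMDMWHMWYVMWVRWMVBHWKVYWSMHYWY*HWVMVSKD'
        . 'HMDBYKMWRSMDSD*...**Y*WD*VWDRYHHYRYKRWWDDKDDH*DV**HYW*RW*WMYMRV*BWB';

# The lookahead matches an empty string, so after each hit the next attempt
# starts one character further on, while the capture group still grabs the
# text ahead. That yields overlapping matches, one per qualifying '*'.
my @hits;
while ( $seq =~ /(?=(\*.{8,23}\*))/g ) {
    push @hits, $1;
    print "'$1'\n";
}
```

For this `$seq` it yields nine strings. These are the eleven listed above minus `'*VWDRYHHYRYKRWWDDKDDH*'` and `'*DV**HYW*RW*'`, which are shorter cuts of longer runs that start at the same `*`.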

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.