
If this is the input string: AXXAXXAXXXXXXXXXXXXXXAXXA

Output should be:

AXXAXXA and

AXXA (AXXA from the end of the string)

Re^3: substring selection from a string on certain qualifying conditions
by BrowserUk (Patriarch) on Dec 09, 2010 at 03:15 UTC

    Any more unstated rules? :)

    C:\test>876075
    AGRTGAXWXX : [ AGRTGA ]
    ACRMGAHKMAHGTXX : [ ACRMGAHKMA, GAHKMAHGT ]
    AXXAXXAXXXXXXXXXXXXXXAXXA : [ AXXAXXA, AXXA ]
    #! perl -slw
    use strict;
    use Data::Dump qw[ pp ];

    sub maxMatches {
        my $s = shift;
        my @matches;
        my $vec = '';
        for my $o ( 0 .. length( $s ) - 10 ) {
            my( $match ) = $s =~ m[.{$o}([ACGT].{0,8}[ACGT])] or next;
            my $mask = '';
            vec( $mask , $_, 1 ) = 1 for $-[1] .. $+[1]-1;
            next if ( $vec | $mask ) eq $vec;
            $vec |= $mask;
            push @matches, $match;
        }
        return @matches;
    }

    while( <DATA> ) {
        chomp;
        printf "$_ : [ %s ]\n", join ', ', maxMatches( $_ );
    }
    __DATA__
    AGRTGAXWXX
    ACRMGAHKMAHGTXX
    AXXAXXAXXXXXXXXXXXXXXAXXA
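
The containment test in maxMatches can be looked at in isolation. What follows is only a sketch: span_mask and is_covered are hypothetical helper names that do not appear in the posted code, and the offsets are the ones the three A-runs occupy in AXXAXXAXXXXXXXXXXXXXXAXXA. It shows why the inner AXXA (offsets 3..6) is skipped while the trailing AXXA (offsets 21..24) is kept.

```perl
use strict;
use warnings;

# Hypothetical helper: build a bit string with bits $from .. $to-1 set,
# mirroring the vec() loop in maxMatches.
sub span_mask {
    my ( $from, $to ) = @_;
    my $mask = '';
    vec( $mask, $_, 1 ) = 1 for $from .. $to - 1;
    return $mask;
}

# Hypothetical helper: true when OR-ing $mask into $seen changes nothing,
# i.e. every position of the new match is already covered.
sub is_covered {
    my ( $seen, $mask ) = @_;
    return ( $seen | $mask ) eq $seen;
}

my $seen = span_mask( 0, 7 );    # AXXAXXA occupies offsets 0..6

# Inner AXXA at offsets 3..6 lies wholly inside the first match.
print is_covered( $seen, span_mask( 3, 7 ) )   ? "covered\n" : "new\n";

# Trailing AXXA at offsets 21..24 touches positions not seen before.
print is_covered( $seen, span_mask( 21, 25 ) ) ? "covered\n" : "new\n";
```

This should print "covered" and then "new". The comparison relies on Perl's string-wise bitwise OR, which applies here because both operands are strings built with vec().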

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Your addition of unrequested functionality — removal of sequences found in other sequences in the string — is what prompted my question. There was no unstated rule. Your implication that the OP did something was uncalled for and wrong.

        Update: The post above, in its entirety, originally read:

        Your addition of unrequested functionality is what prompted my question. There was no unstated rule.

        And was silently modified after the fact, without notification, in typically underhanded, duplicitous, and utterly dishonourable fashion. Presumably an attempt to try and save face.


        Sorry, but the OP's own code would remove all duplicate sequences found, regardless of where they were found.

        my %uniq=();
        my $string = 'ACRMGAHKMAHGTXX';
        substr($string, $_, 10 ) =~ m[([AGTC].{0,8}[AGTC])] and ++$uniq{ $1 }
            for 0 .. length( $string )-1;
        for my $key (keys %uniq){
            print $key, "\n";
        }
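
To make the de-duplication concrete, here is a minimal sketch that wraps the OP's loop in a sub and runs it on the string from this sub-thread. The name uniq_matches and the sorted return value are my own assumptions for illustration; they are not part of the OP's code.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the OP's loop; the name is illustrative only.
sub uniq_matches {
    my ( $string ) = @_;
    my %uniq;

    # Slide a 10-character window along the string. Each distinct match
    # becomes a hash key, so repeats collapse wherever they occur.
    for my $offset ( 0 .. length( $string ) - 1 ) {
        substr( $string, $offset, 10 ) =~ m[([AGTC].{0,8}[AGTC])]
            and ++$uniq{ $1 };
    }

    my @keys = sort keys %uniq;    # sorted only to make the output stable
    return @keys;
}

print "$_\n" for uniq_matches( 'AXXAXXAXXXXXXXXXXXXXXAXXA' );
```

Run on that string, this should print AXXA and AXXAXXA once each. The AXXA inside AXXAXXA and the AXXA at the end of the string land on the same hash key, so the hash cannot tell them apart.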

        In the absence of any specific discussion, the OP's code is the spec. You opened that discussion, and I up-voted you for doing so, but there is no mention of that requirement in the OP's post. Neither in the stated "conditions", nor the worked examples.

        A requirement not discussed is "unstated".

