in reply to Searching for a word that may only exist in part

i don't see any way to do it besides eating letters from the front, then starting over and eating from the back.

here is one way, though index() would surely be faster than m// here.

my $sequence = "GAATGTTTTAGCAATCTCTTTCTGTCATGAATCCATGGCAGTGACCATACTAATGGTGACTGCCATTGATGGAGGGAGACACA";
my $find     = "CTGGATAAGAATGTTTTAGCAATCTCTT";
my $found;

MATCH: {
    my $tail = $find;
    while ( length($tail) > 2 and not $found ) {
        ($found) = $sequence =~ /($tail)/    # find match
            or substr( $tail, 0, 1, '' );    # or eat first letter
    }
    last MATCH if $found;

    my $head = $find;
    ## can chop first since exact match already failed
    while ( chop $head and length($head) > 2 and not $found ) {
        ($found) = $sequence =~ /($head)/;
    }
}

print "found? $found\n";
updated: to provide better(?) var names
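
for comparison, a rough sketch of the same search order using index() instead of m//. the sub name find_partial and the $min parameter are made up for illustration, and index() treats the word as a literal string, which makes no difference for plain DNA letters. like the code above, it matches anywhere in the sequence.

```perl
use strict;
use warnings;

# hypothetical helper (not from the original post): same order as the
# m// version, i.e. whole word, then shorter tails, then shorter heads
sub find_partial {
    my ($sequence, $find, $min) = @_;
    $min = 3 unless defined $min;    # the posted loops stop at length > 2

    # eat letters from the front; $start == 0 is the whole word
    for my $start (0 .. length($find) - $min) {
        my $tail = substr($find, $start);
        return $tail if index($sequence, $tail) >= 0;
    }

    # eat letters from the back; the whole word already failed above
    for (my $len = length($find) - 1; $len >= $min; $len--) {
        my $head = substr($find, 0, $len);
        return $head if index($sequence, $head) >= 0;
    }

    return;    # nothing of at least $min letters was found
}
```

calling find_partial($sequence, $find) with the two strings above should give back the same piece the m// loops find, and it returns nothing (undef in scalar context) when no piece of at least three letters is present.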

Re^2: Searching for a word that may only exist in part
by GrandFather (Saint) on Oct 19, 2006 at 01:41 UTC

    For:

    my $sequence = "111...1111...11"; my $find = "11111";

    Prints:

    found? 1111

    whereas the OP says "if I do not find the whole word within a sequence and start truncating the word, then it can only match at either end of a sequence and not within".


    DWIM is Perl's answer to Gödel
      oh, i misunderstood. i thought the match only had to be the beginning or end of the 'find' string (1111 is the beginning (..and end..) of 11111), not that it had to sit at the beginning or end of the sequence.

      the regexes are fixable (/(^$tail|$tail$)/ and same for $head?) easily enough.. i just wanted an excuse to use chop ;-)
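
a minimal sketch of that anchored fix, assuming the rule is "the whole word may match anywhere, a truncated piece only at the very start or very end of the sequence". the sub name find_at_ends and the $min parameter are invented for illustration; \Q...\E and \z are additions to the regex suggested above, so that the word is matched literally and a trailing newline doesn't count as the end.

```perl
use strict;
use warnings;

# hypothetical helper (not from the thread): truncated pieces must be
# anchored to one end of the sequence
sub find_at_ends {
    my ($sequence, $find, $min) = @_;
    $min = 3 unless defined $min;    # same cutoff as the posted loops

    # the whole word is allowed to sit anywhere
    return $find if index($sequence, $find) >= 0;

    my @lengths = reverse $min .. length($find) - 1;
    my @pieces  = (
        ( map { substr($find, -$_) }   @lengths ),    # tails, longest first
        ( map { substr($find, 0, $_) } @lengths ),    # heads, longest first
    );

    for my $piece (@pieces) {
        # match only at the very start or the very end of the sequence
        return $piece if $sequence =~ /^\Q$piece\E|\Q$piece\E\z/;
    }

    return;    # no piece is anchored at either end
}
```

with the "111...1111...11" example, the 1111 in the middle is no longer accepted; the first piece that sits at an end is the shorter 111 at the start.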