#! perl -slw use strict; my $needle = 'ATGGAGTCGACGAATTTGAAGAAT'; my $haystack = 'xxxxxxATGGAGyxxxTCGAzxxxxCGAATTTGAAxxwGAAT'; my @needles; #! While we've still needle to process while ($needle) { #! Try shorter and shorter substring of needle for my $start (0 .. length($needle) - 1) { my $bit = substr $needle, $start; #! move of if no match next unless $haystack =~ m[$bit]; #! Got a match, save it $haystack =~ s[$bit][ push @needles, $bit; $bit ]ge; #! and remove it $needle = substr $needle, 0, $start; #! repeat last; } } #! Sort the needles longest first @needles = sort{ length $b <=> length $a } @needles; #! mark their places in the haystack $haystack =~ s[($_)(?!\})][{$1}]g for @needles; #! remove nested marks #! (where a shorter needle was found inside a longer one) $haystack =~ s[ ({[^{}]*?) #! capture everything after the first { ({) #! until we find a second (also captured) ([^{}]*?) #! Then capture everything before the close } (}) #! and the close (nested) } ([^}]*?}) #! and everything from there, to and including the final } ] [$1$3$5]gx; #! Throw away the inner {}. #! Finally print out the none needle parts, and the needle that followed them. print "$1 preceeded $2" while $haystack =~ m[(\G[^{]+){([^}]+)}]g; __END__ C:\test>228122 xxxxxx preceeded ATGGAG yxxx preceeded TCGA zxxxx preceeded CGAATTTGAA xxw preceeded GAAT C:\test>