in reply to Re: Re: Progressive pattern matching
in thread Progressive pattern matching
To clarify what I think that you want, let me construct some examples :
Input string :
GATTACA
File :
ATTACGATTACAAA
GATT
ZZGATTZZ
asdghckasdlkj
TTACA
Output :
On line 1 : GATTACA,GATTAC,ATTACA,TTACA,GATTA,ATTAC,TTAC,TACA, GATT,ATTA,TTA,TAC,GAT,ATT,ACA,TT,TA,GA,CA,AT,AC,T,G,C,A On line 2 : GATT,GAT,ATT,TT,GA,AT,T,G,A On line 3 : GATTA,GATT,ATTA,TTA,GAT,ATT,TT,TA,GA,AT,T,G,A On line 4 : GATTACA,GATTAC,ATTACA,TTACA,GATTA,ATTAC,TTAC,TACA, GATT,ATTA,TTA,TAC, GAT,ATT,ACA,TT,TA,GA,CA,AT,AC,T,G,C,A
To achieve this, you want to find the longest substring of the input string that is found on a line of the file, for the various substrings that match until the end of the last character of the search string. To show you a first approach which is surely suboptimal, look at the following code which tries a brute force approach :
use strict; my $searchString = "GATTACA"; my %subStrings = {}; my @subStrings = (); sub populate { # Fills the hash subStrings with all "allowed" substrings # of the argument. Duplicates are avoided by # filling a hash instead of an array. my ($string) = @_; return if $string eq ""; my $line = ""; foreach (split "", $string) { $line .= $_; #print "Added $line\n"; $subStrings{$line} = "1"; }; populate( substr( $string, 1 )); }; populate( $searchString ); # We are only interested in the keys of our hash, # longest matches first : @subStrings = reverse sort { length($a) <=> length($b) # Sort by string length || $a cmp $b # and then by string content } keys %subStrings; # We read the file line by line : my ( $line, $substring ); while ($line = <DATA>) { my @MatchedSubstrings = (); foreach $substring (@subStrings) { if ($line =~ /$substring/) { push @MatchedSubstrings, $substring; }; }; if ($#MatchedSubstrings != -1) { print "On line $. : ", join(",", @MatchedSubstrings ),"\n"; }; }; __DATA__ AGATTACAAA ZZGATTZZ GATTAZZ GATGATTACAZZ asdfgh gattaca
Note that there already are many Perl modules for Bioinformatics, a search of the CPAN (http://www.cpan.org) should give you interesting results, as should a Google search for Perl and DNA I guess.
perl -MHTTP::Daemon -MHTTP::Response -MLWP::Simple -e ' ; # The $d = new HTTP::Daemon and fork and getprint $d->url and exit;#spider ($c = $d->accept())->get_request(); $c->send_response( new #in the HTTP::Response(200,$_,$_,qq(Just another Perl hacker\n))); ' # web
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re: Re: Re: Re: Progressive pattern matching
by tfrayner (Curate) on Oct 15, 2001 at 19:35 UTC |