Guidance needed in perl program for fast pattern matching algorithm.

tweety has asked for the wisdom of the Perl Monks concerning the following question:

The link given below is the algorithm which i am trying to implement in perl programming for pattern matching. since the brute force algorithm takes a really long time to pattern match with the query string, i want to implement this fast pattern matching algorithm.Here is the algorithm, http://pubs.acs.org/cgi-bin/article.cgi/jcisd8/2004/44/i04/html/ci030463z.html Here is the perl program which i have done so far,


#!/usr/bin/perl
$sequence = "BIOINFOCUMBIOINFOBIO";
$seq = "INFO";
print "pattern name $seq\n";
%sn =();
@seqss = split ('',$seq);
$m=scalar @seqss;

for($i=0;$i<$#seqss;$i++)
{
$firstch=$seqss[0];
$lastch=$seqss[$m-1];
}
print "\nfirst letter  : $firstch\n";
print "last letter   : $lastch\n";

my @unique = ();
my %seen   = ();
@pats=reverse @seqss;
foreach my $elem ( @pats )
{
next if $seen{ $elem }++;
push @uni, $elem;
@unique= reverse @uni;
}
$cc=1;
foreach (@uni)
{
    $count{$_} = $cc;    
    $cc++;
}
$zen = $m+1;
for(my $i=0;$i<scalar @unique;$i++)
{
$mcut=scalar @unique;
$m=$mcut-$i;
%num=("$unique[$i]"=>"$m");
while (($key1, $val1) = each(%num))
{
push(@key,$key1);
push(@val,$val1);
}
}
print "key of pattern is @key\n";
print "val of pattern is @val\n";
@onlyseq = split ('',$sequence);
foreach (@onlyseq)
{
    if($count{$_} != $zen)
    {
        if (defined $count{$_})
        {
        push(@a,$count{$_});
        push(@b,$_);
        }
        else
        {
        push(@a,$zen);
        push(@b,$_);
        }
    }
    else
    {
        #print "$cc \t$_\n";
    }
}
print "this is array a : @a\n";
print "this is array b : @b\n\n";
#=========================================
$m=scalar @seqss;
for(my $i=$m;$i<scalar @a;$i++)
{
    
    if(($lastch eq $b[$i]) && ($firstch eq $b[$i-($m-1)]))
    {
        for(my $j=($i-($m-1));$j<=$i;$j++)
        {
            push(@fnum,$a[$j]);
            push(@flet,$b[$j]);
        }
    }
}
print "@flet\n";
print "@fnum\n\n";
[download]

Am struggling in the last search phase of the algorithm, can anyone plz suggest the search step to conclude the program as per the algorithm? Please pardon me if this is not the way to question in a forum or write a coding or watever. I am a biologist and i am new to all this including programing. thanks.

Comment on Guidance needed in perl program for fast pattern matching algorithm. Download Code

Replies are listed 'Best First'.
Re: Guidance needed in perl program for fast pattern matching algorithm. by Hofmator (Curate) on Oct 10, 2006 at 12:06 UTC
A couple of random remarks ... The article you linked to is published in the Journal of Chemical Information and Modeling. Only very few of us will probably have online access to this ACS publication, so refering to the algorithm described there is difficult to impossible. If you are able to describe what the alogrithm is supposed to accomplish, maybe somebody can point you to an existing implementation of this alogrithm. We have a number of bioinfo-savvy monks here and there are a number of modules available on CPAN. Indenting your code properly helps with reading and understaning it, take a look at perltidy. -- Hofmator Code written by Hofmator and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.	[reply]
Re: Guidance needed in perl program for fast pattern matching algorithm. by BrowserUk (Patriarch) on Oct 10, 2006 at 21:39 UTC
Are you aware that Perl has the Boyer-Moore algorithm built-in to both index and regex searching? As it is implemented in C, it will run much faster than any similar algorithm (Horspool is a simplification of Boyer-Moore) implemented in Perl. You might consider reading about study in docs and using it in conjunction with your Perl pattern matching code and see if it improves the performance any. Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal? "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.	[reply]
Re: Guidance needed in perl program for fast pattern matching algorithm. by runrig (Abbot) on Oct 10, 2006 at 18:14 UTC
Just a few comments (since I don't really know what algorithm you are trying to implement). First consider using using strict and warnings. And next, on this line: `for(my $i=$m;$i<scalar @a;$i++)` [download] A more "perlish" way of doing that sort of thing is: `for my $i ($m..$#a)` [download] And this: `while (($key1, $val1) = each(%num)) { push(@key,$key1); push(@val,$val1); } }` [download] Is easier done like this (and they will be in the correct corresponding order): `my @key = keys %num; my @val = values %num;` [download]	[reply] [d/l] [select]
something similar to horsepool algorithm or raita algorithm by tweety (Initiate) on Oct 10, 2006 at 18:33 UTC
am try to implement something very close to or something similar to horsepool algorithm or raita algorithm.	[reply]
Re: Guidance needed in perl program for fast pattern matching algorithm. by GrandFather (Saint) on Oct 10, 2006 at 20:04 UTC
There is a WikiPedia article describing the algorithm with C source: Boyer-Moore-Horspool_algorithm which I presume is the algorithm referred to. DWIM is Perl's answer to Gödel	[reply]