newbio has asked for the wisdom of the Perl Monks concerning the following question:
I am trying to do a pattern matching exercise. What I am trying to do:
Pick words from a file ($wordfile - each line contains a main term and its synonyms separated by a space) and see if those words (or their synonyms) appear in sentences of another file ($textfile - each line contains textID and associated sentences).
Below is my code.
1. I am getting the error shown below. When I comment out "use warnings" in the header, the error goes away. What does this error mean, and how do I remove it?
2. I would also like to know how to make the program run faster. Is there a way to avoid looping over each key of the hash %{$List1Ref}, as in foreach my $p (sort keys (%{$List1Ref}))? Is there any faster way?

    main::ReadDataInHash() called too early to check prototype at D:\wordmatch.pl line 19.
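On the first question: the posted script declares the sub as `sub ReadDataInHash()`, and in Perl the empty parentheses after a sub name are a prototype meaning "takes no arguments". Because the call on line 19 is compiled before Perl has seen that declaration, the prototype cannot be checked there, and `use warnings` reports it. Removing `use warnings` only hides the message. A minimal sketch of the usual remedy, dropping the parentheses from the definition, is shown here; `count_fields` is a hypothetical stand-in for the poster's sub, not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The call appears before the definition, as in the posted script.
my $count = count_fields('SSN3 CDK8 GIG2');
print "$count\n";

# No "()" after the name, so the sub has no prototype. Perl then has
# nothing to check at the earlier call site and emits no warning.
sub count_fields {
    my ($line) = @_;
    my @fields = split ' ', $line;   # split on runs of whitespace
    return scalar @fields;
}
```

Moving the definition above the call while keeping the `()` would not help, since Perl would then reject the call for passing an argument to a sub declared to take none.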
$wordfile

    SSN3 CDK8 GIG2 NUT7 RYE5 SRB10 UME5
    .
    .

$textfile

    17170106|Perturbation of the activity of replication origin by meiosis specific transcription.|We have determined the activity of all ARSs on the Saccharomyces cerevisiae chromosome VI as chromosomal replication origins in pre-meiotic S-phase by neutral/neutral 2D gel-electrophoresis. The comparison of origin activity of each origin in mitotic and pre-meiotic S-phase showed that one of the most efficient origins in mitotic S-phase, ARS605, was completely inhibited in pre-meiotic S-phase. ARS605 is located within the ORF of Msh4 gene that is transcribed specifically during an early stage of meiosis. Systematic analyses of relationships between Msh4 transcription and ARS605 origin activity revealed that transcription of Msh4 inhibited the ARS605 origin activity by removing ORC from ARS605. Deletion of UME6 {{UME6}}, a transcription factor responsible for repressing Msh4 during mitotic S-phase, resulted in inactivation of ARS605 in mitosis. Our finding is the first demonstration that the transcriptional regulation on the replication origin activity is related to changes in cell physiology. These results may provide insights into changes in replication origin activity in embryonic cell cycle during early developmental stages.
    .
    .
Raj
My code:
    #!/usr/bin/perl
    use warnings;
    use strict;

    if ($#ARGV != 4)
    {
        print "usage: run batch file 'run' not this one\n";
        exit;
    }

    my $wordfile = $ARGV[0];
    my $textfile=$ARGV[3];
    my $OutPutFile=$ARGV[4];

    open (IF1,"$wordfile")|| die "cannot open the file";
    open (PF, "$textfile")|| die "cannot open the file";
    open (OF,">$OutPutFile")|| die "cannot open the file";

    my $List1Ref=ReadDataInHash (*IF1);

    while (my $line=<PF>)
    {
        chomp($line);
        my @arrAbs=split (/\|/,$line);
        my $ID=$arrAbs[0];
        my $Title=$arrAbs[1];
        my $Abs=$arrAbs[2];
        @arrAbs=split (/\./,$Abs);
        print OF"$ID|";
        for (my $SentenceNumber=0;$SentenceNumber<=$#arrAbs ;$SentenceNumber++)
        {
            my $i=$SentenceNumber+1;
            print OF "<".$i.">";
            my $Sentence=$arrAbs[$SentenceNumber];
            my @arrAbsSen=split (' ',$Sentence);
            foreach my $word(@arrAbsSen)
            {
                #to match terms in the list, stored in %{$List1Ref}.
                if (exists(${%{$List1Ref}}{uc($word)}))
                {
                    print OF "$word ";
                }
                else
                {
                    foreach my $p (sort keys (%{$List1Ref}))
                    {
                        if (exists(${%{${%{$List1Ref}}{$p}}}{uc($word)}))
                        {
                            print OF "mainterm:$p:matchedterm:$word ";
                            last;
                        }
                    }
                }
            }
            @arrAbsSen=();
        }
        print OF "\n";
        @arrAbs=();
    }

    sub ReadDataInHash()
    {
        my $x = shift;
        my %list1=();
        while (my $line =<$x>)
        {
            chomp $line;
            my @arr=split /\s/,$line;
            for (my $i=0;$i<=$#arr ;$i++)
            {
                if ($i==0)
                {
                    $list1{$arr[$i]}={};
                }
                else{
                    ${%{$list1{$arr[0]}}}{$arr[$i]} = 1;
                }
            }
        }
        return {%list1};
    }
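On the speed question, one possible way to avoid scanning every main term for each word is to build a single flat hash up front that maps every term (main term or synonym) to its main term, so each word costs one lookup. The following is only a sketch under stated assumptions: `build_term_index` and `annotate_sentence` are hypothetical names, the word list is passed in as strings rather than read from a filehandle, and when a synonym is listed under several main terms the first line seen wins, whereas the posted code picks the alphabetically first main term.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a flat lookup: upper-cased term => main term.
# Each input line has the form "MAIN SYN1 SYN2 ...".
sub build_term_index {
    my @lines = @_;
    my %main_for;
    for my $line (@lines) {
        my ($main, @synonyms) = split ' ', $line;
        next unless defined $main;          # skip blank lines
        # A main term always maps to itself, mirroring the posted code,
        # which checks main terms before synonyms.
        $main_for{ uc $main } = $main;
        for my $syn (@synonyms) {
            $main_for{ uc $syn } = $main unless exists $main_for{ uc $syn };
        }
    }
    return \%main_for;
}

# Annotate one sentence with a single hash lookup per word.
sub annotate_sentence {
    my ($index, $sentence) = @_;
    my @hits;
    for my $word (split ' ', $sentence) {
        my $main = $index->{ uc $word };
        next unless defined $main;          # word is not in the list
        if (uc($word) eq uc($main)) {
            push @hits, $word;              # matched a main term directly
        }
        else {
            push @hits, "mainterm:$main:matchedterm:$word";
        }
    }
    return join ' ', @hits;
}

my $index = build_term_index('SSN3 CDK8 GIG2 NUT7 RYE5 SRB10 UME5');
print annotate_sentence($index, 'Deletion of UME5 and ssn3 had no effect'), "\n";
```

The index costs one pass over the word file, after which the work per abstract grows with the number of words only, not with the number of main terms. Like the posted code, this sketch splits on whitespace, so a word with attached punctuation such as "UME6," would not match.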
Re: Error: "called too early to check prototype" and is word search using nested hash optimal?
by ikegami (Patriarch) on Jan 29, 2007 at 19:15 UTC
Re: Error: "called too early to check prototype" and is word search using nested hash optimal?
by blazar (Canon) on Jan 30, 2007 at 15:22 UTC
by newbio (Beadle) on Jan 30, 2007 at 15:51 UTC