Hi guys, I have dropped the plan of embedding a Java application for sentence breaking. Guess what: we have the best in Perl itself, the Lingua::EN::Sentence module. It is brilliant work and it works fine for me. Now please have a look at my latest code. I still haven't been able to fix the regex part: the current regex returns the fewest matches of anything I have tried so far. I would really appreciate some help in getting it fixed.
#!/usr/bin/perl
use strict;
use warnings;
use Lingua::EN::Sentence qw( get_sentences add_acronyms );

# opening the input lexicons
open (GENE, "Gene.txt") || die "Cannot open Gene.txt !!";
open (TARGET, "Target.txt") || die "Cannot open Target.txt !!";
my $target;
my $gene;

# opening fulltext and sentence breaking
open (IF, "Input.txt") || die "Cannot open Fulltext !!";
my $text = <IF>;
my $sentences = get_sentences($text);
close (IF);

# opening output file
open (OF, ">results.txt") || die "Cannot open results.txt !!";

# Parsing Text
my $verbs = "localized|held|located in|localization|translocated to|targets|reaches|exported|export";
while ($gene = <GENE>) {
    chomp $gene;
    seek (TARGET, 0, 0);    # rewind the target lexicon for every gene
    while ($target = <TARGET>) {
        chomp $target;
        foreach my $sentence (@$sentences) {
            if ($sentence =~ /($gene).+($verbs).+($target)/ig) {
                print OF $1."\t".$2."\t".$3."\t\t".$sentence."\n";
            }
        }
    }
}
close (OF);
close (GENE);
close (TARGET);
As Oko1 suggested before, I have built up a regex with "|", but it gives far fewer results than the initial code posted in my first thread in this node. I also tried using Regexp::List but wasn't able to work out a solution. It would be nice if someone could give me a start with a small piece of code involving Regexp::List.
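For reference, here is a minimal sketch of how the alternation approach might be tightened using core Perl only (it does not use Regexp::List). The helper names build_alternation and find_relations are hypothetical, not part of any module. It rests on two assumptions about the code above: that the /g modifier is contributing to the missing matches (in scalar context /g remembers pos() on the sentence, so the next gene/target pair starts searching where the previous match ended), and that the lexicon entries should be matched literally, hence quotemeta.

```perl
use strict;
use warnings;

# Hypothetical helper: build one case-insensitive alternation from a word
# list. Longer entries come first so that "located in" is tried before
# "located" at the same position.
sub build_alternation {
    my @words = sort { length($b) <=> length($a) } grep { length $_ } @_;
    my $alt = join '|', map { quotemeta $_ } @words;
    return qr/$alt/i;
}

# Hypothetical helper: return [gene, verb, target, sentence] for every
# sentence in which a gene is followed by a verb phrase and then a target.
sub find_relations {
    my ($sentences, $genes, $verbs, $targets) = @_;
    my $verb_re = build_alternation(@$verbs);
    my @hits;
    for my $gene (@$genes) {
        for my $target (@$targets) {
            # No /g: each test starts at the beginning of the sentence.
            # \Q...\E quotes regex metacharacters in the lexicon entries.
            my $re = qr/(\Q$gene\E).+?($verb_re).+?(\Q$target\E)/i;
            for my $sentence (@$sentences) {
                push @hits, [$1, $2, $3, $sentence] if $sentence =~ $re;
            }
        }
    }
    return @hits;
}

# Usage with made-up example data
my @sentences = (
    'The protein AMA1 is held in the micronemes.',
    'PfROM1 is located in micronemes.',
);
my @hits = find_relations(
    \@sentences,
    ['AMA1', 'PfROM1'],
    ['held', 'located in', 'located'],
    ['micronemes'],
);
print join("\t", @$_), "\n" for @hits;
```

The lexicons are passed in as array references here, so Gene.txt and Target.txt would each be read once into an array instead of rewinding TARGET with seek for every gene. Note also that a bare `<IF>` in scalar context reads a single line, so if Input.txt spans several lines the text would need to be slurped (for example with `local $/`) before calling get_sentences.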
_______MY RESULTS____
AMA1    held    micronemes        For example, Plasmodium falciparum apical membrane antigen 1 (AMA1) is held in the micronemes in merozoites inside of erythrocytes.
AMA1    located in    micronemes        In merozoites, PfAMA1 is located in micronemes and thus separated from PfROM1.
AMA1    translocated to    subpellicular microtubules        The protein AMA1 is then translocated to the food vacuole, apicoplast, subpellicular microtubules.
PfROM1    located in    micronemes        Toxoplasma gondii ROM1, the orthologue of PfROM1, is located in the secretory vesicles, Golgi, and in micronemes (10; L. D. Sibley, unpublished data).
PfROM1    located in    micronemes        PfROM1 was also thought to be located in micronemes (13), based on data localizing a PfROM1 construct that was missing two 5′ exons which encode one of the transmembrane domains of PfROM1 (SI Figs.
AMA1    held    micronemes        For example, Plasmodium falciparum apical membrane antigen 1 (AMA1) is held in the micronemes in merozoites inside of erythrocytes.
AMA1    located in    micronemes        In merozoites, PfAMA1 is located in micronemes and thus separated from PfROM1.
AMA1    translocated to    subpellicular microtubules        The protein AMA1 is then translocated to the food vacuole, apicoplast, subpellicular microtubules.
PfROM1    located in    micronemes        Toxoplasma gondii ROM1, the orthologue of PfROM1, is located in the secretory vesicles, Golgi, and in micronemes (10; L. D. Sibley, unpublished data).
PfROM1    located in    micronemes        PfROM1 was also thought to be located in micronemes (13), based on data localizing a PfROM1 construct that was missing two 5′ exons which encode one of the transmembrane domains of PfROM1 (SI Figs.
In reply to Re: Simple RegEX text parser by I-Box, in thread Simple RegEX text parser by I-Box