Here's another piece of code that I wrote for the same task. It works well, but it uses more resources, which is why I prefer the first version: my input files will contain many entries (~10000 or more), and I have to use this parser on thousands of articles too.

Both scripts are supposed to give the same results for the same article. Try adding more toy sentences to the article and you will see what I mean.

#!/usr/bin/perl
use strict;
use warnings;
use Lingua::EN::Sentence qw( get_sentences add_acronyms );

# opening the input lexicons
open (GENE,"Gene.txt") || die "Cannot open Gene.txt !!";
open (TARGET, "Target.txt") || die " Cannot open Target.txt !!";
my $target;
my $gene;

# opening fulltext and sentence breaking
open (IF, "Input.txt") || die " Cannot open Fulltext !!";
my $text = <IF>;
my $sentences=get_sentences($text);
close (IF);

# opening output file
open (OF, ">results.txt");

# Parsing Text
my $verbs = "localized|held|located in|localization|translocated to|targets|reaches|exported|export";
while ($gene = <GENE>) {
    chomp $gene;
    seek (TARGET,0,0);
    while ($target = <TARGET>) {
        chomp $target;
        foreach my $sentence (@$sentences) {
            if ($sentence =~ /($gene).+($verbs).+($target)/ig) {
                print OF $1."\t".$2."\t".$3."\t\t".$sentence."\n";
            }
        }
    }
}
close (OF);
close (GENE);
close (TARGET);
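Since resource use is the concern with the nested loops above, here is a minimal sketch of one way the same gene ... verb ... target test might be made cheaper. It is only an illustration under assumptions: the helper name find_hits and the in-memory toy lists are hypothetical and not part of either script, and real lexicons would still be read from Gene.txt and Target.txt. The idea is to compile each lexicon entry once with qr// and to skip any sentence that contains no verb or no gene before trying the combined pattern.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): returns one
# [gene, verb, target, sentence] row per hit.
sub find_hits {
    my ($genes, $targets, $sentences, $verbs) = @_;

    # Compile every pattern once. \Q...\E escapes regex metacharacters
    # that may occur in lexicon entries.
    my $verb_re   = qr/(?:$verbs)/i;
    my @gene_re   = map { qr/\Q$_\E/i } @$genes;
    my @target_re = map { qr/\Q$_\E/i } @$targets;

    my @hits;
    for my $sentence (@$sentences) {
        # Cheap filters: skip sentences that cannot possibly match.
        next unless $sentence =~ $verb_re;
        my @g = grep { $sentence =~ $_ } @gene_re;
        next unless @g;
        my @t = grep { $sentence =~ $_ } @target_re;

        for my $gre (@g) {
            for my $tre (@t) {
                # Same gene ... verb ... target ordering as the original pattern.
                if ($sentence =~ /($gre).+($verb_re).+($tre)/) {
                    push @hits, [ $1, $2, $3, $sentence ];
                }
            }
        }
    }
    return @hits;
}

# Toy data standing in for Gene.txt, Target.txt and the sentence list.
my @genes     = ("AMA1", "PfROM1");
my @targets   = ("food vacuole", "apicoplast");
my @sentences = (
    "The protein AMA1 is then translocated to the food vacuole, apicoplast, subpellicular microtubules.",
    "This toy sentence mentions no lexicon entry.",
);
my $verbs = "localized|held|located in|localization|translocated to|targets|reaches|exported|export";

for my $hit (find_hits(\@genes, \@targets, \@sentences, $verbs)) {
    print join("\t", @$hit[0 .. 2]), "\t\t", $hit->[3], "\n";
}
```

Note that rows come out grouped by sentence rather than by gene, so the order differs from the script above. The sketch also leaves out the /g flag, because each test only needs a yes/no answer for one sentence.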

This code gave me 26 hits for my trial text, while the code given in my previous post gave only 5 hits. I can't figure out what went wrong. Please help me out.

____MY RESULTS____
PfROM1	localized	subpellicular microtubules		By immunoconfocal microscopy, PfROM1 was localized to a single, thread-like structure on one side of the merozoites that appears to be in close proximity to the subpellicular microtubules.
PfROM1	localized	subpellicular microtubules		HA-PfROM1 was observed to be localized in close proximity to longitudinal subpellicular microtubules of the merozoite (Fig.
PfROM1	localized	rhoptries		Thus, these results indicate that HA-PfROM1 is localized in a subcellular compartment distinct from the micronemes, rhoptries, and dense granules.
PfROM1	localized	rhoptries		HA-PfROM1 is not localized to known apical secretory organelles: rhoptries, micronemes, and dense granules.
PfROM1	located in	Golgi		Toxoplasma gondii ROM1, the orthologue of PfROM1, is located in the secretory vesicles, Golgi, and in micronemes (10; L. D. Sibley, unpublished data).
PfROM1	localized	dense granules		Thus, these results indicate that HA-PfROM1 is localized in a subcellular compartment distinct from the micronemes, rhoptries, and dense granules.
PfROM1	localized	dense granules		HA-PfROM1 is not localized to known apical secretory organelles: rhoptries, micronemes, and dense granules.
PfROM1	localized	micronemes		Thus, these results indicate that HA-PfROM1 is localized in a subcellular compartment distinct from the micronemes, rhoptries, and dense granules.
PfROM1	localized	micronemes		HA-PfROM1 is not localized to known apical secretory organelles: rhoptries, micronemes, and dense granules.
PfROM1	localized	micronemes		HA-PfROM1 staining appeared to be colocalized, in part, with the PfAMA1 staining that translocates from micronemes to the parasite surface on release of micronemal contents during invasion (Fig.
PfROM1	located in	micronemes		Toxoplasma gondii ROM1, the orthologue of PfROM1, is located in the secretory vesicles, Golgi, and in micronemes (10; L. D. Sibley, unpublished data).
PfROM1	located in	micronemes		PfROM1 was also thought to be located in micronemes (13), based on data localizing a PfROM1 construct that was missing two 5′ exons which encode one of the transmembrane domains of PfROM1 (SI Figs.
PfROM1	located in	secretory vesicle		Toxoplasma gondii ROM1, the orthologue of PfROM1, is located in the secretory vesicles, Golgi, and in micronemes (10; L. D. Sibley, unpublished data).
HA-PfROM1	localized	subpellicular microtubules		HA-PfROM1 was observed to be localized in close proximity to longitudinal subpellicular microtubules of the merozoite (Fig.
HA-PfROM1	localized	rhoptries		Thus, these results indicate that HA-PfROM1 is localized in a subcellular compartment distinct from the micronemes, rhoptries, and dense granules.
HA-PfROM1	localized	rhoptries		HA-PfROM1 is not localized to known apical secretory organelles: rhoptries, micronemes, and dense granules.
HA-PfROM1	localized	dense granules		Thus, these results indicate that HA-PfROM1 is localized in a subcellular compartment distinct from the micronemes, rhoptries, and dense granules.
HA-PfROM1	localized	dense granules		HA-PfROM1 is not localized to known apical secretory organelles: rhoptries, micronemes, and dense granules.
HA-PfROM1	localized	micronemes		Thus, these results indicate that HA-PfROM1 is localized in a subcellular compartment distinct from the micronemes, rhoptries, and dense granules.
HA-PfROM1	localized	micronemes		HA-PfROM1 is not localized to known apical secretory organelles: rhoptries, micronemes, and dense granules.
HA-PfROM1	localized	micronemes		HA-PfROM1 staining appeared to be colocalized, in part, with the PfAMA1 staining that translocates from micronemes to the parasite surface on release of micronemal contents during invasion (Fig.
AMA1	translocated to	apicoplast		The protein AMA1 is then translocated to the food vacuole, apicoplast, subpellicular microtubules.
AMA1	translocated to	subpellicular microtubules		The protein AMA1 is then translocated to the food vacuole, apicoplast, subpellicular microtubules.
AMA1	held	micronemes		For example, Plasmodium falciparum apical membrane antigen 1 (AMA1) is held in the micronemes in merozoites inside of erythrocytes.
AMA1	located in	micronemes		In merozoites, PfAMA1 is located in micronemes and thus separated from PfROM1.
AMA1	translocated to	food vacuole		The protein AMA1 is then translocated to the food vacuole, apicoplast, subpellicular microtubules.

In reply to Re^2: Simple RegEX text parser by I-Box
in thread Simple RegEX text parser by I-Box
