I-Box has asked for the wisdom of the Perl Monks concerning the following question:
Hi guys. I am totally new to Perl; I started working with it just a week ago, so please excuse any stupidity along the way. Glad to be a part of this monastery. I'm trying to build a simple regex text parser. I have three input files. The first, GENE, contains a list of gene names. The second, TARGET, contains a list of protein locations inside the cell. The third, IF, is my input full text, which I have collapsed into a single line. While parsing, I look for a GENE entry, a verb, and a TARGET entry within a single line of the text. If all three are present, I print them out.
    #! usr/bin/perl
    use strict;
    use warnings;

    # opening the input lexicons
    open (GENE,"/home/stanley/Desktop/Gene.txt");
    open (TARGET, "/home/stanley/Desktop/Target.txt");
    my $target;
    my $gene;

    # opening fulltext
    open (IF, "/home/stanley/Desktop/18048320.txt");
    my $text = <IF>;
    my @splittext = split (/[.] [A-Z]/, $text);
    close (IF);

    # opening output file
    open (OF, ">/home/stanley/Desktop/Local.txt");

    # Parsing Text
    for $gene (<GENE>) {
        chomp $gene;
        while ($target = <TARGET>) {
            chomp $target;
            foreach my $line (@splittext) {
                if ($line =~ /.+?($gene).*(localizes to|held|located in|localization|translocated to|targets|reaches|exported|export).*($target).+?/ig) {
                    print OF $1."\t".$2."\t".$3."\n";
                }
            }
        }
    }
    close (OF);
    close (GENE);
    close (TARGET);
____DATA___

gene.txt

    pfrom1
    pfama1
    ama1
    ha-pfrom1

target.txt

    apicoplast
    mitochondrion
    rhoptry
    rhoptries
    golgi
    dense granules
    parasitophorous vacuole
    micronemes
    food vacuole
    secretory vesicle
    host cell

input txt

    Compartmentalization of proteins into subcellular organelles in eukaryotic cells is a fundamental mechanism of regulating complex cellular functions. Many proteins of Plasmodium falciparum merozoites involved in invasion are compartmentalized into apical organelles. We have identified a new merozoite organelle that contains P. falciparum rhomboid-1 (PfROM1), a protease that cleaves the transmembrane regions of proteins involved in invasion. By immunoconfocal microscopy, PfROM1 was localized to a single, thread-like structure on one side of the merozoites that appears to be in close proximity to the subpellicular microtubules. Using antibodies to the merozoite surface protein-1 (MSP1), a protein that is located in the merozoite plasma membrane (Fig. 3A), we demonstrated that HA-PfROM1 staining is intracellular, not colocalizing with the plasma membrane. In merozoites, PfAMA1 is located in micronemes and thus separated from PfROM1. Toxoplasma gondii ROM1, the orthologue of PfROM1, is located in the Golgi, secretory vesicles, and in micronemes (10; L. D. Sibley, unpublished data). For example, Plasmodium falciparum apical membrane antigen 1 (AMA1) is held in the micronemes in merozoites inside of erythrocytes. For example, Plasmodium falciparum apical membrane antigen 1 (AMA1) is held in the micronemes in merozoites inside of erythrocytes.
This is just a small part of the input file. What happens now is that the parser checks only the first GENE entry and then quits the loop. What I want is that, for each gene entry, it takes every target in turn and checks for the pattern in each line of the text.
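A likely cause of the behaviour described above: the inner `while ($target = <TARGET>)` loop reads the TARGET filehandle to end of file while the first gene is being processed, so every later gene finds nothing left to read. The sketch below is one possible way around that, not a definitive fix. It assumes both lexicons are small enough to hold in arrays, and the helper names `read_lexicon` and `find_relations` are made up for illustration. It also swaps the `split` pattern for one with lookarounds, since `/[.] [A-Z]/` consumes the first capital letter of each sentence, and it wraps the lexicon entries in `\Q...\E` so characters in them are matched literally.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read a lexicon file into a list, one entry per line.
sub read_lexicon {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    chomp(my @entries = <$fh>);
    close $fh;
    return grep { length } @entries;    # skip blank lines
}

# Hypothetical helper: return [gene, verb, target] triples found in $text.
# $genes and $targets are array references, so each list can be walked
# as many times as needed (unlike a filehandle, which is read only once).
sub find_relations {
    my ($text, $genes, $targets) = @_;

    # Verb list copied from the original post.
    my $verbs = qr/localizes to|held|located in|localization|translocated to|targets|reaches|exported|export/i;

    # Split on whitespace that follows a full stop and precedes a capital.
    # The lookarounds keep both the full stop and the capital letter.
    # Abbreviations such as "L. D." will still be split; this is only a sketch.
    my @sentences = split /(?<=[.])\s+(?=[A-Z])/, $text;

    my @hits;
    for my $gene (@$genes) {
        for my $target (@$targets) {
            for my $sentence (@sentences) {
                if ($sentence =~ /(\Q$gene\E).*?($verbs).*?(\Q$target\E)/i) {
                    push @hits, [$1, $2, $3];
                }
            }
        }
    }
    return @hits;
}

# Small inline demo, using entries and sentences from the sample data
# so the sketch runs without the files on the Desktop.
my @demo_genes   = ('pfrom1', 'pfama1', 'ama1', 'ha-pfrom1');
my @demo_targets = ('apicoplast', 'micronemes', 'golgi');
my $demo_text    = 'In merozoites, PfAMA1 is located in micronemes and thus '
                 . 'separated from PfROM1. Toxoplasma gondii ROM1, the '
                 . 'orthologue of PfROM1, is located in the Golgi, secretory '
                 . 'vesicles, and in micronemes.';

print join("\t", @$_), "\n"
    for find_relations($demo_text, \@demo_genes, \@demo_targets);
```

With the real files, the two arrays would come from something like `my @genes = read_lexicon('/home/stanley/Desktop/Gene.txt');`, and likewise for the targets. Note that a short entry such as `ama1` also matches inside `PfAMA1`; adding `\b` word boundaries around the gene would be one way to tighten that, if it matters for your data.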
Re: Simple RegEX text parser
by almut (Canon) on Dec 30, 2008 at 10:58 UTC
by I-Box (Acolyte) on Dec 30, 2008 at 11:16 UTC

Re: Simple RegEX text parser
by linuxer (Curate) on Dec 30, 2008 at 11:00 UTC

Re: Simple RegEX text parser
by oko1 (Deacon) on Dec 30, 2008 at 15:48 UTC
by planetscape (Chancellor) on Dec 30, 2008 at 16:02 UTC
by ikegami (Patriarch) on Dec 30, 2008 at 19:02 UTC
by bart (Canon) on Jan 03, 2009 at 19:19 UTC
by oko1 (Deacon) on Dec 31, 2008 at 02:06 UTC
by I-Box (Acolyte) on Dec 30, 2008 at 16:42 UTC

Re: Simple RegEX text parser
by I-Box (Acolyte) on Jan 02, 2009 at 19:19 UTC
by planetscape (Chancellor) on Jan 03, 2009 at 18:40 UTC
by I-Box (Acolyte) on Jan 03, 2009 at 08:31 UTC

Re: Simple RegEX text parser
by I-Box (Acolyte) on Jan 02, 2009 at 09:33 UTC
by ikegami (Patriarch) on Jan 02, 2009 at 10:18 UTC
by I-Box (Acolyte) on Jan 02, 2009 at 10:45 UTC
by kevk (Initiate) on Aug 13, 2009 at 10:52 UTC