Re: Simple RegEX text parser

Since your data set is fairly small, you might want to consider building the regex out of the pieces you're looking for:

#!/usr/bin/perl -w
use strict;

our ($Gene, $Target, $Input) = qw/gene.txt target.txt input.txt/;

open Gene or die "$Gene: $!\n";
open Target or die "$Target: $!\n";
open Input or die "$Input: $!\n";

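# chomp inside map, not grep: "grep {chomp}" silently drops a final line that
# lacks a trailing newline. Blank lines are skipped, metacharacters escaped.
my $gene = join "|", map { chomp; length($_) ? quotemeta($_) : () } <Gene>;
my $target = join "|", map { chomp; length($_) ? quotemeta($_) : () } <Target>;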
chomp(my $input = <Input>);

close $_ for qw/Gene Target Input/;

my $verbs = 'localizes to|held|located in|localization|translocated to|targets|reaches|exported|export';

# Note corrected 'split' regex
for my $sentence (split /\.\s+(?=[A-Z])/, $input){  # lookahead keeps each sentence's first letter
    my $found;
    for ($sentence =~ /($gene).*?($verbs).*?($target)/ig){
        print "$_\t";
        $found++;
    }
    print "\n" if $found;
}

Output:

PfAMA1  located in      micronemes
PfROM1  located in      Golgi
AMA1    held    micronemes
AMA1    held    micronemes
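
One thing to watch when building the alternation from a list: Perl tries alternatives left to right, so 'export|exported' would capture "export" out of "exported". Here is a minimal sketch of a helper that sorts longest-first and escapes metacharacters. build_alternation is a hypothetical name of my own, not part of the script above, and the sample sentence is made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (an assumption, not from the script above): join literal
# strings into one alternation, escaping regex metacharacters and putting
# longer strings first so "exported" is tried before "export".
sub build_alternation {
    my @words = grep { length($_) } @_;    # drop empty entries
    return join '|',
        map  { quotemeta($_) }
        sort { length($b) <=> length($a) || $a cmp $b } @words;
}

my $gene   = build_alternation(qw/AMA1 PfAMA1 PfROM1/);
my $verbs  = build_alternation('located in', 'held', 'export', 'exported');
my $target = build_alternation(qw/micronemes Golgi/);

# Made-up sample sentence, for illustration only.
my $sentence = 'PfROM1 is exported to the Golgi';
if (my @hit = $sentence =~ /($gene).*?($verbs).*?($target)/i) {
    print join("\t", @hit), "\n";
}
```

With the longest-first ordering, the verb captured from the sample sentence is "exported" rather than "export".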

--
"Language shapes the way we think, and determines what we can think about."
-- B. L. Whorf

Re^2: Simple RegEX text parser by planetscape (Chancellor) on Dec 30, 2008 at 16:02 UTC
"you might want to consider building the regex out of the pieces you're looking for"
grinder's most excellent Regexp::Assemble can certainly help with this sort of thing, identifying bits common to multiple words. Take a look at "Why machine-generated solutions will never cease to amaze me" for a sample of what this module can do; you'll be impressed. Oh, and grinder's scratchpad too...
HTH, planetscape
Re^3: Simple RegEX text parser by ikegami (Patriarch) on Dec 30, 2008 at 19:02 UTC
Regexp::Assemble is for joining regexps. Regexp::List is for joining strings into a regexp. They're in the same distribution.
Re^4: Simple RegEX text parser by bart (Canon) on Jan 03, 2009 at 19:19 UTC
And there is the old Regex::PreSuf for the same purpose.
Re^3: Simple RegEX text parser by oko1 (Deacon) on Dec 31, 2008 at 02:06 UTC
As noted by ikegami, Regexp::Assemble is slightly off the mark... but ++ nevertheless! Thank you for introducing me to a very fun, very useful module. The "Why machine-generated solutions will never cease to amaze me" link is also great. Much appreciated!
Re^2: Simple RegEX text parser by I-Box (Acolyte) on Dec 30, 2008 at 16:42 UTC
Thanks, oko1. Your code was really handy. I always wanted my code to be short and smart...