Good evening wise monks,
I wrote this Perl script to filter the raw output of a PubMed article reader (ppaxe, by Sergio Castillo). ppaxe reads thousands of PubMed articles for me and searches for possible interactions between proteins/genes. The results include lines whose verbs do not actually indicate an interaction, and lines with multiple verbs of which only some indicate one.
My script needs to drop every line that has no verb indicating an interaction. I have a file of approved verbs, a file of discarded verbs, and my ppaxe results file. I read the verb lists into arrays and used index for matching rather than the exists function. I am not allowed to use regexes, so that the next generation that takes over can understand the program more easily.
When I run the program, it prints the whole data file without filtering anything. Can anyone help me correct it and explain what I am doing wrong?
Thanks so much,
#!/usr/bin/perl
# discard_lines_by_verbs.pl
use strict;
use warnings;

die "Please use suitable files" if (@ARGV != 3);

my $dis_verbs = shift @ARGV;
my $apr_verbs = shift @ARGV;
my $ppaxe = shift @ARGV;

open(my $in1, "<", "$dis_verbs")
    or die "error reading $dis_verbs. $!";
open(my $in2, "<", "$apr_verbs")
    or die "error reading $apr_verbs. $!";
open(my $in3, "<", "$ppaxe")
    or die "error reading $ppaxe. $!";

my @dis_dic;
my @apr_dic;

while (my $f1_line = <$in1>) {
    chomp($f1_line);
    @dis_dic = $f1_line;
}

while (my $f2_line = <$in2>) {
    chomp($f2_line);
    @apr_dic = $f2_line;
}

while (my $f3_line = <$in3>) {
    chomp($f3_line);
    if ( index($f3_line, @apr_dic) != -1 ) {
        print "$f3_line\n";
    }
    elsif ( index($f3_line, @apr_dic && @dis_dic) != -1 ) {
        print "$f3_line\n";
    }
    else {
        next;
    }
}

close($in1);
close($in2);
close($in3);
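A possible explanation for why every line gets printed, offered as a sketch rather than a definitive diagnosis: `@apr_dic = $f2_line` replaces the array on each pass instead of appending to it, and `index` evaluates its arguments in scalar context, so an array passed as the search string becomes its element count. With one element left in the array, the script ends up searching each line for the string "1", which every sample line happens to contain. The verbs and the data line below are copied from the test files; the loop stands in for reading the approved verbs file.

```perl
use strict;
use warnings;

my @apr_dic;
for my $verb (qw(ACTIVATES ALTERS BINDS)) {
    # Plain assignment replaces the whole array on every pass,
    # so only the last verb survives. push @apr_dic, $verb would append.
    @apr_dic = $verb;
}
my $count = scalar @apr_dic;    # one element, not three

# This line has no approved verb, yet the test below still "matches".
my $line = "VTN PRAP1 27189837 0.582 IS";

# index() puts its arguments in scalar context, so @apr_dic is seen as
# its element count. The call is effectively index($line, "1").
my $pos_array   = index($line, @apr_dic);
my $pos_literal = index($line, "1");

print "elements kept: $count\n";
print "index with array: $pos_array, index with \"1\": $pos_literal\n";
```

Both positions come out equal and different from -1, because "PRAP1" contains a "1". The same happens with `@apr_dic && @dis_dic`, which also reduces to an element count.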
These files are small test versions:
approved_verbs_test:
ACTIVATES
ADPRIBOSYLATED
ALTERS
ARGINYLATED
ASSOCIATES
BINDS
discarded_verbs_test:
ARE
ASK
ASSESS
BASED
BECAME
IS
sample_ppaxe_data:
RPSA AKT1 18628488 0.634 BINDS,ALTERS
RUNX2 DKK1 22960397 0.746 ADPRIBOSYLATED,ALTERS
ARHGAP31 RASA1 17158447 0.56 ASSOCIATES
ARHGAP31 RNASE1 17158447 0.602 BECOME
RASA1 RNASE1 17158447 0.554 BASED
NOS1 NOS3 19799911 0.628 ARGINYLATED,BASED
VTN PRAP1 27189837 0.582 IS
MAPK8 RHOD 11414711 0.698 ARGINYLATED,BINDS
IL2 SETBP1 8398987 0.556 BINDS
S100A8 S100A9 20105291 0.596 ASSESS
Desired outcome:
RPSA AKT1 18628488 0.634 BINDS,ALTERS
RUNX2 DKK1 22960397 0.746 ADPRIBOSYLATED,ALTERS
ARHGAP31 RASA1 17158447 0.56 ASSOCIATES
NOS1 NOS3 19799911 0.628 ARGINYLATED,BASED
MAPK8 RHOD 11414711 0.698 ARGINYLATED,BINDS
IL2 SETBP1 8398987 0.556 BINDS
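One way to get this outcome without regexes, as a minimal sketch under stated assumptions: the verbs are the last whitespace-separated field of each line, they are separated by commas, and a line is kept when at least one of its verbs is approved (under that reading the discarded list is not needed). The helper names `split_literal`, `last_field` and `has_approved_verb` are made up for illustration, and the data is inlined here instead of being read from the three files. The approved verbs go into a hash so that `exists` can test each verb exactly.

```perl
use strict;
use warnings;

# Split a string on a literal separator using only index/substr (no regex).
sub split_literal {
    my ($text, $sep) = @_;
    my @parts;
    my $start = 0;
    while ((my $pos = index($text, $sep, $start)) != -1) {
        push @parts, substr($text, $start, $pos - $start);
        $start = $pos + length $sep;
    }
    push @parts, substr($text, $start);    # text after the last separator
    return @parts;
}

# Return everything after the last space or tab in the line.
# Assumes the line has no trailing whitespace.
sub last_field {
    my ($line) = @_;
    my $space = rindex($line, ' ');
    my $tab   = rindex($line, "\t");
    my $cut   = $space > $tab ? $space : $tab;
    return substr($line, $cut + 1);
}

# True if at least one verb on the line is a key of the approved hash.
sub has_approved_verb {
    my ($line, $approved) = @_;
    for my $verb (split_literal(last_field($line), ',')) {
        return 1 if exists $approved->{$verb};
    }
    return 0;
}

# In the real script these would be filled while reading the files,
# e.g. $approved{$f2_line} = 1 inside the while loop.
my %approved = map { $_ => 1 }
    qw(ACTIVATES ADPRIBOSYLATED ALTERS ARGINYLATED ASSOCIATES BINDS);

my @lines = (
    "RPSA AKT1 18628488 0.634 BINDS,ALTERS",
    "RUNX2 DKK1 22960397 0.746 ADPRIBOSYLATED,ALTERS",
    "ARHGAP31 RASA1 17158447 0.56 ASSOCIATES",
    "ARHGAP31 RNASE1 17158447 0.602 BECOME",
    "RASA1 RNASE1 17158447 0.554 BASED",
    "NOS1 NOS3 19799911 0.628 ARGINYLATED,BASED",
    "VTN PRAP1 27189837 0.582 IS",
    "MAPK8 RHOD 11414711 0.698 ARGINYLATED,BINDS",
    "IL2 SETBP1 8398987 0.556 BINDS",
    "S100A8 S100A9 20105291 0.596 ASSESS",
);

my @kept = grep { has_approved_verb($_, \%approved) } @lines;
print "$_\n" for @kept;
```

Looking up whole verbs in a hash avoids accidental substring hits, such as a short verb like IS appearing inside a gene name, which a plain `index` over the full line could produce.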