 
PerlMonks  


Good evening wise monks,

I wrote this Perl script to help filter the raw output of a PubMed article reader (called ppaxe, by Sergio Castillo). ppaxe reads thousands of PubMed articles for me and searches for possible interactions between proteins/genes. I end up with verbs that do not actually indicate an interaction, or with lines containing multiple verbs, only some of which indicate an interaction.

My Perl script needs to filter out any line that does not contain at least one verb indicating an interaction. I have a file of approved verbs, a file of discarded verbs and my ppaxe results file. I put my verb lists into arrays and used the index function instead of exists for matching. I am not allowed to use regexes, so that the next generation that takes over can understand the program more easily.

When I run my Perl program, it just prints the whole data file without filtering anything. Can anyone help me correct my program and teach me what I am doing wrong?

Thanks so much,

#!/usr/bin/perl
# discard_lines_by_verbs.pl
use strict;
use warnings;

die "Please use suitable files" if (@ARGV != 3);

my $dis_verbs = shift @ARGV;
my $apr_verbs = shift @ARGV;
my $ppaxe     = shift @ARGV;

open(my $in1, "<", "$dis_verbs") or die "error reading $dis_verbs. $!";
open(my $in2, "<", "$apr_verbs") or die "error reading $apr_verbs. $!";
open(my $in3, "<", "$ppaxe")     or die "error reading $ppaxe. $!";

my @dis_dic;
my @apr_dic;

while (my $f1_line = <$in1>) {
    chomp($f1_line);
    @dis_dic = $f1_line;
}

while (my $f2_line = <$in2>) {
    chomp($f2_line);
    @apr_dic = $f2_line;
}

while (my $f3_line = <$in3>) {
    chomp($f3_line);
    if ( index($f3_line, @apr_dic) != -1 ) {
        print "$f3_line\n";
    }
    elsif ( index($f3_line, @apr_dic && @dis_dic) != -1 ) {
        print "$f3_line\n";
    }
    else {
        next;
    }
}

close($in1);
close($in2);
close($in3);
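A likely reason the script prints almost everything can be reproduced in isolation. The sketch below is only an illustration with made-up variable names (`@verbs`, `@all`, `$row` are not from the script): it shows that `@array = $line` replaces the array on every pass instead of appending, and that `index` takes scalar arguments, so an array passed as the search string is evaluated in scalar context and becomes its element count. With a one-element array, the search string is therefore "1", which occurs in nearly every data line.

```perl
use strict;
use warnings;

my @verbs;
for my $line ("ACTIVATES", "ALTERS", "BINDS") {
    @verbs = $line;    # assignment overwrites the array; only the last verb survives
}

my @all;
for my $line ("ACTIVATES", "ALTERS", "BINDS") {
    push @all, $line;  # push appends, so all three verbs are kept
}

my $row = "VTN PRAP1 27189837 0.582 IS";

# index() forces scalar context on its arguments, so @verbs becomes its
# element count (1) and the search is really for the string "1".
my $pos_array = index($row, @verbs);

# Searching for an actual verb string behaves as intended.
my $pos_verb = index($row, $verbs[0]);

print "verbs kept by assignment: ", scalar(@verbs), "\n";
print "verbs kept by push:       ", scalar(@all), "\n";
print "index with array found a match\n"   if $pos_array != -1;
print "index with verb found no match\n"   if $pos_verb == -1;
```

The same scalar-context rule would apply to `@apr_dic && @dis_dic`, which also ends up as a number rather than a list of verbs.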

These files are small test versions:

approved_verbs_test:

ACTIVATES
ADPRIBOSYLATED
ALTERS
ARGINYLATED
ASSOCIATES
BINDS

discarded_verbs_test:

ARE
ASK
ASSESS
BASED
BECAME
IS

sample_ppaxe_data:

RPSA AKT1 18628488 0.634 BINDS,ALTERS
RUNX2 DKK1 22960397 0.746 ADPRIBOSYLATED,ALTERS
ARHGAP31 RASA1 17158447 0.56 ASSOCIATES
ARHGAP31 RNASE1 17158447 0.602 BECOME
RASA1 RNASE1 17158447 0.554 BASED
NOS1 NOS3 19799911 0.628 ARGINYLATED,BASED
VTN PRAP1 27189837 0.582 IS
MAPK8 RHOD 11414711 0.698 ARGINYLATED,BINDS
IL2 SETBP1 8398987 0.556 BINDS
S100A8 S100A9 20105291 0.596 ASSESS

Desired outcome:

RPSA AKT1 18628488 0.634 BINDS,ALTERS
RUNX2 DKK1 22960397 0.746 ADPRIBOSYLATED,ALTERS
ARHGAP31 RASA1 17158447 0.56 ASSOCIATES
NOS1 NOS3 19799911 0.628 ARGINYLATED,BASED
MAPK8 RHOD 11414711 0.698 ARGINYLATED,BINDS
IL2 SETBP1 8398987 0.556 BINDS
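One possible way to get the desired outcome, sketched under some assumptions: the verb files hold one verb per line, the verbs are always the last whitespace-separated column of a ppaxe line, and a line is kept as soon as one of its verbs is approved (which is what the desired outcome suggests, so the discarded list is not consulted here). The helper name `has_approved_verb` and the in-memory sample data are mine, used only so the example runs on its own; in the real script the hash would be filled inside the existing `while` loop over the approved-verbs file. It uses `exists` on a hash and `split` with plain literal separators, so no hand-written regex is needed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample data copied from the post; the real script would read these from files.
my @approved_list = qw(ACTIVATES ADPRIBOSYLATED ALTERS ARGINYLATED ASSOCIATES BINDS);
my @rows = (
    "RPSA AKT1 18628488 0.634 BINDS,ALTERS",
    "RUNX2 DKK1 22960397 0.746 ADPRIBOSYLATED,ALTERS",
    "ARHGAP31 RASA1 17158447 0.56 ASSOCIATES",
    "ARHGAP31 RNASE1 17158447 0.602 BECOME",
    "RASA1 RNASE1 17158447 0.554 BASED",
    "NOS1 NOS3 19799911 0.628 ARGINYLATED,BASED",
    "VTN PRAP1 27189837 0.582 IS",
    "MAPK8 RHOD 11414711 0.698 ARGINYLATED,BINDS",
    "IL2 SETBP1 8398987 0.556 BINDS",
    "S100A8 S100A9 20105291 0.596 ASSESS",
);

# A hash makes "is this verb approved?" a single exists() test.
my %approved = map { $_ => 1 } @approved_list;

# Hypothetical helper: true if any verb in the last column is approved.
sub has_approved_verb {
    my ($line, $approved_ref) = @_;
    my @fields = split ' ', $line;       # ' ' splits on any run of whitespace
    return 0 unless @fields;             # skip empty lines
    for my $verb (split ',', $fields[-1]) {
        return 1 if exists $approved_ref->{$verb};
    }
    return 0;
}

# Keep only the rows with at least one approved verb, in original order.
my @kept = grep { has_approved_verb($_, \%approved) } @rows;
print "$_\n" for @kept;
```

Comparing whole verbs through the hash also avoids accidental substring hits that `index` on the full line could produce, for example a short verb appearing inside a longer verb or a gene name.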

In reply to Matching multiple substrings of a string to arrays and printing those that match by rarenas
