Dear Perl Monks,
First let me say that I am new to Perl; I am using it for bioinformatics/genetic analysis.
What I want my script to do:
I have an IDs file in tab-delimited format. It looks like this (with the text between slashes representing columns):
/Unwanted/ ID required/ Unwanted1/ Unwanted2/...Unwanted6/ ID_alias/ ID_alias1/ ... ID_alias36/
I also have a gene names file, for which I want to return the official ID. The gene name may be in any one of the alias columns.
The gene names file looks like this:
/Info/ Gene name/ Info1/ Info 2...
Should be simple, right?
What my script does:
It returns an empty output file.
I will post it here in the hope that one of you learned people will be able to spot the mistakes. I am using dummy files to get it working (see below for examples).
Please excuse the extensive comments in my script; like I said, I am new to this.
Thank you in advance,
-Walking before I can run...
Script
################################################
# Declare an outfile to print to
my $outfile = "HUGO_dummyResults.txt";
# Open the outfile using a file handle
open( OUT, "> $outfile" ) or die "cannot create the output file";
#################################################
# Open file of list of neurotransmission genes where ENST has not been found
# FILENOTES::: File created in access using a query against approved HUGO name and gene name.
#Column 3 of file [2] is gene name col 4 [3] is the pathway gene is associated with
open (DUMMY_GENEFILE, 'DummyGenes.txt') or die "cannot open file containing genes";
#################################################
# 2- Open HUGO tabbed file
# FILENOTES:::: Approved gene name is in col 2 [1]
open (DUMMYHUGO,'DummyHugo.txt') or die "cannot open file containing HUGO IDs";
#################################################
#Operations
#################################################
#make array genes
#@genes = DUMMY_GENEFILE; #No longer done here, see below
#make array HUGO
@hugo = DUMMYHUGO;
#for each line in genefile, try to match gene name [2] to one column of the columns [5]-[8] in the HUGO ID file.
#check col 6, if found print, if not found, check next column. If never found, print "not found".
foreach (<DUMMY_GENEFILE>) #Changed from (<DUMMY_GENEFILE>)
{
#make array genes
@genes = DUMMY_GENEFILE;
for ($i = 4; $i < @hugo; $i++)
{
if ($genes[2] eq $hugo[$i])
#If found first print result
{
print OUT "$genes[0]\t$genes[1]\t$genes[2]\tgenes[3]\t$hugo[1]\n";
}
# HUGO ID not found, print
print OUT "$genes[0]\t$genes[1]\t$genes[2]\tgenes[3]\tNo HUGO ID\n";
}
}
close (DUMMYHUGO);
close (DUMMY_GENEFILE);
close (OUT);
exit;
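One thing that may matter here: without `use strict`, a bare handle name on the right-hand side, as in `@hugo = DUMMYHUGO;`, is treated as a plain string and does not read anything from the file. Lines are read with the `<HANDLE>` operator, and each line still has to be split into columns before `[1]` or `[2]` means "a column". The snippet below is only a minimal sketch of that read-and-split step, assuming tab-delimited input. The in-memory filehandle and the names `$data`, `$fh` and `@rows` are my own stand-ins and not part of the script above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for one line of DummyHugo.txt, built with explicit tabs.
my $data = join("\t", qw(HGCNID:1 SKJ Info1 Info2 Info3 Sandra San Katey Jones)) . "\n";

# An in-memory filehandle keeps the example self-contained (assumption:
# the real script would open 'DummyHugo.txt' here instead).
open(my $fh, '<', \$data) or die "cannot open in-memory data: $!";

my @rows;
while (my $line = <$fh>) {            # <$fh> returns one line per iteration
    chomp $line;                      # strip the trailing newline
    my @fields = split /\t/, $line;   # one array element per column
    push @rows, \@fields;             # keep a reference to this row's columns
}
close $fh;

print "approved name: $rows[0][1], first alias: $rows[0][5]\n";
```

Adding `use strict; use warnings;` at the top of the original script would probably flag the bare `DUMMYHUGO` and `DUMMY_GENEFILE` assignments straight away.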
________________
Here are some dummy files to help demonstrate.
DummyHugo.txt
HGCNID:1 SKJ Info1 Info2 Info3 Sandra San Katey Jones
HGCNID:2 DJL Info1 Info2 Info3 Dave David James London
HGCNID:3 PKKJ INfo1 INfo2 INfo3 Paul Kevin Kean June
HGCNID:4 KJRJ INfo1 Info2 INfo3 Katie Joanna Rachel Jolie
DummyGenes.txt
ID1 Id2 Katie Path
ID1a Id2a Dave Path
ID1b Id2b Kean Path
ID1c Id2c Paul Path
ID1d Id2d Sandra Path
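With these dummy files, the lookup I am after could be sketched roughly as follows. This is only a sketch under some assumptions: both files are tab-delimited, the approved name sits in column [1], aliases start at column [5] in the dummy file (the real file would need a different start column), and matching is exact and case-sensitive. The helper `tsv`, the hash `%approved_for` and the variable `$first_alias_col` are hypothetical names, and the data is rebuilt in memory so the example runs on its own.

```perl
use strict;
use warnings;

# Helper (hypothetical): join fields with tabs to rebuild one line of dummy data.
sub tsv { return join("\t", @_) . "\n" }

my $hugo_data =
      tsv(qw(HGCNID:1 SKJ  Info1 Info2 Info3 Sandra San    Katey  Jones))
    . tsv(qw(HGCNID:2 DJL  Info1 Info2 Info3 Dave   David  James  London))
    . tsv(qw(HGCNID:3 PKKJ INfo1 INfo2 INfo3 Paul   Kevin  Kean   June))
    . tsv(qw(HGCNID:4 KJRJ INfo1 Info2 INfo3 Katie  Joanna Rachel Jolie));

my $gene_data =
      tsv(qw(ID1  Id2  Katie  Path))
    . tsv(qw(ID1a Id2a Dave   Path))
    . tsv(qw(ID1b Id2b Kean   Path))
    . tsv(qw(ID1c Id2c Paul   Path))
    . tsv(qw(ID1d Id2d Sandra Path));

my $first_alias_col = 5;    # assumption: aliases occupy columns [5] onwards

# Pass 1: map every alias to the approved name in column [1].
my %approved_for;
open(my $hugo_fh, '<', \$hugo_data) or die "cannot open HUGO data: $!";
while (my $line = <$hugo_fh>) {
    chomp $line;
    my @hugo = split /\t/, $line;
    for my $i ($first_alias_col .. $#hugo) {
        $approved_for{ $hugo[$i] } = $hugo[1];
    }
}
close $hugo_fh;

# Pass 2: look up the gene name in column [2] of each gene line.
my @results;
open(my $gene_fh, '<', \$gene_data) or die "cannot open gene data: $!";
while (my $line = <$gene_fh>) {
    chomp $line;
    my @genes = split /\t/, $line;
    my $id = exists $approved_for{ $genes[2] }
           ? $approved_for{ $genes[2] }
           : 'No HUGO ID';
    push @results, join("\t", @genes[0 .. 3], $id);
}
close $gene_fh;

print "$_\n" for @results;
```

Building the hash once means each gene name is a single lookup rather than a scan over every alias column of every HUGO line, which might matter with 37 alias columns per line.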