Ch1ralS0ul has asked for the wisdom of the Perl Monks concerning the following question:
I am attempting to write a piece of code that reads in a couple of files. It takes the first two columns of a .tsv file as the Gene ID and Gene Symbol for a key/value pair in a hash, uses the second file to read in data that corresponds to each Gene ID, and then prints the data to a new file. I am fairly new to programming, so of course I am running into an uninitialized-variable brick wall. I'm not sure whether this is due to an improper array split, a logic issue with my regex, or one of those little brain farts. Anyway, any advice would be helpful! The uninitialized variable I'm running into is the first column of the array, "$INF1Array[0]", on line 34. (Sorry if my formatting is a little unorthodox!)
#!/usr/bin/perl
use warnings;
use diagnostics;

# Title: convertDataToGeneSymbol.pl
# Author: Nicholas Bense
# Date: 11/4/15

# Open a filehandle to read file #1
open(INF1, "<", '/scratch/Drosophila/fb_synonym_fb_2014_05.tsv') or die $!;
# Open a filehandle to read file #2
open(INF2, "<", '/scratch/Drosophila/FlyRNAi_data_baseline_vs_EGF.txt') or die $!;
# Open a filehandle to read file #3
open(INF3, "<", '/scratch/Drosophila/gene_association.goa_fly') or die $!;
# Open a filehandle to write new file
open(OUTF1, ">", 'FLYRNAi_data_baseline_vs_EGFSymbol.txt') or die $!;

# Initialize a hash for the gene symbol conversion
my %geneSymbolConversion;

# Read Input File 1 line by line
while (<INF1>){
    # Remove the trailing newline
    chomp;
    # Split the line on tabs
    my @INF1Array = split("\t", $_);
    # Filter entries starting with FBgn
    while ($INF1Array[0] =~ /(^FBgn\d+)/){
        # Assign column 1 to hash key scalar
        my $geneID = $INF1Array[0];
        # Assign column 2 to hash value scalar
        my $geneSymbol = $INF1Array[1];
        # Assign key and value to hash
        $geneSymbolConversion{$geneID} = $geneSymbol;
    }
}

# Read Input File 2 line by line
while (<INF2>){
    # Remove the trailing newline
    chomp;
    # Initialize the symbol in case the gene ID is not found
    my $geneSymbol = "NA";
    # Split the line on tabs
    my ($geneID, $EGF_Baseline, $EGF_Stimulus) = split("\t", $_);
    # Check if the gene ID is present in the hash
    if (defined $geneSymbolConversion{$geneID}){
        # Get the symbol associated with the gene ID from the hash
        $geneSymbol = $geneSymbolConversion{$geneID};
    }
    # Join data and print to output file
    print OUTF1 join("\t", $geneID, $geneSymbol, $EGF_Baseline, $EGF_Stimulus), "\n";
}
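Two things in the first loop seem worth pointing out. The inner "while" never ends once a line matches: without the /g modifier, the match restarts at the beginning of the same unchanged string and succeeds on every test, so an "if" (or "next unless") is what that filter needs. The uninitialized warning is consistent with a blank line in the input, because split on an empty string returns an empty list and leaves $INF1Array[0] undefined. Below is a minimal sketch of both loops with those changes. To keep it self-contained it reads from in-memory strings instead of the real /scratch files; the sample rows and the load_symbols helper are hypothetical, and it adds "use strict".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data standing in for the real input files
# (a comment line, a blank line, two FBgn rows and one non-FBgn row).
my $synonym_data = "## comment line\n\nFBgn0000001\tabc\nFBgn0000002\tdef\nnot_a_gene\tignored\n";
my $rnai_data    = "FBgn0000001\t1.5\t2.5\nFBgn0000999\t0.3\t0.4\n";

# Hypothetical helper: build a gene ID => gene symbol hash from a filehandle.
sub load_symbols {
    my ($fh) = @_;
    my %symbol_for;
    while (my $line = <$fh>) {
        chomp $line;
        my @fields = split /\t/, $line;
        # "if"-style filter: skip blank lines (empty @fields) and non-FBgn rows.
        # Checking @fields first means $fields[0] is never read when it is undefined.
        next unless @fields && $fields[0] =~ /^FBgn\d+/;
        $symbol_for{$fields[0]} = $fields[1];
    }
    return \%symbol_for;
}

# In-memory filehandles (open on a scalar reference) replace the real files.
open my $syn_fh, '<', \$synonym_data or die "Cannot open synonym data: $!";
my $symbol_for = load_symbols($syn_fh);
close $syn_fh;

open my $rnai_fh, '<', \$rnai_data or die "Cannot open RNAi data: $!";
my @output;
while (my $line = <$rnai_fh>) {
    chomp $line;
    my ($gene_id, $baseline, $stimulus) = split /\t/, $line;
    next unless defined $gene_id;    # skip blank lines
    # Fall back to "NA" when the gene ID has no symbol in the hash.
    my $symbol = $symbol_for->{$gene_id} // 'NA';
    push @output, join("\t", $gene_id, $symbol, $baseline, $stimulus);
}
close $rnai_fh;

print "$_\n" for @output;
```

In the real script the rows would be printed to OUTF1 rather than collected in @output; collecting them here only makes the result easy to inspect.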
P.S. I will also be reading in the third input file, /scratch/Drosophila/gene_association.goa_fly, to load columns 3 and 5 of the gene association file into a hash (column 3, the gene symbol, as the key and column 5, the GO term, as the value). I will then use that hash to convert FlyRNAi_data_baseline_vs_EGFSymbol.txt into FlyRNAi_data_baseline_vs_EGF_GO.txt, with the gene symbol replaced by the GO term. If you'd like to provide some tips or mention some potential pitfalls based on my apparent coding habits, then please go nuts! I wanted to make sure this portion of the program runs correctly before working in that third element. Many thanks!
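One pitfall with the planned symbol => GO term hash: in a gene association (GAF) file a single gene symbol usually has several annotation rows, so a plain assignment keeps only the last GO term seen for each symbol. A hash of arrays avoids that. The sketch below assumes the standard GAF layout (column 3 = object symbol, column 5 = GO ID, header lines starting with "!"); the sample rows, the go_for helper and the ";" separator are hypothetical choices for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample rows in gene-association layout; only columns 3 and 5 matter here.
my $gaf_data = "!gaf-version: 2.0\n"
             . "FB\tFBgn0000001\tabc\t\tGO:0005515\n"
             . "FB\tFBgn0000001\tabc\t\tGO:0005634\n"
             . "FB\tFBgn0000002\tdef\t\tGO:0003677\n";

my %go_terms_for;    # gene symbol => array ref of GO IDs
open my $gaf_fh, '<', \$gaf_data or die "Cannot open GAF data: $!";
while (my $line = <$gaf_fh>) {
    chomp $line;
    next if $line =~ /^!/ or $line eq '';    # skip header/comment and blank lines
    my @cols = split /\t/, $line;
    my ($symbol, $go_id) = @cols[2, 4];      # columns 3 and 5 (0-based indices 2 and 4)
    next unless defined $symbol && defined $go_id;
    # Push instead of assigning, so a symbol with several GO terms keeps them all.
    push @{ $go_terms_for{$symbol} }, $go_id;
}
close $gaf_fh;

# Hypothetical helper: all GO IDs for a symbol joined with ";", or "NA" if none.
sub go_for {
    my ($symbol) = @_;
    my $terms = $go_terms_for{$symbol};
    return $terms ? join(';', @$terms) : 'NA';
}

print go_for($_), "\n" for qw(abc def xyz);
```

Whether to join multiple GO IDs into one field or write one output row per GO term depends on what the downstream analysis expects; the ";" join here is only one option.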
Re: Uninitialized Value Hash Lookup Gene Symbol
by grasshopper!!! (Beadle) on Nov 05, 2015 at 17:23 UTC
Re: Uninitialized Value Hash Lookup Gene Symbol
by choroba (Cardinal) on Nov 05, 2015 at 15:55 UTC
by Ch1ralS0ul (Initiate) on Nov 06, 2015 at 04:11 UTC
Re: Uninitialized Value Hash Lookup Gene Symbol
by graff (Chancellor) on Nov 05, 2015 at 22:23 UTC
Re: Uninitialized Value Hash Lookup Gene Symbol
by GotToBTru (Prior) on Nov 05, 2015 at 16:55 UTC