in reply to Re^4: Use of uninitialized value in string eq
in thread Use of uninitialized value in string eq
Instead of -w use warnings. The command line flag applies to any modules you include as well as your own code so you may get warnings from code you essentially have no control over using -w.
Use the special variable $! in file i/o error handling messages to give a little more information about the nature of the failure.
You don't need to set arrays or hashes empty when you declare them. They are sold without batteries.
Don't declare variables in a lump: my ($foo, $baa, $baz). It makes it harder to see where they are declared and precludes providing a usage comment (although that shouldn't often be required).
It's not clear from the input loop whether you expect more than one gi. Your use of $accession implies that there should only be one (else the accession value isn't related to the gi value even though they come from the same line of data). You still haven't addressed the possibility that you get a well formatted sequence line before you get a gi line. These are related issues.
Use statement modifiers where you have a trivial statement controlled by a condition. For example $gi = $1 if defined $1;.
Avoid a proliferation of temporary variables and assignments. For example, see the change I made to your second while loop.
Avoid comments that say the same thing as the code. 'split into three parts' adds no extra information that a trivial inspection of the code doesn't tell you. The data format related comments ('... P82L ...') however are good!
You can use {} to ensure a variable is interpolated correctly in a string. For example you can write "${potential}{$gi}" instead of concatenating a bunch of substrings together.
My reworked version with the changes implied is:
#!/usr/bin/perl use strict; use warnings; use Data::Dumper; my $data = '/DATA/proteinfile.txt'; open my $inFile, '<', $data or die "Failed at opening $data: $!\n"; # Populate the info hash with GIs as keys and sequences as values my $humanGi; my $accession; my $gi; # Current gi while reading sequences my %info; while (<$inFile>) { my $line = $_; chomp $line; last if m!END!; if ($line =~ m/HUMAN/) { ($humanGi, $accession) = $line =~ m/^\S+\|(\d+)\|\w+\|(\S{6}?) +/; } if ($line =~ m/^\S+\|(\d+)/) { $gi = $1 if defined $1; } else { $info{$gi} = $line; } } close $inFile; my $data2 = '/DATA/variantlist.txt'; open $inFile, '<', $data2 or die "Failed at opening $data2: $!\n"; my $data3 = '/DATA/VariantOutput.txt'; open my $outfile, '>', $data3 or die "Failed at opening $data3: $!\n"; print $outFile "This is [GI: $humanGi] and [Accession: $accession]\nVARIANT\t\tPO +TENTIAL\t\tPD\n"; while (defined (my $Variant = <$inFile>)) { # Grab a variant from the file (in this example: P82L) chomp $Variant; my ($source, $position, $sink) = split /(\d+)(\w)/, $Variant; # Check whether HS has the source (i.e., P) at the given position +(i.e., 82) my @char = split //, $info{$humanGi}; my $target = $char[$position - 1]; my @VariantList; my @PDList; # Scan the rest of the sequences to check what amino acid they hav +e at # the given position for my $gi (keys %info) { my @char2 = split //, $info{$gi}; my $potential = $char2[$position - 1]; push @VariantList, $potential; if ($potential eq $sink) { # Note the cases where we observe the sink (i.e., L) at th +is position push @PDList, "${potential}{$gi}"; } } print $outFile "$Variant\t\t@VariantList\t\t@PDList\n"; } close $inFile; close $outFile;
This is still rather unsatisfactory code because there are many ways it can fail due to unexpected data. Sanity checking helps ensure the code and the data format conform to the same expectations and make it much easier to diagnose problems when expectations aren't met.
There are no obvious inefficiencies in the current code. Initially efficiency shouldn't be an major consideration in any case. Generally if you avoid nested loops to the extent reasonable and avoid re-reading input files code of this sort will perform well enough for most purposes.
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re^6: Use of uninitialized value in string eq
by sophix (Sexton) on Apr 26, 2010 at 02:36 UTC | |
by GrandFather (Saint) on Apr 26, 2010 at 03:28 UTC | |
by sophix (Sexton) on Apr 26, 2010 at 15:23 UTC | |
|
Re^6: Use of uninitialized value in string eq
by sophix (Sexton) on Apr 25, 2010 at 01:02 UTC |