Instead of -w use warnings. The command line flag applies to any modules you include as well as your own code so you may get warnings from code you essentially have no control over using -w.

Use the special variable $! in file i/o error handling messages to give a little more information about the nature of the failure.

You don't need to set arrays or hashes empty when you declare them. They are sold without batteries.

Don't declare variables in a lump: my ($foo, $baa, $baz). It makes it harder to see where they are declared and precludes providing a usage comment (although that shouldn't often be required).

It's not clear from the input loop whether you expect more than one gi. Your use of $accession implies that there should only be one (else the accession value isn't related to the gi value even though they come from the same line of data). You still haven't addressed the possibility that you get a well formatted sequence line before you get a gi line. These are related issues.

Use statement modifiers where you have a trivial statement controlled by a condition. For example $gi = $1 if defined $1;.

Avoid a proliferation of temporary variables and assignments. For example, see the change I made to your second while loop.

Avoid comments that say the same thing as the code. 'split into three parts' adds no extra information that a trivial inspection of the code doesn't tell you. The data format related comments ('... P82L ...') however are good!

You can use {} to ensure a variable is interpolated correctly in a string. For example you can write "${potential}{$gi}" instead of concatenating a bunch of substrings together.

My reworked version with the changes implied is:

#!/usr/bin/perl use strict; use warnings; use Data::Dumper; my $data = '/DATA/proteinfile.txt'; open my $inFile, '<', $data or die "Failed at opening $data: $!\n"; # Populate the info hash with GIs as keys and sequences as values my $humanGi; my $accession; my $gi; # Current gi while reading sequences my %info; while (<$inFile>) { my $line = $_; chomp $line; last if m!END!; if ($line =~ m/HUMAN/) { ($humanGi, $accession) = $line =~ m/^\S+\|(\d+)\|\w+\|(\S{6}?) +/; } if ($line =~ m/^\S+\|(\d+)/) { $gi = $1 if defined $1; } else { $info{$gi} = $line; } } close $inFile; my $data2 = '/DATA/variantlist.txt'; open $inFile, '<', $data2 or die "Failed at opening $data2: $!\n"; my $data3 = '/DATA/VariantOutput.txt'; open my $outfile, '>', $data3 or die "Failed at opening $data3: $!\n"; print $outFile "This is [GI: $humanGi] and [Accession: $accession]\nVARIANT\t\tPO +TENTIAL\t\tPD\n"; while (defined (my $Variant = <$inFile>)) { # Grab a variant from the file (in this example: P82L) chomp $Variant; my ($source, $position, $sink) = split /(\d+)(\w)/, $Variant; # Check whether HS has the source (i.e., P) at the given position +(i.e., 82) my @char = split //, $info{$humanGi}; my $target = $char[$position - 1]; my @VariantList; my @PDList; # Scan the rest of the sequences to check what amino acid they hav +e at # the given position for my $gi (keys %info) { my @char2 = split //, $info{$gi}; my $potential = $char2[$position - 1]; push @VariantList, $potential; if ($potential eq $sink) { # Note the cases where we observe the sink (i.e., L) at th +is position push @PDList, "${potential}{$gi}"; } } print $outFile "$Variant\t\t@VariantList\t\t@PDList\n"; } close $inFile; close $outFile;

This is still rather unsatisfactory code because there are many ways it can fail due to unexpected data. Sanity checking helps ensure the code and the data format conform to the same expectations and make it much easier to diagnose problems when expectations aren't met.

There are no obvious inefficiencies in the current code. Initially efficiency shouldn't be an major consideration in any case. Generally if you avoid nested loops to the extent reasonable and avoid re-reading input files code of this sort will perform well enough for most purposes.

True laziness is hard work

In reply to Re^5: Use of uninitialized value in string eq by GrandFather
in thread Use of uninitialized value in string eq by sophix

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.