in reply to Re: Use of uninitialized value in string eq
in thread Use of uninitialized value in string eq

Thank you, GrandFather. I particularly appreciate your advice about initializing variables. I had not heard of hash slices before either, so that was very helpful too!

Regarding chomp in the INFILE2 loop: you are right, I should have chomped the lines, because that file is generated by another script that adds \n to the end of each line. Thanks for pointing it out.

I revised the code in line with your suggestions, and now I get the following errors:

Possible unintended interpolation of @VariantList in string at PD.pl line 91.

I have no idea why I am getting this. I thought I could safely print an array by interpolating it into a string this way.

Type of arg 1 to push must be array (not hash slice) at PD.pl line 33, near "$line;"

This is weird?!

Global symbol "@VariantList" requires explicit package name at PD.pl line 91.
Execution of PD.pl aborted due to compilation errors.
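As far as the posted code shows, the first and last messages share one cause: @VariantList is declared with my inside the for loop, so it no longer exists at the print statement after the loop. Perl then sees an undeclared package array @VariantList, warns about the interpolation, and strict refuses to compile it. A minimal sketch of the fix, using a small hypothetical %info hash with made-up sequences (not the real data files), is to declare the array before the loop:

```perl
use strict;
use warnings;

# Hypothetical sample data: gi => sequence (not from the original files)
my %info = (
    1001 => 'MVHLTPEEK',
    1002 => 'MVHLSPEEK',
);
my $position = 5;    # 1-based position to inspect

# Declared BEFORE the loop, so it is still in scope after the loop ends.
# A "my @VariantList" inside the loop would be a fresh array on every
# iteration and would no longer exist at the print statement.
my @VariantList;
for my $gi (sort keys %info) {
    my @char = split //, $info{$gi};
    push @VariantList, $char[$position - 1];
}
print "Variant list contains: @VariantList\n";
```

Declaring the array inside the loop but before the inner for would also empty it once per variant, which is what the later version of the code in this thread does.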

And here is the code.

```perl
#!/usr/bin/perl -w
use strict;
use Data::Dumper;

my %info = ();
my $humangi;
my $data = '/DATA/proteinfile.txt';
open INFILE, '<', $data or die "Failed at opening $data!\n";

# Construct the hash with GIs as keys and sequences as values
while ( <INFILE> ) {
    my $line = $_;
    chomp($line);
    last if m!END!;
    if($line=~m/HUMAN/){
        ($humangi) = ($line=~m/^\S+\|(\d+)/)
    }
    if($line=~m/^\S+\|(\d+)/) {
        if(defined($1)) {
            my $gi=$1;
        }
    }
    else {
        if (defined(my $gi)) {
            push (@info{$gi}, $line);
        }
        else {
            die "Badly formatted file. Failed at reading the GI!\n";
        }
    }
}
#print Dumper (\%info);
print "$humangi\n";
close(INFILE);

my $data2 = '/DATA/variantlist.txt';
open INFILE2, '<', $data2 or die "Failed at opening $data2!\n";
my $data3 = '/DATA/VariantOutput.txt';
open OUTFILE, '>', $data3 or die "Failed at opening $data3!\n";

while ( <INFILE2>){
    # Grab a variant from the file (in this example: P82L)
    my $line2 = $_;
    chomp($line2);
    my $Variant = $line2;
    # Split the variant into three parts
    my ($source, $position, $sink) = split(/(\d+)(\w)/, $Variant);
    print "$source , $ position , $sink\n";
    # Check whether HS has the source (i.e., P) at the given position (i.e., 82)
    my @temp = @info{$humangi};
    if ( $temp[$position] eq $source) {
        print "Yep, $source has been confirmed!\n";
    }
    else {
        print "There is something wrong!\n";
    }
    # Scan the rest of the sequences to check what amino acid they have at the given position
    for my $gi ( keys %info ) {
        my @value = @info{$gi};
        my @VariantList = ();
        push ( @VariantList, $value[$position]);
        if ($value[$position] eq $sink){ # Note the cases where we observe the sink (i.e., L) at this position
            print OUTFILE "A pathogenic deviation has been found at site $position - from $source to $sink !\n" . " And the corresponding gi for this deviation is: $gi\n";
        }
    }
    print OUTFILE "Variant list contains: @VariantList\n";
}
close(INFILE2);
```

Re^3: Use of uninitialized value in string eq
by sophix (Sexton) on Apr 23, 2010 at 14:09 UTC
    Oops, my bad. I was using a hash slice where I needed to dereference the array stored in the hash.

    Changing push (@info{$gi}, $line) to push (@{$info{$gi}}, $line) solved the problem.
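For reference, a minimal sketch of the "hash of arrays" pattern behind that fix, using hypothetical [gi, line] records rather than the real input file. @{ $info{$gi} } dereferences the array reference stored under a key (creating it on first use), whereas @info{$gi} is a hash slice, a plain list of values, which push cannot append to:

```perl
use strict;
use warnings;

# Each key holds a reference to an array of lines (a "hash of arrays").
my %info;

# Hypothetical input: [gi, sequence line] pairs
my @records = (
    [ 1001, 'MVHLTPEEK' ],
    [ 1001, 'GKSAVTALW' ],
    [ 1002, 'MVHLSPEEK' ],
);

for my $rec (@records) {
    my ($gi, $line) = @$rec;
    # If $info{$gi} does not exist yet, Perl autovivifies a new
    # empty array reference for it before pushing.
    push @{ $info{$gi} }, $line;
}

print scalar @{ $info{1001} }, "\n";    # 2
print join('', @{ $info{1001} }), "\n";
```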

    So I am left with the errors involving @VariantList. I am probably making another fundamental mistake somewhere, e.g., assigning an array to a scalar. I am looking into it.

      Victory! The code now works properly. Still, it probably has efficiency problems. I would appreciate any comments on where I can improve it, in terms of both performance and appropriate style. Thanks, guys, I could not have done it without your help!

      Here is the final ugly code.

```perl
#!/usr/bin/perl -w
use strict;
use Data::Dumper;

my %info = ();
my ($gi, $humangi, $accession);
my $data = '/DATA/proteinfile.txt';
open INFILE, '<', $data or die "Failed at opening $data!\n";

# Construct the hash with GIs as keys and sequences as values
while ( <INFILE> ) {
    my $line = $_;
    chomp($line);
    last if m!END!;
    if($line=~m/HUMAN/){
        ($humangi) = ($line=~m/^\S+\|(\d+)/);
        ($accession) = ($line=~m/^\S+\|\d+\|\w+\|(\S{6}?)/);
    }
    if($line=~m/^\S+\|(\d+)/) {
        if(defined($1)) {
            $gi=$1;
        }
    }
    else {
        $info{$gi} = $line;
    }
}
#print Dumper (\%info);
print "$humangi\n";
print "$accession\n";
close(INFILE);

my $data2 = '/DATA/variantlist.txt';
open INFILE2, '<', $data2 or die "Failed at opening $data2!\n";
my $data3 = '/DATA/VariantOutput.txt';
open OUTFILE, '>', $data3 or die "Failed at opening $data3!\n";
print OUTFILE "This is [GI: $humangi] and [Accession: $accession]\nVARIANT\t\tPOTENTIAL\t\tPD\n";

while ( <INFILE2>){
    # Grab a variant from the file (in this example: P82L)
    my $line2 = $_;
    chomp($line2);
    my $Variant = $line2;
    # Split the variant into three parts
    my ($source, $position, $sink) = split(/(\d+)(\w)/, $Variant);
    #print "$source\t$position\t$sink\n";
    # Check whether HS has the source (i.e., P) at the given position (i.e., 82)
    my $temp = $info{$humangi};
    #print "Temp contains $temp" . "\n";
    my @char = split //, $temp;
    #print "Now \@char contains: @char";
    #print "Inside the temp: $char[0] and $char[1]\n";
    my $target = $char[$position-1];
    #print "This is the target: $target" . "\n";
    if ( $target eq $source) {
        print "Yep!\n";
    }
    my @VariantList = ();
    my @PDList = ();
    # Scan the rest of the sequences to check what amino acid they have at the given position
    for my $gi ( keys %info ) {
        my $value = $info{$gi};
        my @char2 = split //, $value;
        my $potential = $char2[$position-1];
        push (@VariantList, $potential);
        if ($potential eq $sink){ # Note the cases where we observe the sink (i.e., L) at this position
            my $pd = "$potential" . "{" . "$gi" . "}";
            push (@PDList, $pd)
            #print "A pathogenic deviation has been found at site $position - from $source to $sink !\n" . " And the corresponding gi for this deviation is: $gi\n";
        }
    }
    print OUTFILE "$Variant\t\t@VariantList\t\t@PDList\n";
}
close(INFILE2);
```

        Instead of -w, use the warnings pragma (use warnings;). The command-line flag applies to any modules you load as well as to your own code, so with -w you may get warnings from code you have essentially no control over.
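To make the difference concrete: -w switches warnings on globally (it sets $^W), while use warnings is lexically scoped, so it covers only the enclosing file or block and can be tuned per block. A small sketch; the $SIG{__WARN__} handler here is only for counting warnings in the demo:

```perl
use strict;
use warnings;    # lexically scoped: this file only, not modules it loads

my @caught;
# Collect warnings instead of printing them, so they can be counted.
local $SIG{__WARN__} = sub { push @caught, $_[0] };

my $undef;
my $s1 = "value: $undef";        # warns: use of uninitialized value
{
    no warnings 'uninitialized'; # switched off for this block only
    my $s2 = "value: $undef";    # no warning here
}
my $s3 = "value: $undef";        # warns again outside the block

print scalar(@caught), " warnings caught\n";    # 2 warnings caught
```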

        Include the special variable $! in file I/O error messages; it holds the operating system's reason for the failure (for example, "No such file or directory").
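A minimal sketch of that pattern, using a hypothetical path that is assumed not to exist. The eval is only there so the demo can show the message instead of exiting:

```perl
use strict;
use warnings;

# Hypothetical path, assumed not to exist on the machine running this.
my $path = '/no/such/dir/proteinfile.txt';

my $ok = eval {
    # $! holds the OS error text explaining why open failed
    open my $fh, '<', $path or die "Failed at opening $path: $!\n";
    1;
};
print "Caught: $@" unless $ok;
```

In a real script you would normally let the die terminate the program; the point is that the message now says why the open failed, not just that it did.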

        You don't need to initialize arrays or hashes to empty when you declare them; a newly declared my @array or my %hash is already empty. They are sold without batteries.

        Don't declare variables in a lump, as in my ($foo, $baa, $baz). Doing so makes it harder to see where each one is declared and leaves no room for a usage comment (although one shouldn't often be needed).
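Put together, the two declaration tips above look something like this short sketch (variable names taken from the thread; the comments are illustrative):

```perl
use strict;
use warnings;

# Fresh arrays and hashes are already empty; "= ()" adds nothing.
my @VariantList;
my %info;

# One declaration per line leaves room for a short usage comment.
my $humanGi;      # gi of the HUMAN sequence
my $accession;    # accession taken from the same header line

print scalar(@VariantList), ' ', scalar(keys %info), "\n";    # 0 0
```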

        It's not clear from the input loop whether you expect more than one gi. Your use of $accession implies that there should be only one (otherwise the accession value isn't related to the gi value, even though they come from the same line of data). You still haven't addressed the possibility of getting a well-formatted sequence line before you get a gi line. These are related issues.
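One way to address both points is to make the input loop check its own assumptions and die with a line number when they are violated. The sketch below reads from an in-memory string in an assumed header format (gi|NNNN|xx|ACCESS|NAME); the real proteinfile.txt layout is not shown in the thread, so the sample data and regex are assumptions. It also tests for a line that is exactly END rather than m/END/, since E, N and D are all amino-acid letters and a sequence line could contain that string:

```perl
use strict;
use warnings;

# Hypothetical sample in an ASSUMED format; the real proteinfile.txt is not shown.
my $sample = <<'DATA';
gi|1001|sp|P00001|HBB_HUMAN
MVHLTPEEK
gi|1002|sp|P00002|HBB_MOUSE
MVHLSPEEK
END
DATA

open my $inFile, '<', \$sample or die "Can't open in-memory data: $!\n";

my $humanGi;
my $accession;
my $gi;    # gi of the most recent header line
my %info;

while (my $line = <$inFile>) {
    chomp $line;
    # Assumption: the terminator is a line that is exactly "END".
    last if $line eq 'END';

    if (my ($hdrGi, $hdrAcc) = $line =~ m/^\S+\|(\d+)\|\w+\|(\S{6})/) {
        $gi = $hdrGi;
        if ($line =~ m/HUMAN/) {
            die "Second HUMAN header at line $.\n" if defined $humanGi;
            ($humanGi, $accession) = ($hdrGi, $hdrAcc);
        }
        next;
    }

    # A sequence line only makes sense once a header has supplied a gi.
    die "Sequence line before any gi header at line $.\n" unless defined $gi;
    $info{$gi} .= $line;    # assumption: a sequence may span several lines
}
close $inFile;

die "No HUMAN header found\n" unless defined $humanGi;
print "HUMAN gi $humanGi, accession $accession\n";
```

The header captures go into lexicals first because the later m/HUMAN/ match would reset $1 and $2.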

        Use statement modifiers where a trivial statement is controlled by a condition. For example: $gi = $1 if defined $1;

        Avoid a proliferation of temporary variables and assignments. For example, see the change I made to your second while loop.

        Avoid comments that say the same thing as the code. 'Split the variant into three parts' adds no information that a trivial inspection of the code doesn't already give you. The comments describing the data format ('... P82L ...'), however, are good!

        You can wrap a variable name in {} to control where its interpolation ends in a string. For example, you can write "${potential}{$gi}" instead of concatenating a bunch of substrings together.
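A quick way to convince yourself that the two forms build the same string; the gi value here is hypothetical. The braces end the interpolated name at the closing }, so the following {$gi} is literal braces around the interpolated $gi:

```perl
use strict;
use warnings;

my $potential = 'L';
my $gi        = 1002;    # hypothetical gi

# "${potential}" ends the variable name at "}", so "{$gi}" that follows
# is a literal "{", the interpolated $gi, and a literal "}".
my $interpolated = "${potential}{$gi}";
my $concatenated = "$potential" . "{" . "$gi" . "}";

print "$interpolated\n";
print $interpolated eq $concatenated ? "same\n" : "different\n";
```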

        My reworked version, with these changes applied, is:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

my $data = '/DATA/proteinfile.txt';
open my $inFile, '<', $data or die "Failed at opening $data: $!\n";

# Populate the info hash with GIs as keys and sequences as values
my $humanGi;
my $accession;
my $gi;    # Current gi while reading sequences
my %info;

while (<$inFile>) {
    my $line = $_;
    chomp $line;
    last if m!END!;

    if ($line =~ m/HUMAN/) {
        ($humanGi, $accession) = $line =~ m/^\S+\|(\d+)\|\w+\|(\S{6}?)/;
    }

    if ($line =~ m/^\S+\|(\d+)/) {
        $gi = $1 if defined $1;
    } else {
        $info{$gi} = $line;
    }
}
close $inFile;

my $data2 = '/DATA/variantlist.txt';
open $inFile, '<', $data2 or die "Failed at opening $data2: $!\n";
my $data3 = '/DATA/VariantOutput.txt';
open my $outFile, '>', $data3 or die "Failed at opening $data3: $!\n";
print $outFile "This is [GI: $humanGi] and [Accession: $accession]\nVARIANT\t\tPOTENTIAL\t\tPD\n";

while (defined (my $Variant = <$inFile>)) {
    # Grab a variant from the file (in this example: P82L)
    chomp $Variant;
    my ($source, $position, $sink) = split /(\d+)(\w)/, $Variant;

    # Check whether HS has the source (i.e., P) at the given position (i.e., 82)
    my @char = split //, $info{$humanGi};
    my $target = $char[$position - 1];
    my @VariantList;
    my @PDList;

    # Scan the rest of the sequences to check what amino acid they have at
    # the given position
    for my $gi (keys %info) {
        my @char2 = split //, $info{$gi};
        my $potential = $char2[$position - 1];
        push @VariantList, $potential;
        if ($potential eq $sink) {
            # Note the cases where we observe the sink (i.e., L) at this position
            push @PDList, "${potential}{$gi}";
        }
    }
    print $outFile "$Variant\t\t@VariantList\t\t@PDList\n";
}
close $inFile;
close $outFile;
```

        This is still rather unsatisfactory code, because there are many ways it can fail on unexpected data. Sanity checking helps ensure that the code and the data format conform to the same expectations, and makes it much easier to diagnose problems when those expectations aren't met.
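As an illustration of that kind of checking, here is a sketch of a hypothetical helper, parse_variant, that validates one variant against the reference sequence before it is used. It swaps the split for an anchored regex match, which rejects anything that is not letter-number-letter; the assumption that variants always look like P82L comes from the comments in the thread, not from the actual variant file:

```perl
use strict;
use warnings;

# Hypothetical helper: parse a variant such as "P82L" and check it against
# the reference (HUMAN) sequence. Dies with a descriptive message on bad data.
sub parse_variant {
    my ($variant, $refSeq) = @_;

    # Assumed format: one letter, a 1-based position, one letter.
    my ($source, $position, $sink) = $variant =~ /^([A-Z])(\d+)([A-Z])$/
        or die "Malformed variant '$variant'\n";

    die "Position $position in '$variant' is outside the sequence (length "
        . length($refSeq) . ")\n"
        if $position < 1 || $position > length $refSeq;

    my $target = substr $refSeq, $position - 1, 1;
    die "'$variant': reference has $target, not $source, at $position\n"
        if $target ne $source;

    return ($source, $position, $sink);
}

my $ref = 'MVHLTPEEK';    # hypothetical reference sequence
my @ok  = parse_variant('T5S', $ref);
my $err = eval { parse_variant('T50S', $ref); 1 } ? '' : $@;
print "parsed: @ok\n";
print "rejected: $err";
```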

        There are no obvious inefficiencies in the current code. In any case, efficiency shouldn't initially be a major consideration. Generally, if you avoid nested loops where reasonably possible and avoid re-reading input files, code of this sort will perform well enough for most purposes.
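If it ever does matter, one easy saving is in the inner loop: the code splits every sequence into a per-character array for each variant just to read one residue. substr reads that single character directly. A small sketch with a hypothetical sequence:

```perl
use strict;
use warnings;

my $seq      = 'MVHLTPEEK';    # hypothetical sequence
my $position = 5;              # 1-based, as in the variant notation

# split builds a list with one element per residue just to read one of them
my @char     = split //, $seq;
my $viaSplit = $char[$position - 1];

# substr reads the single character directly (offsets are 0-based)
my $viaSubstr = substr($seq, $position - 1, 1);

print "$viaSplit $viaSubstr\n";    # T T
```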

        True laziness is hard work