sophix has asked for the wisdom of the Perl Monks concerning the following question:

Hey guys, Last time when I had an error of uninitialized value, I was able to fix it by simply initializing the variable. In this case, I could not figure out what in particular is wrong with my use of string comparison. I would appreciate any comments. P.S. I get the errors where I use the 'eq' operator.
#!/usr/bin/perl -w
use strict;
use Data::Dumper;

my ( $data, $line, $gi, $humangi, %info);
$data = '/DATA/proteinfile.txt';
open INFILE, '<', $data or die "Failed at opening $data!\n";

# Construct the hash with GIs as keys and sequences as values
while ( <INFILE> ) {
    $line = $_;
    chomp($line);
    last if m!END!;
    if($line=~m/HUMAN/){
        ($humangi) = ($line=~m/^\S+\|(\d+)/)
    }
    if($line=~m/^\S+\|(\d+)/) {
        if(defined($1)) {
            $gi=$1;
        }
    }
    else {
        $info{$gi}=$line;
    }
}
#print Dumper (\%info);
print "$humangi\n";
close(INFILE);

my $data2 = '/DATA/variantlist.txt';
open INFILE2, '<', $data2 or die "Failed at opening $data2!\n";
my $data3 = '/DATA/VariantOutput.txt';
open OUTFILE, '>', $data3 or die "Failed at opening $data3!\n";

while ( <INFILE2>){
    # Grab a variant from the file (in this example: P82L)
    my $Variant = $_;
    # Split the variant into three parts
    my ($source, $position, $sink) = split(/(\d+)(\w)/, $Variant);
    print "$source , $ position , $sink\n";
    # Check whether HS has the source (i.e., P) at the given position (i.e., 82)
    my @temp = $info{$humangi};
    if ( $temp[$position] eq $source) {
        print "Yep, $source has been confirmed!\n";
    }
    else {
        print "There is something wrong!\n";
    }
    # Scan the rest of the sequences to check what amino acid they have at the given position
    for my $gi ( keys %info ) {
        my @value = $info{$gi};
        my @VariantList = ();
        push ( @VariantList, $value[$position]);
        if ($value[$position] eq $sink){
            # Note the cases where we observe the sink (i.e., L) at this position
            print OUTFILE "A pathogenic deviation has been found at site $position - from $source to $sink !\n" . " And the corresponding gi for this deviation is: $gi\n";
        }
    }
    #print OUTFILE "Variant list contains: @VariantList\n";
}
close(INFILE2);

Replies are listed 'Best First'.
Re: Use of uninitialized value in string eq
by ikegami (Patriarch) on Apr 21, 2010 at 20:41 UTC
    my @temp = $info{$humangi};
    assigns one value to @temp, so you'll have a problem if $position is anything other than 0.

    By the way, the following is very suspicious:

    my ($source, $position, $sink) = split(/(\d+)(\w)/, $Variant);

    Specifically, /(\d+)(\w)/ doesn't look like it would match a separator. It should probably be

    my ($source, $position, $sink) = $Variant =~ /(.*?)(\d+)(\w)/;
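The difference between the two lines can be checked with a short sketch on the example variant "P82L". The sub names below are made up for illustration. split treats /(\d+)(\w)/ as the separator, and because the pattern contains capture groups, the captured text is returned in the list together with the fields. A plain match in list context simply returns its captures.

```perl
use strict;
use warnings;

# Hypothetical helper names, for illustration only.

# The original approach: "82L" is taken as the separator, and the two
# capture groups are returned alongside the single field "P".
sub parse_with_split {
    my ($variant) = @_;
    return split /(\d+)(\w)/, $variant;
}

# The suggested approach: a match in list context returns the captures.
sub parse_with_match {
    my ($variant) = @_;
    return $variant =~ /(.*?)(\d+)(\w)/;
}

print join(' , ', parse_with_split('P82L')), "\n";
print join(' , ', parse_with_match('P82L')), "\n";

# On an unchomped line, split also returns whatever follows the
# "separator" (here the newline) as an extra field.
my @fields = parse_with_split("P82L\n");
print scalar(@fields), " fields from split on an unchomped line\n";
```

Both subs give the same three values for a clean "P82L", which is why the split version appears to work, but only the match says what is actually meant.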
      Hi ikegami, thanks for the reply.

      I do not see why there would be a problem with the assignment of one value to @temp. My intention was to put the protein sequence (of human gi) into an array and then to point a position in this sequence. I thought I would bump into the problem you mentioned in case I would have used $temp, and actually that's why I used an array there. I would be glad if you can tell me what might be the problem.

      Suspicious, indeed. Yet it is working properly. I first used (\w)(\d+)(\w) and then changed it to the current pattern because of the errors I got. I guess it first makes a split as P and 82L and then makes a second split by separating 82 and L. Your alternative looks more appropriate to use, though.

      Any idea concerning the use of 'eq' operator?

        Any idea concerning the use of 'eq' operator?

        ikegami has already given you the answer to that. Your line

        my @temp = $info{$humangi};

        assigns a single scalar value to element zero only of the @temp array and no other element is initialised, but your line

        my ($source, $position, $sink) = split(/(\d+)(\w)/, $Variant);

        assigns the value of 82 to $position using your example of "P82L" in $Variant. Your comparison

        if ( $temp[$position] eq $source) {

        attempts to access element 82 of @temp which does not exist, hence the warning.

        I hope this clarifies things for you.

        Cheers,

        JohnGG
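A minimal sketch of the point made above, using a made-up sequence string. It assumes the whole sequence is held in one scalar and that variant positions are 1-based (the usual convention for residue numbering, but an assumption here). The helper name residue_at is hypothetical.

```perl
use strict;
use warnings;

my $sequence = 'MKTAYIAKQR';    # made-up sequence, for illustration only

# Assigning a scalar to an array creates exactly one element (index 0).
my @temp = ($sequence);
print scalar(@temp), "\n";                           # 1
print defined $temp[3] ? "defined\n" : "undef\n";    # undef

# Hypothetical helper: the residue at a 1-based position,
# or undef if the position is outside the sequence.
sub residue_at {
    my ($seq, $position) = @_;
    return undef if $position < 1 || $position > length $seq;
    return substr $seq, $position - 1, 1;
}

print residue_at($sequence, 3), "\n";                # T

# Alternative: split into one element per character (0-based indices).
my @residues = split //, $sequence;
print $residues[2], "\n";                            # T
```

Either substr or split // gives access to individual characters; the array built by the plain assignment does not.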

        I do not see why there would be a problem with the assignment of one value to @temp.

        What do you think $temp[82] returns when @temp only has one value? undef.

        I guess it first makes a split as P and 82L

        split is designed to split a separated list, but there's no separator between the fields in "P82L".

        Yet it is working properly.

        In the same sense that towing your car to work is a working alternative to driving to work.

Re: Use of uninitialized value in string eq
by GrandFather (Saint) on Apr 22, 2010 at 01:49 UTC
    Last time when I had an error of uninitialized value, I was able to fix it by simply initializing the variable.

    Don't do that! Or at least, don't do that unless you understand why the variable was uninitialised and providing a default is the correct solution. Just initialising a variable to mask a warning is not fixing the bug in your code.

    In a similar way, declaring all your variables up front in a block just to satisfy strict negates the virtue of using strict. In your sample code for example you use $line in a loop, but declare it outside the loop - that's bad. You declare $data then initialise it in the following line. Don't! Declare it where you initialise it.

    $gi highlights the problem. It is declared up front with everything else, which strips any meaning that could be inferred from its scope. However, the way it is used implies that $gi retains state across iterations of the while loop, but no check is made in the loop to see that $gi has a valid value in the $info{$gi} = $line assignment. An undef check before the assignment and a die indicating a badly formatted file may save a lot of grief at some point. BTW, should that assignment perhaps be a push, i.e. push @{$info{$gi}}, $line, and the array assignments later should then be @temp = @{$info{$humangi}} and @value = @{$info{$gi}}?

    I notice too that you don't chomp lines in the INFILE2 loop. Is that by design?

    True laziness is hard work
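The suggestions above could be sketched roughly as follows. This is not the original script: the helper name build_info and the header lines in the sample data are made up, and the sub takes a list of lines instead of reading INFILE so that it can be run on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hash ref mapping each GI to an array ref
# of the sequence lines that follow its header line.
sub build_info {
    my (@lines) = @_;
    my %info;
    my $gi;    # deliberately outside the loop: it carries state between lines
    for my $line (@lines) {
        chomp $line;
        last if $line =~ /END/;
        if ($line =~ /^\S+\|(\d+)/) {
            $gi = $1;    # header line: remember the current GI
            next;
        }
        # A sequence line with no preceding header means the file is malformed.
        die "Badly formatted input: sequence line before any GI\n"
            unless defined $gi;
        push @{ $info{$gi} }, $line;
    }
    return \%info;
}

my $info = build_info(
    ">gi|12345 HUMAN\n", "MKT\n", "AYI\n",
    ">gi|67890 MOUSE\n", "MKS\n", "END\n",
);
print join('', @{ $info->{12345} }), "\n";    # MKTAYI
```

Note the braces in @{ $info{$gi} }: that dereferences the array stored under one key, whereas @info{$gi} is a hash slice.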
      Thank you, GrandFather. I particularly appreciate your advice regarding the initialization of variables. Plus, I had not heard of the hash slice before, so that was very helpful too!

      Regarding chomp in the INFILE2 loop: you are right, I should have chomped the lines, because I generate that file with another script that adds \n at the end of each line. Thanks for pointing it out.

      I revised the code in line with your suggestions, and get the following errors:

      Possible unintended interpolation of @VariantList in string at PD.pl line 91.

      I have no idea why I am getting this error. I thought I can safely use an array to print it out in this way.

      Type of arg 1 to push must be array (not hash slice) at PD.pl line 33, near "$line;"

      This is weird?!

      Global symbol "@VariantList" requires explicit package name at PD.pl line 91.
      Execution of PD.pl aborted due to compilation errors.

      And here is the code.

      #!/usr/bin/perl -w
      use strict;
      use Data::Dumper;

      my %info = ();
      my $humangi;
      my $data = '/DATA/proteinfile.txt';
      open INFILE, '<', $data or die "Failed at opening $data!\n";

      # Construct the hash with GIs as keys and sequences as values
      while ( <INFILE> ) {
          my $line = $_;
          chomp($line);
          last if m!END!;
          if($line=~m/HUMAN/){
              ($humangi) = ($line=~m/^\S+\|(\d+)/)
          }
          if($line=~m/^\S+\|(\d+)/) {
              if(defined($1)) {
                  my $gi=$1;
              }
          }
          else {
              if (defined(my $gi)) {
                  push (@info{$gi}, $line);
              }
              else {
                  die "Badly formatted file. Failed at reading the GI!\n";
              }
          }
      }
      #print Dumper (\%info);
      print "$humangi\n";
      close(INFILE);

      my $data2 = '/DATA/variantlist.txt';
      open INFILE2, '<', $data2 or die "Failed at opening $data2!\n";
      my $data3 = '/DATA/VariantOutput.txt';
      open OUTFILE, '>', $data3 or die "Failed at opening $data3!\n";

      while ( <INFILE2>){
          # Grab a variant from the file (in this example: P82L)
          my $line2 = $_;
          chomp($line2);
          my $Variant = $line2;
          # Split the variant into three parts
          my ($source, $position, $sink) = split(/(\d+)(\w)/, $Variant);
          print "$source , $ position , $sink\n";
          # Check whether HS has the source (i.e., P) at the given position (i.e., 82)
          my @temp = @info{$humangi};
          if ( $temp[$position] eq $source) {
              print "Yep, $source has been confirmed!\n";
          }
          else {
              print "There is something wrong!\n";
          }
          # Scan the rest of the sequences to check what amino acid they have at the given position
          for my $gi ( keys %info ) {
              my @value = @info{$gi};
              my @VariantList = ();
              push ( @VariantList, $value[$position]);
              if ($value[$position] eq $sink){
                  # Note the cases where we observe the sink (i.e., L) at this position
                  print OUTFILE "A pathogenic deviation has been found at site $position - from $source to $sink !\n" . " And the corresponding gi for this deviation is: $gi\n";
              }
          }
          print OUTFILE "Variant list contains: @VariantList\n";
      }
      close(INFILE2);

        Oops, my bad. I should have used the hash slice correctly.

        Changing push (@info{$gi}, $line) to push (@{$info{$gi}}, $line) solved the problem.

        So I am left with the error involving @VariantList. Probably I am making another fundamental mistake when assigning, e.g., trying to assign an array into a scalar. I am looking into it.
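The remaining @VariantList errors look like a scoping issue, not an assignment issue: in the posted code the array is declared with my inside the for loop, so it no longer exists when the print after the loop tries to interpolate it (and it would be emptied on every iteration anyway). Here is a small sketch of declaring it before the loop instead. The helper name, the data, and the 1-based positions are assumptions for illustration, and each sequence is taken to be a single string.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the residue found at a 1-based position
# in every sequence of the hash.
sub residues_at {
    my ($info, $position) = @_;
    my @variant_list;    # declared before the loop, so it is still visible after it
    for my $gi (sort keys %$info) {
        push @variant_list, substr($info->{$gi}, $position - 1, 1);
    }
    return @variant_list;
}

my %info = (111 => 'MKPA', 222 => 'MKLA');    # made-up data
my @variant_list = residues_at(\%info, 3);
print "Variant list contains: @variant_list\n";
```

The same rule applies to $gi in the first loop of the posted code: my $gi = $1 inside the inner block creates a variable that disappears at the closing brace, and defined(my $gi) declares a fresh, always-undefined one.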