http://qs1969.pair.com?node_id=466973

MonkPaul has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

Further to the problems I posted yesterday, I cannot seem to get the program to recognise information stored in one file and compare it to another file.

This is some of the code a colleague gave me to get it to recognise an Accession number in one file and compare it to an element in an array, obtained from a second file.

The $ref_filehandle holds the data from the reference file that contains only a list of Accession numbers, i.e.

AB014570.2
AB055861.1
AB067522.1
AB073617.1
AC004166.12
AC004851.2
AC004867.5
AC004883.3
AC005077.5
AC005080.2

Whereas each element of @subjects holds:

gi|14670349|ref|NM_032999.1| Homo sapiens general transcription factor II, i (GTF2I), transcript .... or a derivative of it.

The hash %refList holds all the accession numbers as KEY = Accession, VALUE = Accession, so each accession is stored as both key and value. It should be an array, I know, but I need speed when searching for each element of @subjects, hence the use of the "defined" function.
The only problem is that the highlighted "if" statement always evaluates to false... WHY?

I have found, using print statements, that the hash does have the right values stored in it, but when it is searched for $elements[3] it can't find it. So I tried "! defined" and, guess what, it evaluates to true. Knowing that the values are stored in the hash, I can only assume the problem lies with the if statement.

sub get_Results()
{
    open( NEWACC, ">ref_files/$newRefFile" ) || die "$!";
    open( NEWALIGN, ">blast_files/$newBlastFile") || die "$!";

    ## put list of reference accessions into a structure so we can parse it...
    my %refList = ();
    while (<$ref_filehandle>)    # used instead of REFFILE
    {
        chomp;                   # removes newline character
        my $key = $_;
        $refList{ $key } = "$_";
    }

    #PARSE BLAST RESULT HERE....
    #read results file one line at a time
    my @resultLine = <$blast_filehandle>;    # used instead of BLASTFILE
    my $alignment = {};                      # reference to an empty hash
    my @subjects;                            # reference to empty array
    my $current_subject = "front_matter";
    $alignment->{$current_subject} = "";

    for (my $i = 0 ; $i<scalar @resultLine; $i++)
    {
        if ($resultLine[$i] =~ /^>/)
        {
            $current_subject = $resultLine[ $i ];
            chomp ($current_subject);
            push (@subjects, $current_subject);
            $alignment->{$current_subject} = "";
        }
        $alignment->{$current_subject} = $alignment->{$current_subject}.$resultLine[ $i ];
    }

    my @elements;
    foreach my $z (@subjects)
    {
        chomp $z;
        @elements = split('\|', $z);
        if ( ! defined $elements[ 3 ] )
        {
            print ("Parsing Error<BR>");
            print ("Line $z");
        }

        ##### PROBLEM LIES HERE WITH IF STATEMENT
        if (defined $elements[3] && defined $refList{$elements[3]} )
        {
            print ("Already Present in file");
        }
        else
        {
            print "No match in reference file";
            print (NEWALIGN $alignment->{$z}."\n");
            print (NEWACC $elements[3]."\n");
        }
    }

    #close files
    close NEWALIGN;
    close NEWACC;
}

Any help is really appreciated, including any critical appraisal, as it's not my handiwork.

Cheers people
MonkPaul.

Replies are listed 'Best First'.
Re: If statement problem with hash values
by thundergnat (Deacon) on Jun 15, 2005 at 18:40 UTC

    I have a feeling that the values in %refList are not what you expect. I notice that your example data has an underscore but none of the example $ref_filehandle values do. Could it be that you need to strip out the underscore?

    A better check in the if statement may be to see if the hash value exists and then whether it is equal to the array value to avoid autovivifying non matched values.

    I rewrote your snippet to run stand alone locally with some bogus data based on your examples. Does this do what you expect?

    use warnings;
    use strict;

    my @resultLine = (
        'gi|14670349|ref|NM_032999.1| Homo sapiens general transcription factor II, i',
        'one',
        'gi|14670348|ref|NM_032998.1| Homo sapiens general transcription factor II, ii',
        'two',
        'gi|14670347|ref|NM_032997.1| Homo sapiens general transcription factor II, iii',
        'three',
        'gi|14670346|ref|NM_032996.1| Homo sapiens general transcription factor II, iv',
        'four',
        'gi|14670345|ref|NM_032995.1| Homo sapiens general transcription factor II, v',
        'five',
        'gi|14670344|ref|NM_032994.1| Homo sapiens general transcription factor II, vi',
        'six',
        'gi|14670343|ref|NM_032993.1| Homo sapiens general transcription factor II, vii',
        'seven'
    );

    my %refList;
    my @subjects;
    my $alignment = {};

    while (<DATA>) {
        chomp;
        $refList{ $_ } = $_;
    }

    my $current_subject;
    for (@resultLine) {
        if (/^gi\|/) {
            $current_subject = $_;
            chomp $current_subject;
            push (@subjects, $current_subject);
        }
        $alignment->{$current_subject} .= $_;
    }

    for my $z (@subjects) {
        my @elements = split('\|', $z);
        if ( ! defined $elements[ 3 ] ) {
            print ("Parsing Error<BR>");
            print ("Line $z");
        }
        if (exists $refList{$elements[3]} and $refList{$elements[3]} eq $elements[3]) {
            print ("Already Present in file. - $elements[3]\n");
        }
        else {
            print "No match in reference file. - ";
            print ($alignment->{$z}."\n");
            print ($elements[3]."\n");
        }
    }

    __DATA__
    AB014570.2
    AB055861.1
    AB067522.1
    AB073617.1
    AC004166.12
    AC004851.2
    AC004867.5
    AC004883.3
    AC005077.5
    AC005080.2
    NM_032997.1
    NM_032993.1
    NM_032999.1

      Sure,

      That seems to be what I need -> if (exists $refList{$elements[3]} and $refList{$elements[3]} eq $elements[3])

      I like the way you got the current_line variable also, very nifty.

      I will let you know when I implement it. I had a beer to relax this afternoon, sat in the sun, and now I can't be bothered to look at it again.
      Thanks
      MonkPaul.

Re: If statement problem with hash values
by Elijah (Hermit) on Jun 15, 2005 at 17:42 UTC
    Couple of things:
    Array indexing starts at 0, and your reference object, while it is the third object in the list, is going to be at location 2, not 3, in the array. Second, you have a redundant check there and the "defined"s can be removed. The entire conditional statement can be simplified to this:

    if ($elements[2]) {
        print ("Parsing Error: Line $z\n");
        if ($refList{$elements[2]}) {
            print ("Already Present in file\n");
        }
        else {
            print "No match in reference file\n";
            print (NEWALIGN $alignment->{$z}."\n");
            print (NEWACC $elements[2]."\n");
        }
    }

Re: If statement problem with hash values
by tlm (Prior) on Jun 15, 2005 at 17:17 UTC

    This is the kind of problem for which I find the debugger invaluable (e.g. perl -d my_script.pl), as a quick tool to test the values of variables. Various hypotheses (such as "maybe there is trailing whitespace in the hash keys") as to the values of variables can be tested quickly with this tool. See perldebug.
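
    As a rough sketch of how the "trailing whitespace in the hash keys" hypothesis could be checked without the debugger, the core Data::Dumper module can print keys with control characters escaped. The sample lines and the normalise_key helper below are made up for illustration and are not part of the original script; the sketch only assumes the reference file might contain stray "\r" or spaces.

```perl
use strict;
use warnings;
use Data::Dumper;

# Show control characters (e.g. \r) as escapes and keep the key order stable.
$Data::Dumper::Useqq    = 1;
$Data::Dumper::Sortkeys = 1;

# Hypothetical helper, not part of the original script:
# strips any trailing whitespace, including "\r" from DOS line endings.
sub normalise_key {
    my ($line) = @_;
    $line =~ s/\s+\z//;
    return $line;
}

# Made-up sample lines standing in for the reference file.
my @lines = ("AB014570.2\r\n", "AB055861.1 \n", "AB067522.1\n");

my (%chomped, %trimmed);
for my $line (@lines) {
    my $key = $line;
    chomp $key;                            # removes only the trailing "\n"
    $chomped{$key} = 1;
    $trimmed{ normalise_key($line) } = 1;
}

print Dumper(\%chomped);    # a stray "\r" or space shows up inside the quotes
print Dumper(\%trimmed);

# The lookup misses when the stored key still carries the "\r".
print exists $chomped{'AB014570.2'} ? "found\n" : "not found\n";
print exists $trimmed{'AB014570.2'} ? "found\n" : "not found\n";
```

    If the dump of the real %refList shows clean keys, this hypothesis can be ruled out and the mismatch lies elsewhere.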

    Small point: the lines in @subjects are chomped twice. (I don't think this is the cause of the problem.)

    I think the problem is that you need $elements[2], not $elements[3].

    Update: Upon closer inspection, I see an extra | that I missed before, so my last comment above is wrong.
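
    To make the field positions concrete, here is a small sketch that splits one header line on "|" and prints each field with its index. The header is the example from the question with a leading ">" added, on the assumption that the real BLAST subject lines start that way (the original code matches /^>/); accession_from_header is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: return the fourth pipe-delimited field of a header line.
sub accession_from_header {
    my ($header) = @_;
    my @elements = split /\|/, $header;
    return $elements[3];
}

# Example header from the question; the leading ">" is an assumption.
my $header = '>gi|14670349|ref|NM_032999.1| Homo sapiens general transcription factor II, i (GTF2I)';

# Print every field with its index so the position of the accession is visible.
my @fields = split /\|/, $header;
for my $i (0 .. $#fields) {
    print "$i: [$fields[$i]]\n";
}

print "accession: ", accession_from_header($header), "\n";
```

    With this input the accession NM_032999.1 appears at index 3, with "ref" at index 2.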

    the lowliest monk