in reply to Re: Reading two files, cmp certain cols
in thread Reading two files, cmp certain cols
I will use the tag for long code in the future.
The reason I needed all the values in the first hash (%file1) is that I want to do a series of calculations, such as:
1- For each key-value pair in %file1 (each key has multiple values), check whether that pair exists in file2 under two conditions: current_line[2] == 1 and current_line[3] >= 3. This is what the second while loop does. These are the true positives.
2- Next I want to calculate the false positives, which is a little trickier. Say file1 and file2 have 512 values of current_line[0] in common. How many of those were mistakenly identified as positive by %file1? That is, they fall in rows where either current_line[2] != 1 or current_line[3] <= 3.
3- Then the false negatives: how many current_line[0] and current_line[1] pairs were not identified by %file1 even though in file2 they have current_line[2] == 1 and current_line[3] >= 3?
4- Finally, the true negatives: how many rows with current_line[2] == 0 and current_line[3] <= 3 were correctly left unidentified by %file1? I came up with the following code, which needs to be corrected.
    # This part was revised by you but this is the old version of mine
    my %file1 = ();
    while (<INPUT1>) {
        chomp;
        (my $id, my $number) = split("\t", $_);
        if ($id =~ m/^(CLS_S3_Contig[0-9]+)([-]?)([0-9]+)([_]?)([0-9]+)$/i) {
            my $matched_id = $id;
            # breaks the CLS_Contig1000_200-202 to its components
            for (my $i = $3 - 8; $i < $5 + 8; $i++) {
                print join("\t", $1, $i), "\n";
                push(@{ $file1{$1} }, $i);
            }
        }
    }
    close(INPUT1);

    ################################# THIS IS YOUR MODIFIED VERSION
    my %file2 = ();
    my @true_positives = ();
    while (<INPUT2>) {
        chomp;
        my @current_line = split /\t/;
        if (exists $file1{$current_line[1]}) {
            my $key = $current_line[1];
            foreach my $position1 (@{ $file1{$key} }) {
                if (   $current_line[0] eq $key
                    && $current_line[1] == $position1
                    && $current_line[2] == 1
                    && $current_line[3] >= 3)
                {
                    print join("\t", @current_line[0..3], "***", $key, $position1), "\n";
                    # I made this up to count the number of true positives
                    # but it does not consider duplicates
                    push(@true_positives, $current_line[1]);
                    push(@{ $file2{$current_line[0]} }, $current_line[2]);
                } #end inner if
            } #end foreach
        } #end if
    } #end while

    #############################################
    # IDENTIFY COMMON ELEMENTS
    #############################################
    my @common = ();
    my $common_element = "";
    foreach (keys %file1) {
        push(@common, $_) if exists $file2{$_};
    }

    #############################################
    # IDENTIFY NOT COMMON ELEMENTS
    #############################################
    my @not_common = ();
    foreach (keys %file1) {
        push(@not_common, $_) unless exists $file2{$_};
    }

    ############ making calculations ##########################
    # ASSUMPTION: $comnon_element_numbers is never assigned in the code as
    # posted; the size of @common is the most likely intent.
    my $comnon_element_numbers = scalar @common;
    my $found_true_markers = "";
    my $found_false_positives = "";
    $found_true_markers = scalar @true_positives;
    $found_false_positives = $comnon_element_numbers - $found_true_markers;
    my $truepositive = sprintf("%.2f", $found_true_markers / $comnon_element_numbers * 100);
    my $false_positive_rate = sprintf("%.2f", $found_false_positives / $comnon_element_numbers * 100);
    print "$truepositive \% is the rate of true positives\n";
    print "$false_positive_rate \% is the rate of false positives\n";

    ####################################################
    # I AM STUCK AT THIS POINT - SOMETIMES IT CALCULATES NEGATIVE RATES OR MORE THAN 100%
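To make the four categories concrete, here is a rough sketch of the counting I think I am after. It is not tested against my real files. The names confusion_counts and percent are made up for this post, and it assumes the file1 predictions have already been flattened into a hash keyed by "contig<TAB>position". It also assumes that a "real" row is one with column 2 == 1 and column 3 >= 3 and that every other row is a negative, which is a simplification of points 2 and 4 above. Each pair is counted once, and the four counts add up to the number of distinct pairs in file2, so no rate can go below 0% or above 100%.

```perl
use strict;
use warnings;

# Hypothetical helper: classify each distinct (contig, position) pair in file2.
#   $predicted : hashref with keys "contig\tposition" (derived from %file1)
#   $rows      : arrayref of [contig, position, flag, depth] rows from file2
sub confusion_counts {
    my ($predicted, $rows) = @_;
    my %count = (TP => 0, FP => 0, FN => 0, TN => 0);
    my %seen;    # makes sure a repeated pair is only counted once

    for my $row (@$rows) {
        my ($contig, $pos, $flag, $depth) = @$row;
        my $pair = join "\t", $contig, $pos;
        next if $seen{$pair}++;

        # ASSUMPTION: anything that is not "flag == 1 and depth >= 3" is a negative
        my $is_real      = ($flag == 1 && $depth >= 3);
        my $is_predicted = exists $predicted->{$pair};

        if    ($is_predicted  && $is_real)  { $count{TP}++ }
        elsif ($is_predicted  && !$is_real) { $count{FP}++ }
        elsif (!$is_predicted && $is_real)  { $count{FN}++ }
        else                                { $count{TN}++ }
    }
    return \%count;
}

# Percentage with a guard against an empty denominator.
sub percent {
    my ($part, $whole) = @_;
    return $whole ? sprintf("%.2f", 100 * $part / $whole) : 'n/a';
}

# Made-up data, only to show the call; these are not values from my files.
my %predicted = map { $_ => 1 } ("C1\t10", "C1\t11");
my @rows = (
    [ 'C1', 10, 1, 4 ],    # predicted and real
    [ 'C1', 10, 1, 4 ],    # duplicate row, skipped
    [ 'C1', 11, 0, 2 ],    # predicted but not real
    [ 'C1', 12, 1, 3 ],    # real but not predicted
    [ 'C2',  7, 0, 1 ],    # neither
);

my $c = confusion_counts(\%predicted, \@rows);
print "true positive rate:  ", percent($c->{TP}, $c->{TP} + $c->{FN}), " %\n";
print "false positive rate: ", percent($c->{FP}, $c->{FP} + $c->{TN}), " %\n";
```

Pairs that are in the file1 set but never appear in file2 are not counted anywhere in this sketch; whether they should count as false positives depends on how the rates are defined.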
Re^3: Reading two files, cmp certain cols
by FunkyMonk (Bishop) on Sep 19, 2008 at 20:46 UTC
by sesemin (Beadle) on Sep 20, 2008 at 05:43 UTC