Re^2: Reading two files, cmp certain cols

Thank you very much Cirstoforo for the lessons.

I will use the tag for long codes in future.

The reason I need all the values in the first hash (%file1) is that I want to do a series of calculations, such as:

1- For each key-value pair in %file1 (each key has multiple values), check whether it exists in file 2 under two conditions: $current_line[2] == 1 and $current_line[3] >= 3. This is what the second while loop does. These are the true positives.

2- Next, I want to count the false positives, which is a little trickier. Say file 1 and file 2 have 512 $current_line[0] values in common: how many of them has %file1 mistakenly identified as positive? That is, they fall in regions where either $current_line[2] != 1 or $current_line[3] is <= 3.

3- Then the false negatives: how many $current_line[0] and $current_line[1] pairs were not identified by %file1 even though file 2 has $current_line[2] == 1 and $current_line[3] >= 3 for them?

4- Finally, the true negatives: how many entries with $current_line[2] == 0 and $current_line[3] <= 3 were correctly left unidentified by %file1?

I came up with the following code, which needs to be corrected.

# This part was revised by you, but this is my old version
my %file1 = ();
while (<INPUT1>) {
    chomp;
    my ($id, $number) = split("\t", $_);

    if ($id =~ m/^(CLS_S3_Contig[0-9]+)([-]?)([0-9]+)([_]?)([0-9]+)$/i) {
        my $matched_id = $id; # breaks the CLS_Contig1000_200-202 into its components
        for (my $i = $3 - 8; $i < $5 + 8; $i++) {
            print join("\t", $1, $i), "\n";
            push(@{$file1{$1}}, $i);
        }
    }
}

close(INPUT1);

#################################
# THIS IS YOUR MODIFIED VERSION

my %file2 = (); my @true_positives = ();
while (<INPUT2>) {
    chomp;
    my @current_line = split /\t/;
    # %file1 is keyed on the contig id, which is column 0 of file 2
    if (exists $file1{$current_line[0]}) {
        my $key = $current_line[0];
        foreach my $position1 (@{$file1{$key}}) {
            if (   $current_line[0] eq $key
                && $current_line[1] == $position1
                && $current_line[2] == 1
                && $current_line[3] >= 3)
            {
                print join("\t", @current_line[0..3], "***", $key, $position1), "\n";

                # I made this up to count the number of true positives, but it does not consider duplicates
                push(@true_positives, $current_line[1]);
                push(@{$file2{$current_line[0]}}, $current_line[2]);
            } # end inner if
        } # end foreach
    } # end if
} # end while
        

#############################################
# IDENTIFY COMMON ELEMENTS
#############################################
my @common = (); my $common_element = "";
foreach (keys %file1) {
    push(@common, $_) if exists $file2{$_};
}

#############################################
# IDENTIFY NOT COMMON ELEMENTS
#############################################
my @not_common = ();
foreach (keys %file1) {
    push(@not_common, $_) unless exists $file2{$_};
}
############ making calculations ##########################

my $found_true_markers = ""; my $found_false_positives = "";

# Assumed definition: the number of keys shared by %file1 and %file2
my $common_element_numbers = scalar @common;

$found_true_markers = scalar @true_positives;
$found_false_positives = $common_element_numbers - $found_true_markers;
my $truepositive = sprintf("%.2f", $found_true_markers / $common_element_numbers * 100);
my $false_positive_rate = sprintf("%.2f", $found_false_positives / $common_element_numbers * 100);
print "$truepositive \% is the rate of true positives\n";
print "$false_positive_rate \% is the rate of false positives\n";

####################################################
# I AM STUCK AT THIS POINT - SOMETIMES IT CALCULATES
# NEGATIVE RATES OR MORE THAN 100%
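
A likely cause of rates below 0% or above 100% is that @true_positives is counted per matching position (duplicates included) while the denominator is counted per contig id, so the two numbers are not on the same scale. One possible way around this is to classify every distinct contig/position pair from file 2 into exactly one of the four categories from points 1-4 and divide only by sums of those counts. The following is a minimal sketch under stated assumptions, not a drop-in fix: the sample data, %predicted, @file2_rows and rate() are hypothetical names made up for illustration; file 2 is assumed to hold contig, position, flag and depth in columns 0-3; a row counts as a real marker only when flag == 1 and depth >= 3; and positions in %file1 that never show up in file 2 are simply ignored.

```perl
use strict;
use warnings;

# Hypothetical stand-in for %file1: contig id => list of predicted positions.
my %file1 = (
    Contig1 => [ 10, 11, 12 ],
    Contig2 => [ 5 ],
);

# Hypothetical stand-in for the rows of file 2: contig, position, flag, depth.
my @file2_rows = (
    [ 'Contig1', 10, 1, 5 ],   # predicted and a real marker     -> TP
    [ 'Contig1', 10, 1, 5 ],   # duplicate row, counted only once
    [ 'Contig1', 11, 0, 2 ],   # predicted, not a real marker    -> FP
    [ 'Contig1', 20, 1, 4 ],   # not predicted, real marker      -> FN
    [ 'Contig2', 7,  0, 1 ],   # not predicted, not a marker     -> TN
    [ 'Contig2', 5,  1, 2 ],   # predicted, but depth is too low -> FP
);

# Flatten %file1 into a set keyed on "contig\tposition" for direct lookups.
my %predicted;
for my $contig (keys %file1) {
    $predicted{"$contig\t$_"} = 1 for @{ $file1{$contig} };
}

my %count = ( TP => 0, FP => 0, FN => 0, TN => 0 );
my %seen;
for my $row (@file2_rows) {
    my ($contig, $pos, $flag, $depth) = @$row;
    my $key = "$contig\t$pos";
    next if $seen{$key}++;    # each contig/position pair is classified once

    my $is_marker    = ($flag == 1 && $depth >= 3);   # assumed definition
    my $is_predicted = exists $predicted{$key};

    my $class = $is_predicted ? ($is_marker ? 'TP' : 'FP')
                              : ($is_marker ? 'FN' : 'TN');
    $count{$class}++;
}

# Percentage with a guard against an empty denominator.
sub rate {
    my ($num, $den) = @_;
    return $den ? sprintf('%.2f', 100 * $num / $den) : 'n/a';
}

my $tp_rate = rate($count{TP}, $count{TP} + $count{FN});
my $fp_rate = rate($count{FP}, $count{FP} + $count{TN});
print "$tp_rate % is the rate of true positives\n";
print "$fp_rate % is the rate of false positives\n";
```

With the sample rows above the counts come out as TP=1, FP=2, FN=1, TN=1, so the script reports 50.00 % and 66.67 %. Because every row lands in exactly one category and each numerator is part of its own denominator, neither rate can go negative or exceed 100%.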

Re^3: Reading two files, cmp certain cols by FunkyMonk (Bishop) on Sep 19, 2008 at 20:46 UTC
"I will use the tag for long codes in future."

You do know that you can edit your own nodes, don't you? Just visit the node and edit the contents of the textbox.
Re^4: Reading two files, cmp certain cols by sesemin (Beadle) on Sep 20, 2008 at 05:43 UTC
Thanks for the tip. I thought I had used it. It seems it did not go through.