Re^7: compare two files on the basis of Two IDs

My version of the code works fine. It prints output for each line in file2 (like your original code). If you want output for each line in file1, then the code has to change since file1 and file2 have different formats.

#!/usr/bin/perl
use warnings; 
use strict;
use Data::Dumper;

my $file1 = <<END;
chr7    151046672
chr7    151047369
chr3    127680920
chr3    127680920
END

my $file2 = <<END;
chr1    66953622    66953654
chr1    67200451    67200472
chr1    67200475    67200478
chr1    67058869    67058880
chr1    67058881    67058885
chr7    151046672    127680920
chr7    151047369    127680920
chr3    127680920    151046672
chr3    127680920    151047369
END

open my $infile1, '<', \$file1 or die "unable to open first file $!";
open my $infile2, '<', \$file2 or die "unable to open 2nd file $!";

### create memory structure of file 1:
### so that we only have to read file1 once!
#

my %file1_hash;

while (my $line = <$infile1>)
{
   next if $line =~ /^\s*$/;   #skip blank lines (a common infile goof)
   
   my ($key, $value) = split /\s+/, $line; # use better "names"; I have
                                           # no idea what a chr column means
   $file1_hash{"$key:$value"} = 1;
}
close $infile1;  # file handle closure is optional, but I'd do it.

###  process each line in file2:
###  If a line "matches" with any line in file1, then "E", else "M"
###  I don't know what these numbers mean; come up with a better comment.

while (my $line = <$infile2>)
{
   chomp $line;  #so that output with E or M can be on same line
   next if $line =~ /^\s*$/;   #skip blank lines (a common infile goof)
   
   my ($chr, $val1, $val2) = split /\s+/,$line;
   
   if ( exists $file1_hash{"$chr:$val1"} or
        exists $file1_hash{"$chr:$val2"} )
   {
      print "$line\tE\n";  # match exists with file 1
   }
   else
   {
      print "$line\tM\n";  # match does NOT exist with file 1
   }
}


__END__
Prints the following:
chr1    66953622    66953654    M
chr1    67200451    67200472    M
chr1    67200475    67200478    M
chr1    67058869    67058880    M
chr1    67058881    67058885    M
chr7    151046672    127680920    E
chr7    151047369    127680920    E
chr3    127680920    151046672    E
chr3    127680920    151047369    E
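
A hedged sketch of the other direction (one output line per file1 line), assuming a match means "same chr and the file1 position equals either numeric column of file2". It reuses the composite "chr:position" key idea from the code above, built from file2 this time. The sub name mark_file1_lines is hypothetical, and the "chr1 12345" line is made up so that an M shows up.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Sketch: one output line per file1 line.
# Both numeric columns of file2 go into a set of "chr:position" keys,
# so each file1 line needs only one exists() lookup.
sub mark_file1_lines {
    my ($text1, $text2) = @_;

    my %seen;
    open my $in2, '<', \$text2 or die "unable to open file2 text: $!";
    while (my $line = <$in2>) {
        next if $line =~ /^\s*$/;                # skip blank lines
        my ($chr, $v1, $v2) = split ' ', $line;  # ' ' also drops leading whitespace
        $seen{"$chr:$v1"} = 1;
        $seen{"$chr:$v2"} = 1;
    }
    close $in2;

    my @out;
    open my $in1, '<', \$text1 or die "unable to open file1 text: $!";
    while (my $line = <$in1>) {
        chomp $line;                             # keep E/M on the same line
        next if $line =~ /^\s*$/;
        my ($chr, $pos) = split ' ', $line;
        push @out, "$line\t" . (exists $seen{"$chr:$pos"} ? 'E' : 'M');
    }
    close $in1;
    return @out;
}

# A few rows of the sample data; "chr1 12345" is made up so that an M appears.
my $file1 = <<END;
chr7    151046672
chr3    127680920
chr1    12345
END

my $file2 = <<END;
chr1    66953622    66953654
chr7    151046672    127680920
chr3    127680920    151046672
END

print "$_\n" for mark_file1_lines($file1, $file2);
```

A single set of composite keys gives the same answer as a hash of arrays here; the hash of arrays is more useful if you later need the list of chr names per number.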


Re^8: compare two files on the basis of Two IDs by genome (Novice) on Sep 30, 2016 at 02:21 UTC
Thanks for your mail. Yes, it's working fine. I am trying to print the output for each line of FILE 1, but I am not sure how to change the code. I tried to change it but did not succeed. Could you help me with how to make a hash of arrays from file 2?
Re^9: compare two files on the basis of Two IDs by Marshall (Canon) on Sep 30, 2016 at 02:45 UTC
Again, you have a poor set of test data: when printing per file1 line, all of them are E's now. File 2 can also be represented as a hash structure. The hash keys are the numeric values and each hash value is an array of "chr" strings. This allows more than one chrX value to be associated with a single numeric value. Not sure if that is needed or not, but this code allows that possibility.

#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;

my $file1 = <<END;
chr7    151046672
chr7    151047369
chr3    127680920
chr3    127680920
END

my $file2 = <<END;
chr1    66953622    66953654
chr1    67200451    67200472
chr1    67200475    67200478
chr1    67058869    67058880
chr1    67058881    67058885
chr7    151046672    127680920
chr7    151047369    127680920
chr3    127680920    151046672
chr3    127680920    151047369
END

open my $infile1, '<', \$file1 or die "unable to open first file $!";
open my $infile2, '<', \$file2 or die "unable to open 2nd file $!";

### create memory structure of file 2:
### so that we only have to read file2 once!
#

my %file2_hash;

while (my $line = <$infile2>)
{
   next if $line =~ /^\s*$/;   #skip blank lines (a common infile goof)

   my ($chr, $value1, $value2) = split /\s+/, $line; # use better "names"; I have
                                                     # no idea what a chr column means
   push @{$file2_hash{$value1}}, $chr;
   push @{$file2_hash{$value2}}, $chr;
}
close $infile2;  # file handle closure is optional, but I'd do it.

###  process each line in file1:
###  If a line "matches" with any line in file2, then "E", else "M"
###  I don't know what these numbers mean; come up with a better comment.

while (my $line = <$infile1>)
{
   chomp $line;  #so that output with E or M can be on same line
   next if $line =~ /^\s*$/;   #skip blank lines (a common infile goof)

   my ($chr, $val1) = split /\s+/, $line;

   # compare each stored chr ($_) with this line's chr;
   # "|| []" avoids creating an empty entry for numbers not in file2
   if ( grep { $_ eq $chr } @{ $file2_hash{$val1} || [] } )
   {
      print "$line\tE\n";  # match exists with file 2
   }
   else
   {
      print "$line\tM\n";  # match does NOT exist with file 2
   }
}

__END__
Prints the following:
chr7    151046672    E
chr7    151047369    E
chr3    127680920    E
chr3    127680920    E
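
Why the block inside grep matters: in `grep BLOCK LIST` the block runs once per element with `$_` set to that element, and the element is kept when the block returns true (in scalar context grep returns how many were kept). A block containing only `$chr` returns the same true string every time, so it keeps every element and only tests "is the list non-empty". The sketch below uses a small made-up hash and hypothetical sub names to contrast that with `{ $_ eq $chr }`.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Made-up hash of arrays: numeric value => list of chr names seen with it.
my %file2_hash = (
    151046672 => [ 'chr7', 'chr3' ],
    66953622  => [ 'chr1' ],
);

# Pitfall: { $chr } ignores $_ and is always true for a non-empty name,
# so this counts every element stored under $val.
sub loose_match {
    my ($chr, $val) = @_;
    return scalar grep { $chr } @{ $file2_hash{$val} || [] };
}

# Intended test: compare each stored element ($_) with $chr.
# "|| []" keeps a missing key from being autovivified into the hash.
sub strict_match {
    my ($chr, $val) = @_;
    return scalar grep { $_ eq $chr } @{ $file2_hash{$val} || [] };
}

printf "loose  chr9 at 66953622: %d\n", loose_match('chr9', 66953622);
printf "strict chr9 at 66953622: %d\n", strict_match('chr9', 66953622);
```

The loose version reports a hit for chr9 only because chr1 happens to be stored under that number; the strict version reports none.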
Re^10: compare two files on the basis of Two IDs by genome (Novice) on Sep 30, 2016 at 14:29 UTC
Hi, thanks for your help and support. I got it working now. You are totally awesome.
Re^10: compare two files on the basis of Two IDs by genome (Novice) on Oct 07, 2016 at 19:14 UTC
Hi again. `if ( grep{$chr}@{$file2_hash{$val1}} )` is good for directly grabbing the entries. But suppose we want to compare the values (i.e. ==, >= or <=). How would the code change then? For example: `if ( grep{$chr}@{$file2_hash{$val1}} >= '$val1')` or `if ({$chr}@{$file2_hash{$val1},$chr} >= "$val1:$chr")`. What do you suggest? What would the correct code be? I am poor with hashes.
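
A hedged sketch for range-style comparisons. A hash lookup such as `$file2_hash{$val1}` can only answer "is this exact number a key?", which covers == but not >= or <=. One option is to keep file2 rows in an array and put the comparisons inside the grep block, using `eq` for the chr name and numeric `<=` for positions. Note that `'$val1'` in single quotes is the literal text `$val1`, not its value, so the variable should be used unquoted. The interpretation "position lies between the row's two numbers, inclusive" and the sub name in_range are assumptions for illustration only.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical reading of the question: a file1 position "matches" a file2
# row when the chr names are equal (eq) and the position lies between the
# row's two numbers (numeric <=), bounds included.
my @file2_rows;    # each entry: [ chr, low, high ]
for my $line ('chr1    66953622    66953654',
              'chr7    151046672    127680920') {   # rows from the sample file2
    my ($chr, $x, $y) = split ' ', $line;
    my ($low, $high) = $x <= $y ? ($x, $y) : ($y, $x);  # columns can be in either order
    push @file2_rows, [ $chr, $low, $high ];
}

# Returns the matching rows; in scalar context, the number of matching rows.
sub in_range {
    my ($chr, $pos) = @_;
    return grep { $_->[0] eq $chr && $_->[1] <= $pos && $pos <= $_->[2] } @file2_rows;
}

for my $query ([ 'chr1', 66953630 ], [ 'chr1', 1 ], [ 'chr7', 151046672 ]) {
    my ($chr, $pos) = @$query;
    printf "%s\t%d\t%s\n", $chr, $pos, in_range($chr, $pos) ? 'E' : 'M';
}
```

This scans every row for each query, which is fine for small files; very large files would need a sorted or indexed structure instead.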
Re^11: compare two files on the basis of Two IDs by Marshall (Canon) on Oct 09, 2016 at 05:24 UTC