
Marshall wrote:
"I couldn't see any way to get an "E" with your test data, so I added some extra data to my test cases. In the future, it is best if you can provide an example "desired output" that demonstrates the basic decisions which need to be made."
Show an example output and explain clearly how you arrived at that result.

Re^6: compare two files on the basis of Two IDs
by genome (Novice) on Sep 29, 2016 at 20:08 UTC
    OK. Please consider the input files again, with candidates for both E and M.
    File 1
    chr7 151046672
    chr7 151047369
    chr3 127680920
    chr3 127680920
    File 2
    chr1 66953622 66953654
    chr1 67200451 67200472
    chr1 67200475 67200478
    chr1 67058869 67058880
    chr1 67058881 67058885
    chr7 151046672 127680920
    chr7 151047369 127680920
    chr3 127680920 151046672
    chr3 127680920 151047369
    Code for now.
    #!/usr/bin/perl
    use warnings;
    use strict;
    use Data::Dumper;

    my $file1 = $ARGV[0];
    open($infile1,$file1);
    my $file2 = $ARGV[1];
    open($infile2,$file2);
    my %file2_hash;

    while (my $line = <$infile1>) {
        chomp $line; #so that output with E or M can be on same line
        next if $line =~ /^\s*$/; #skip blank lines (a common infile goof)
        my ($chr, $val1, $val2) = split /\s+/,$line;
    }
    close $infile1;

    while (my $line = <$infile2>) {
        chomp $line;
        next if $line =~ /^\s*$/; #skip blank lines (a common infile goof)
        my ($key, $value1, $value2) = split /\s+/, $line; # use better "names" I have
                                                          # no idea of what a chr col
        $file2_hash{"$key:$value1:$value2"} = 1;
        # file handle closure is optional, but I'd do it.
        ### process each line in file2:
        ### If a line "matches" with any line in file1, then "E", else "M"
        ### I don't know that these numbers mean, come up with better comment
        close $infile2;
        if (exists $file2_hash{"$chr:$val1:$val2"}) {
            print "$line\tE\n"; # match exists with file 1
        }
        else {
            print "$line\tM\n"; # match does NOT exist with file 1
        }
    }
    It's not working, since I want to print the output with respect to my file 1. Could you help with that? I know there is some error in the 'if' statement, but I could not understand it.
      My version of the code works fine. It prints output for each line in file2 (like your original code). If you want output for each line in file1, then the code has to change since file1 and file2 have different formats.
      #!/usr/bin/perl
      use warnings;
      use strict;
      use Data::Dumper;

      my $file1 = <<END;
      chr7 151046672
      chr7 151047369
      chr3 127680920
      chr3 127680920
      END

      my $file2 = <<END;
      chr1 66953622 66953654
      chr1 67200451 67200472
      chr1 67200475 67200478
      chr1 67058869 67058880
      chr1 67058881 67058885
      chr7 151046672 127680920
      chr7 151047369 127680920
      chr3 127680920 151046672
      chr3 127680920 151047369
      END

      open my $infile1, '<', \$file1 or die "unable to open first file $!";
      open my $infile2, '<', \$file2 or die "unable to open 2nd file $!";

      ### create memory structure of file 1:
      ### so that we only have to read file1 once!
      #
      my %file1_hash;
      while (my $line = <$infile1>) {
          next if $line =~ /^\s*$/; #skip blank lines (a common infile goof)
          my ($key, $value) = split /\s+/, $line; # use better "names" I have
                                                  # no idea of what a chr col means
          $file1_hash{"$key:$value"} = 1;
      }
      close $infile1; # file handle closure is optional, but I'd do it.

      ### process each line in file2:
      ### If a line "matches" with any line in file1, then "E", else "M"
      ### I don't know that these numbers mean, come up with better comment.
      while (my $line = <$infile2>) {
          chomp $line; #so that output with E or M can be on same line
          next if $line =~ /^\s*$/; #skip blank lines (a common infile goof)
          my ($chr, $val1, $val2) = split /\s+/,$line;
          if ( exists $file1_hash{"$chr:$val1"} or exists $file1_hash{"$chr:$val2"} ) {
              print "$line\tE\n"; # match exists with file 1
          }
          else {
              print "$line\tM\n"; # match does NOT exist with file 1
          }
      }
      __END__
      Prints the following:
      chr1 66953622 66953654 M
      chr1 67200451 67200472 M
      chr1 67200475 67200478 M
      chr1 67058869 67058880 M
      chr1 67058881 67058885 M
      chr7 151046672 127680920 E
      chr7 151047369 127680920 E
      chr3 127680920 151046672 E
      chr3 127680920 151047369 E
        Thanks for your reply. Yes, it's working fine. I am trying to print the output for each line of file 1 instead, but I am not sure how to change the code; I have tried without success. Could you help me with how to build a hash of arrays from file 2?
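One possible way to approach the hash-of-arrays question is sketched below. It is only a sketch under assumptions the thread does not confirm: that a file 1 line should get "E" when its position appears as either the second or the third column of some file 2 line with the same first column, and "M" otherwise. The sub names index_file2 and label_file1 are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a hash of arrays from file 2: each "chr:position" key maps to
# the list of file 2 lines in which that position appears.
sub index_file2 {
    my ($fh) = @_;
    my %index;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;    # skip blank lines
        my ($chr, $val1, $val2) = split ' ', $line;
        push @{ $index{"$chr:$val1"} }, $line;
        # avoid storing the same line twice when both columns are equal
        push @{ $index{"$chr:$val2"} }, $line if $val2 ne $val1;
    }
    return \%index;
}

# Label each file 1 line: "E" if its chr:position key is in the index,
# "M" if it is not (assumed meaning of E and M, see above).
sub label_file1 {
    my ($fh, $index) = @_;
    my @labelled;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;    # skip blank lines
        my ($chr, $pos) = split ' ', $line;
        my $flag = exists $index->{"$chr:$pos"} ? 'E' : 'M';
        push @labelled, "$line\t$flag";
    }
    return @labelled;
}

# Sample data copied from the thread.
my $file1 = <<'END';
chr7 151046672
chr7 151047369
chr3 127680920
chr3 127680920
END

my $file2 = <<'END';
chr1 66953622 66953654
chr1 67200451 67200472
chr1 67200475 67200478
chr1 67058869 67058880
chr1 67058881 67058885
chr7 151046672 127680920
chr7 151047369 127680920
chr3 127680920 151046672
chr3 127680920 151047369
END

open my $in2, '<', \$file2 or die "unable to open 2nd file: $!";
my $index = index_file2($in2);
close $in2;

open my $in1, '<', \$file1 or die "unable to open first file: $!";
print "$_\n" for label_file1($in1, $index);
close $in1;
```

With the sample data from the thread, all four file 1 lines come out as "E", because every file 1 position occurs in some file 2 line for the same chromosome. Storing the matching lines in an array (rather than just a 1) means they can also be printed next to each file 1 line if that turns out to be the desired output, for example via @{ $index->{"chr3:127680920"} }.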