in reply to Re^6: compare two files on the basis of Two IDs
in thread compare two files on the basis of Two IDs
#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;

my $file1 = <<END;
chr7 151046672
chr7 151047369
chr3 127680920
chr3 127680920
END

my $file2 = <<END;
chr1 66953622 66953654
chr1 67200451 67200472
chr1 67200475 67200478
chr1 67058869 67058880
chr1 67058881 67058885
chr7 151046672 127680920
chr7 151047369 127680920
chr3 127680920 151046672
chr3 127680920 151047369
END

open my $infile1, '<', \$file1 or die "unable to open first file $!";
open my $infile2, '<', \$file2 or die "unable to open 2nd file $!";

### create memory structure of file 1:
### so that we only have to read file1 once!
#
my %file1_hash;
while (my $line = <$infile1>)
{
    next if $line =~ /^\s*$/;    # skip blank lines (a common infile goof)
    my ($key, $value) = split /\s+/, $line;    # use better "names"; I have
                                               # no idea what a chr col means
    $file1_hash{"$key:$value"} = 1;
}
close $infile1;    # file handle closure is optional, but I'd do it.

### process each line in file2:
### If a line "matches" with any line in file1, then "E", else "M"
### I don't know what these numbers mean; come up with a better comment.
while (my $line = <$infile2>)
{
    chomp $line;                 # so that output with E or M can be on same line
    next if $line =~ /^\s*$/;    # skip blank lines (a common infile goof)
    my ($chr, $val1, $val2) = split /\s+/, $line;
    if ( exists $file1_hash{"$chr:$val1"} or
         exists $file1_hash{"$chr:$val2"} )
    {
        print "$line\tE\n";      # match exists with file 1
    }
    else
    {
        print "$line\tM\n";      # match does NOT exist with file 1
    }
}
__END__
Prints the following:
chr1 66953622 66953654     M
chr1 67200451 67200472     M
chr1 67200475 67200478     M
chr1 67058869 67058880     M
chr1 67058881 67058885     M
chr7 151046672 127680920   E
chr7 151047369 127680920   E
chr3 127680920 151046672   E
chr3 127680920 151047369   E
Replies are listed 'Best First'.
Re^8: compare two files on the basis of Two IDs
by genome (Novice) on Sep 30, 2016 at 02:21 UTC
by Marshall (Canon) on Sep 30, 2016 at 02:45 UTC
by genome (Novice) on Sep 30, 2016 at 14:29 UTC
by genome (Novice) on Oct 07, 2016 at 19:14 UTC
by Marshall (Canon) on Oct 09, 2016 at 05:24 UTC