File 2 can also be represented as a hash structure. The hash keys are the numeric values, and each hash value is an array of "chr" strings. This allows more than one chrX value to be associated with a single numeric value. Not sure if that is needed or not, but this code allows that possibility.
```perl
#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;  # handy for inspecting the structure: print Dumper \%file2_hash;

my $file1 = <<END;
chr7 151046672
chr7 151047369
chr3 127680920
chr3 127680920
END

my $file2 = <<END;
chr1 66953622 66953654
chr1 67200451 67200472
chr1 67200475 67200478
chr1 67058869 67058880
chr1 67058881 67058885
chr7 151046672 127680920
chr7 151047369 127680920
chr3 127680920 151046672
chr3 127680920 151047369
END

open my $infile1, '<', \$file1 or die "unable to open first file $!";
open my $infile2, '<', \$file2 or die "unable to open 2nd file $!";

### create memory structure of file 2
### so that we only have to read file2 once!
my %file2_hash;
while (my $line = <$infile2>) {
    next if $line =~ /^\s*$/;    # skip blank lines (a common infile goof)
    my ($chr, $value1, $value2) = split /\s+/, $line;  # use better "names"; I have
                                                       # no idea what a chr col means
    push @{$file2_hash{$value1}}, $chr;
    push @{$file2_hash{$value2}}, $chr;
}
close $infile2;    # file handle closure is optional, but I'd do it.

### process each line in file1:
### If a line "matches" with any line in file2, then "E", else "M"
### I don't know what these numbers mean; come up with a better comment.
while (my $line = <$infile1>) {
    chomp $line;    # so that output with E or M can be on the same line
    next if $line =~ /^\s*$/;    # skip blank lines (a common infile goof)
    my ($chr, $val1) = split /\s+/, $line;

    # Compare each stored chr name with $_ eq $chr; a bare {$chr} block is
    # true for every element. "|| []" avoids creating empty entries for
    # numbers that never appeared in file 2.
    if ( grep { $_ eq $chr } @{ $file2_hash{$val1} || [] } ) {
        print "$line\tE\n";    # match exists with file 2
    }
    else {
        print "$line\tM\n";    # match does NOT exist with file 2
    }
}
close $infile1;

__END__
Prints the following:
chr7 151046672	E
chr7 151047369	E
chr3 127680920	E
chr3 127680920	E
```
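If only the yes/no "E"/"M" answer is needed, one variation on the same idea is to key the hash on the chr name and the number together, so each lookup in file 1 is a single `exists` test instead of a `grep` over an array. This is a minimal sketch under that assumption, not part of the reply above; `%seen` and `@results` are illustrative names. The trade-off is that it no longer records every chr value attached to a number, which the hash-of-arrays version keeps.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $file1 = <<END;
chr7 151046672
chr7 151047369
chr3 127680920
chr3 127680920
END

my $file2 = <<END;
chr1 66953622 66953654
chr1 67200451 67200472
chr1 67200475 67200478
chr1 67058869 67058880
chr1 67058881 67058885
chr7 151046672 127680920
chr7 151047369 127680920
chr3 127680920 151046672
chr3 127680920 151047369
END

# Illustrative name: a set of "chr<TAB>number" pairs seen in file 2.
my %seen;
open my $in2, '<', \$file2 or die "unable to open 2nd file: $!";
while (my $line = <$in2>) {
    next if $line =~ /^\s*$/;    # skip blank lines
    my ($chr, $value1, $value2) = split ' ', $line;
    # Both numeric columns count as a match for this chr name.
    $seen{"$chr\t$value1"} = 1;
    $seen{"$chr\t$value2"} = 1;
}
close $in2;

# Flag each file-1 line: E if its (chr, number) pair occurs in file 2, else M.
my @results;
open my $in1, '<', \$file1 or die "unable to open first file: $!";
while (my $line = <$in1>) {
    chomp $line;
    next if $line =~ /^\s*$/;
    my ($chr, $val1) = split ' ', $line;
    my $flag = exists $seen{"$chr\t$val1"} ? 'E' : 'M';
    push @results, "$line\t$flag";
}
close $in1;

print "$_\n" for @results;
```

With the sample data this flags all four file-1 lines with E, the same as the hash-of-arrays version.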
In reply to Re^9: compare two files on the basis of Two IDs
by Marshall
in thread compare two files on the basis of Two IDs
by genome