Again, you have a poor set of test data: when file1 is printed, every line now comes out as an E.

File 2 can also be represented as a hash structure. The hash keys are the numeric values, and each hash value is an array of "chr" strings. This allows more than one chrX value to be associated with a single numeric value. Not sure whether that is needed, but this code allows for the possibility.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;

my $file1 = <<END;
chr7 151046672
chr7 151047369
chr3 127680920
chr3 127680920
END

my $file2 = <<END;
chr1 66953622 66953654
chr1 67200451 67200472
chr1 67200475 67200478
chr1 67058869 67058880
chr1 67058881 67058885
chr7 151046672 127680920
chr7 151047369 127680920
chr3 127680920 151046672
chr3 127680920 151047369
END

open my $infile1, '<', \$file1 or die "unable to open first file $!";
open my $infile2, '<', \$file2 or die "unable to open 2nd file $!";

### create memory structure of file 2:
### so that we only have to read file2 once!
#
my %file2_hash;
while (my $line = <$infile2>)
{
    next if $line =~ /^\s*$/;   # skip blank lines (a common infile goof)
    my ($chr, $value1, $value2) = split /\s+/, $line;
                                # use better "names"; I have
                                # no idea what a chr col means
    push @{$file2_hash{$value1}}, $chr;
    push @{$file2_hash{$value2}}, $chr;
}
close $infile2;  # file handle closure is optional, but I'd do it.

### process each line in file1:
### If a line "matches" with any line in file2, then "E", else "M"
### I don't know what these numbers mean; come up with a better comment.
while (my $line = <$infile1>)
{
    chomp $line;                # so that output with E or M can be on same line
    next if $line =~ /^\s*$/;   # skip blank lines (a common infile goof)
    my ($chr, $val1) = split /\s+/, $line;
    # compare each stored chr with this line's chr; a bare grep{$chr}
    # is true for any non-empty list, whatever it contains
    if ( grep { $_ eq $chr } @{ $file2_hash{$val1} || [] } )
    {
        print "$line\tE\n";     # match exists with file 2
    }
    else
    {
        print "$line\tM\n";     # match does NOT exist with file 2
    }
}
__END__
Prints the following:
chr7 151046672 E
chr7 151047369 E
chr3 127680920 E
chr3 127680920 E
```
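Since every pair in the sample file1 also turns up in file2, the M branch never runs with this data. Below is a minimal sketch of the same lookup with one non-matching row added. The row `chr9 999` is invented purely for illustration, and the sub names `build_index` and `classify` are my own, not anything from the thread. It also swaps the hash of arrays for a hash of hashes, so the membership test becomes a plain `exists` instead of a `grep`; that is a design alternative, not a required change.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build { number => { chr => 1 } } from lines of "chr value1 value2".
sub build_index {
    my @lines = @_;
    my %index;
    for my $line (@lines) {
        next if $line =~ /^\s*$/;              # skip blank lines
        my ($chr, @values) = split ' ', $line;
        $index{$_}{$chr} = 1 for @values;      # either column may match
    }
    return \%index;
}

# Return "E" if the (chr, number) pair of a file1 line is in the index, else "M".
sub classify {
    my ($index, $line) = @_;
    my ($chr, $val) = split ' ', $line;
    # check the outer key first so the lookup does not autovivify it
    my $found = exists $index->{$val} && exists $index->{$val}{$chr};
    return $found ? 'E' : 'M';
}

my $index = build_index(
    'chr7 151046672 127680920',
    'chr3 127680920 151046672',
);

# "chr9 999" is a made-up row, present only to exercise the M branch.
for my $line ('chr7 151046672', 'chr3 127680920', 'chr9 999') {
    print "$line\t", classify($index, $line), "\n";
}
```

With this data the first two lines are tagged E and the invented third line is tagged M, so both branches are exercised.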

In reply to Re^9: compare two files on the basis of Two IDs by Marshall
in thread compare two files on the basis of Two IDs by genome
