in reply to Comparing multiple entries from two files

Try something like this...
#!/usr/bin/perl
use strict;

my $data_1 = <<EOF;
ID Catg_ID Pos
12 A 16
15 B 5
16 A 175
EOF

my %record;
open( my $file_1, "<", \$data_1 )
    or die "Cannot open data_1\n";

while (<$file_1>) {
    next if ( /^ID/ );
    s/(\d+)\s//;
    $record{$1} = [ split ];
}

while (<DATA>) {
    next if ( /^Catg/ );
    my @cols = split;
    for my $i ( keys %record ) {
        print "$i $cols[0] $record{$i}->[1] $cols[3]\n"
            if (( $record{$i}->[0] eq $cols[0] )
                && ( $record{$i}->[1] > $cols[1] )
                && ( $record{$i}->[1] < $cols[2] ))
    }
}

__DATA__
Catg_Name Start Stop Name
A 8 19 jamm
A 110 112 bbc
E 170 256 vadd
A 14 18 cip

Re^2: Comparing multiple entries from two files
by hanger4 (Initiate) on Jul 08, 2009 at 21:21 UTC
    Thank you bichonfrise74. Your suggestion works very well. It is much simpler and quite a bit faster than what I was using:

    #!/usr/bin/perl
    use warnings;
    use strict;

    my $idfile = 'C:...';
    open(ID, $idfile) or die "Cannot open file '$idfile' \n\n";

    #Create and open output file
    my $out = 'C:\Users\Clayton\Documents\Research\GO\Data\chr_8\chr_8_posgen';
    open (OUT, ">$out");

    # Create Variables
    my $line;
    my $line2;
    my %hash = ();

    # Read in file line by line
    while ( $line = <ID> ) {
        chomp $line;

        # Split the tab delimited file into an array
        my @arr = split /\t/, $line;
        #if(defined($arr[2])){

        # Put the columns of the array into variables
        my $id      = $arr[0];
        my $catg_id = $arr[1];
        my $pos     = $arr[2];
        $hash{$id} = "$catg_id\t$pos";
    }
    close ID;

    my @k = keys %hash;
    my $k;

    my $namefile = 'C:...';
    open(NAMEFILE, $namefile) or die "Cannot open file '$namefile' \n\n";

    while ( $line2 = <NAMEFILE> ) {
        chomp $line2;
        my @loc = split /\t/, $line2;
        foreach $k (@k) {
            my @arr2 = split /\t/, $hash{$k};
            if($loc[0] == $arr2[0]){
                if(($arr2[1] >= $loc[1]) and ($arr2[1] <= $loc[2])){
                    print OUT "$arr2[0]\t$k\t$arr2[1]\t$loc[1]\t$loc[2]\t$loc[3]\n"
                }
            }
        }
    }

    I'm still new to Perl, so I guess a lot of my methods are not the most efficient.

    One more question though... Whenever I use your code or mine I get the warning "Use of uninitialized value in numeric gt (>) at C:\...line 32, <NAMEFILE> line 1." This warning repeats many times for each line in the file. The output is what I expect it to be, though, and the program runs without any problems (other than generating a bunch of warnings).

      You should eliminate the warnings since they may indicate bugs in your code. I do not know which line is line # 32, but perhaps you could try to determine the cause of the warnings by printing the contents of your @loc and @arr2 arrays. Do they have as many elements as you think they have?
        Yeah, that's the strange thing. I went through the script line by line, printing after every operation, and the output was always what I expected.
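
One possible cause, offered only as a guess since the real input files are not shown: a line in the ID file that is blank or is missing its Pos column. Such a line still ends up in the hash, and the undefined position is then compared against every line of the name file, which would fit a warning that repeats many times. A plain print shows an undefined value as an empty string, so the problem is easy to miss. The sketch below uses made-up sample rows and a hypothetical helper named complete_fields (neither comes from the thread) to skip incomplete lines and to show them through Data::Dumper, which makes empty and short lines visible.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Useqq  = 1;    # escape tabs and newlines so they are visible
$Data::Dumper::Terse  = 1;    # print the value only, without "$VAR1 = "
$Data::Dumper::Indent = 0;    # keep each dump on a single line

# Hypothetical helper (not from the thread): return the tab-separated
# fields of $line, or an empty list when fewer than $want non-empty
# fields are present.
sub complete_fields {
    my ( $line, $want ) = @_;
    chomp $line;
    my @f = split /\t/, $line;
    return () if @f < $want;
    for my $i ( 0 .. $want - 1 ) {
        return () if length( $f[$i] ) == 0;
    }
    return @f;
}

# Made-up sample rows: the second is missing Pos, the third is blank.
my @lines = ( "12\tA\t16\n", "15\tB\n", "\n", "16\tA\t175\n" );

my %record;
my $n = 0;
for my $line (@lines) {
    $n++;
    my @f = complete_fields( $line, 3 );
    if ( !@f ) {
        # Report the raw line so the missing column can be seen.
        warn "line $n skipped: ", Dumper($line), "\n";
        next;
    }
    $record{ $f[0] } = [ @f[ 1, 2 ] ];
}

# Only complete rows reach the comparison stage.
print "$_ => @{ $record{$_} }\n" for sort { $a <=> $b } keys %record;
```

If the skipped-line messages show up for the real ID file, the warnings are probably coming from those rows rather than from the name file itself. The same check could be applied to @loc before the numeric comparisons.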