in reply to Comparing two hashes-help
I can't verify your regexes without sample data. Given the files you are dealing with, I would have thought they would be formatted the same.

The for loop that reads your ref file would make lots of extraneous entries. You build your %ref hash, and after every addition to it you dump out its entire contents, so your ref file would grow quadratically (1 + 2 + ... + n entries for n lines)! Unless you actually want that behavior, I see no reason why you'd bother with %ref at all.

You appear to just want to update the genotypes from the first file into your second file. So just put the positions and genotypes of the first file into a hash. Then read the ref, and if a position in the ref file has a value in the genotype hash, print that; otherwise print the ref value.

However, you have not mentioned what you want if a position and genotype exist in your first file but not in your reference. Perhaps that can never happen?

Depending on your dataset size, you might just do something like:

    for my $file (@files) {
        open(GENOTYPE, $file) or die "Can't open $file: $!";
        # blah blah
    }

    # position => genotype, taken from the first file
    my %genotype;
    while (<GENOTYPE>) {
        $genotype{$1} = $2 if /(\d+)\t\w\t(\w)/;
    }

    # print the genotype where one exists, otherwise the ref value
    while (<REF>) {
        next unless /(\d+)\t(\w)/;
        my ($pos, $base) = ($1, $2);
        my $g = $genotype{$pos};
        print OUT "$pos\t", ((defined $g and $g ne '') ? $g : $base), "\n";
    }
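For anyone who wants to try the idea without the real files, here is a minimal self-contained sketch of the same lookup. The sample data, the column layout (genotype file: position, base, genotype; ref file: position, base) and the sub name merge_genotypes are all assumptions of mine, not taken from your script. It also collects positions that are in the genotype file but not in the reference, which is one possible answer to the open question above.

```perl
use strict;
use warnings;

# Hypothetical sample data; the real column layout is an assumption.
my $genotype_data = "100\tA\tG\n200\tC\tT\n400\tG\tA\n";
my $ref_data      = "100\tA\n200\tC\n300\tT\n";

# Returns (merged lines, genotype positions missing from the reference).
sub merge_genotypes {
    my ($genotype_text, $ref_text) = @_;

    # position => genotype from the first file
    my %genotype;
    for my $line (split /\n/, $genotype_text) {
        $genotype{$1} = $2 if $line =~ /^(\d+)\t\w\t(\w)/;
    }

    # Walk the reference, preferring the genotype where one exists.
    my (@out, %seen);
    for my $line (split /\n/, $ref_text) {
        next unless $line =~ /^(\d+)\t(\w)/;
        my ($pos, $base) = ($1, $2);
        $seen{$pos} = 1;
        my $g = $genotype{$pos};
        push @out, "$pos\t" . ((defined $g && $g ne '') ? $g : $base);
    }

    # Positions in the genotype data that never appeared in the reference.
    my @unmatched = sort { $a <=> $b } grep { !$seen{$_} } keys %genotype;
    return (\@out, \@unmatched);
}

my ($merged, $unmatched) = merge_genotypes($genotype_data, $ref_data);
print "$_\n" for @$merged;
print STDERR "not in reference: $_\n" for @$unmatched;
```

Only the genotype hash is held in memory here; the reference is processed one line at a time, so with real filehandles the larger of the two files is probably best used as the ref side.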