
Thanks everybody for the help. Basically, my output file 1 is at the moment a one-column file of rs values like:


rs3547689

rs325678912

rs36789012

etc

I now need to find these values in file 2 and print out each matching line, either to the screen or to a separate file.
File 2 looks like:
XXX XXX XXX XXX XXX XXX (1050 times)

rs3507865 AA AT AT AT TT AA (1050 values)

rs3456189 GG GC GG CC CC .....

more than 700 rows
Can you give me a suggestion for the hash keys? Could the row number be used, given that I cannot write to file 2? Cheers again

Re^4: General program and related problems
by jethro (Monsignor) on Aug 04, 2009 at 11:53 UTC

    Only 700 (even if long) rows? Then you don't need any disk-based hash. Just create a hash with the rs value of each row in file2 (for example 'rs3507865') as the key and the row's position in the file as the value. You can find the position in the file with tell() (called before reading the line).

    Then just read the rs values from file1, look up their positions in the hash, and use seek() on file2 to go there:

        open(FILE2,...
        my $position=tell(FILE2);   # offset of the line about to be read
        my %rs;
        while ($line=<FILE2>) {
            my ($key)= $line=~/^(rs\d{5,})\b/;
            if (defined $key) {
                $rs{$key}= $position;   # rs value => start of its row
            }
            $position=tell(FILE2);
        }
        ...
        while (defined ($line= <FILE>)) {
            ...
            foreach (@output) {
                if (exists $rs{$_}) {
                    seek(FILE2,$rs{$_},0);   # jump to the start of the row
                    my $line= <FILE2>;
                    print FD $line;
                }
            }

    UPDATE: Added a '^' to the regex in the script
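
For anyone who wants to try the tell()/seek() idea end to end, here is a minimal self-contained sketch. It is only one possible way to write it, not the code from the reply above: the sub names build_index and extract_rows and the three command-line file names are made up for illustration, and it assumes each rs value sits at the very start of its row in file 2. File 2 is only ever opened for reading, so nothing is written to it.

```perl
use strict;
use warnings;

# Build an index: rs value => byte offset of the start of its row.
sub build_index {
    my ($data_fh) = @_;
    my %offset_of;
    my $position = tell($data_fh);       # offset before reading each line
    while (my $line = <$data_fh>) {
        if (my ($key) = $line =~ /^(rs\d{5,})\b/) {
            $offset_of{$key} = $position;
        }
        $position = tell($data_fh);
    }
    return \%offset_of;
}

# Copy the rows for the wanted rs values to the output handle,
# in the order the values appear in the list. Returns the number found.
sub extract_rows {
    my ($ids_fh, $data_fh, $out_fh, $offset_of) = @_;
    my $found = 0;
    while (my $id = <$ids_fh>) {
        $id =~ s/^\s+|\s+$//g;           # strip newline and stray whitespace
        next unless length $id;          # skip blank lines
        next unless exists $offset_of->{$id};
        seek($data_fh, $offset_of->{$id}, 0) or die "seek failed: $!";
        my $row = <$data_fh>;
        print {$out_fh} $row;
        $found++;
    }
    return $found;
}

# Hypothetical usage: perl extract_rs.pl file1.txt file2.txt matches.txt
if (@ARGV == 3) {
    my ($ids_file, $data_file, $out_file) = @ARGV;
    open(my $data_fh, '<', $data_file) or die "Cannot open $data_file: $!";
    open(my $ids_fh,  '<', $ids_file)  or die "Cannot open $ids_file: $!";
    open(my $out_fh,  '>', $out_file)  or die "Cannot open $out_file: $!";
    my $offset_of = build_index($data_fh);
    my $found = extract_rows($ids_fh, $data_fh, $out_fh, $offset_of);
    close($out_fh) or die "Cannot close $out_file: $!";
    print "Wrote $found matching rows to $out_file\n";
}
```

With only about 700 rows, simply storing each whole line in the hash would probably work just as well. Storing offsets mainly keeps memory use small when the rows are long.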