in reply to Re^3: Searching Huge files
in thread Searching Huge files
open my $snpIn, '<', \$snpFile opens a scalar variable as though it were a file. It's a useful trick for test code because you don't need a separate file. To read a real file, simply replace the \$snpFile bit with the file name you would normally pass to open.
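As a minimal sketch of that trick (the sample records below are made up for illustration and are not from the original data), reading from a string looks like this:

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the real SNP file.
my $snpFile = <<'DATA';
rs1001 0.25
rs1002 0.50
DATA

# Opening a reference to a scalar gives a file handle that reads from the string.
open my $snpIn, '<', \$snpFile or die "Can't open in-memory file: $!";
my @lines = <$snpIn>;
close $snpIn;

print scalar(@lines), " lines read\n";

# With a real file only the third argument changes ('snps.txt' is a placeholder name):
# open my $snpIn, '<', 'snps.txt' or die "Can't open snps.txt: $!";
```

Everything after the open is the same whether the handle reads from a string or from disk, which is what makes this handy for test code.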
The code that populates the hash uses a couple of tricks so that it is compact. Expanded it might look like:
    while (<$snpIn>) {
        next unless /^(\w+)\s+(\d+\.\d+)/;
        $snpLookup{$1} = $2;
    }
Note that the original code used while as a statement modifier and the code above uses unless as a statement modifier. Note too that the value stored for each key (the first 'word' on the line) is the value given on that input line. You could instead assign $. which would give the line number, or you could use ++$snpLookup{$1} which would give a count of the number of entries for that 'word' in the file.
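To make the three choices concrete, here is a small sketch that fills one hash per variant from the same input. The sample lines and the hash names %value, %lineNo and %count are invented for the illustration; real code would keep just one of them as %snpLookup.

```perl
use strict;
use warnings;

# Hypothetical sample data; the second line does not match the expected format.
my $snpFile = <<'DATA';
rs1001 0.25
not a data line
rs1002 0.50
rs1001 0.75
DATA

my (%value, %lineNo, %count);

open my $snpIn, '<', \$snpFile or die "Can't open in-memory file: $!";
while (<$snpIn>) {
    next unless /^(\w+)\s+(\d+\.\d+)/;
    $value{$1}  = $2;    # value from the line (a later duplicate overwrites an earlier one)
    $lineNo{$1} = $.;    # line number of the most recent line for this key
    ++$count{$1};        # how many matching lines start with this key
}
close $snpIn;

print "$_: value $value{$_}, line $lineNo{$_}, count $count{$_}\n"
    for sort keys %value;
```

Whichever value you store, the later exists test only cares that the key is present.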
In like fashion the search loop can be expanded:
    while (<$mapIn>) {
        next unless /^(\w+)\s+(\w+)/ and exists $snpLookup{$1};
        print "$1 $2\n";
    }
The important test is exists $snpLookup{$1}, which checks whether the first 'word' on the line was also a first 'word' in the first file. The test is only made if the regular expression succeeds, so $1 is always set when the lookup happens. Using the regular expression in that way avoids possible nastiness at the end of the file (a trailing blank line, for example) and anywhere else the file format is not as you expect. See perlretut and perlre for more about regular expressions.
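Putting the two loops together, a self-contained sketch might look like the following. Both inputs are in-memory strings with made-up records, and the matches are collected in an array (@matches, a name added here) before printing so they are easy to check.

```perl
use strict;
use warnings;

# Hypothetical sample inputs standing in for the two real files.
my $snpFile = "rs1001 0.25\nrs1002 0.50\n";
my $mapFile = "rs1002 chr2\nrs9999 chr7\n\nrs1001 chr1\n";

# Pass 1: build the lookup from the first file.
my %snpLookup;
open my $snpIn, '<', \$snpFile or die "Can't open SNP data: $!";
while (<$snpIn>) {
    next unless /^(\w+)\s+(\d+\.\d+)/;
    $snpLookup{$1} = $2;
}
close $snpIn;

# Pass 2: keep only lines from the second file whose first word is a known key.
my @matches;
open my $mapIn, '<', \$mapFile or die "Can't open map data: $!";
while (<$mapIn>) {
    # The blank line fails the regex and is skipped before the hash is touched.
    next unless /^(\w+)\s+(\w+)/ and exists $snpLookup{$1};
    push @matches, "$1 $2";
}
close $mapIn;

print "$_\n" for @matches;
```

Only the hash from the first file is held in memory; the second file is read one line at a time.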