
Thanks Sundial. A couple of questions.

I was able to get the hash created and working properly. Now I need to take care of some details. Depending on the type of file that I am using, the SSN may or may not have hyphens in it. How would you strip the hyphens while loading the hash? This is what I have now:

while (<$HRDATA>) {
    my ($ssn, $aoid) = (split /","/)[4, 2];
    $ssnhash{$ssn} = $aoid;
}

Basic, I am sure, but I am just learning.

Secondly, again depending on the file type, the SSN may be in field 2 or field 4 of file 2. One file type, the one where the SSN is in field 2, has a header at the top. The only way I can see to know programmatically which type I have is to read the first line of the file. Once I know that, I can tweak my code to load the SSN into the hash from the proper field. Does that make sense? Any thoughts on a better way?

Thanks!!

Re^5: Best way to search file
by Marshall (Canon) on Apr 16, 2015 at 21:02 UTC
    One way to strip out the "-" characters is like this:
    #!/usr/bin/perl
    use strict;
    use warnings;

    # Print each SSN next to a copy with all hyphens removed.
    foreach my $ssn (qw(123-45-6789 987654321)) {
        my $digits = $ssn;
        $digits =~ s/-//g;    # delete every "-"
        print "$ssn \t$digits\n";
    }
    __END__
    prints:
    123-45-6789     123456789
    987654321       987654321
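Applied to the loading loop from the question, the same idea might look like the sketch below. It assumes the quoted, comma-separated layout implied by `split(/","/)`, with the SSN at index 4 and the AOID at index 2 as in that loop; the helper name `load_ssn_hash` and the sample lines are hypothetical, read from an in-memory filehandle so the example runs on its own. It uses `tr/-//d`, which deletes every hyphen just like `s/-//g`. Note that the `split` call must be wrapped in parentheses before the `[4, 2]` slice can be applied.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read lines from $fh and map SSN => AOID.
# Assumption: SSN is field index 4 and AOID is field index 2 of a line
# split on the "," separator used in the question.
sub load_ssn_hash {
    my ($fh) = @_;
    my %ssnhash;
    while ( my $line = <$fh> ) {
        chomp $line;
        # A list slice needs parentheses around the split() call.
        my ( $ssn, $aoid ) = ( split /","/, $line )[ 4, 2 ];
        next unless defined $ssn;    # skip blank or short lines
        $ssn =~ tr/-//d;             # 123-45-6789 -> 123456789
        $ssnhash{$ssn} = $aoid;
    }
    return \%ssnhash;
}

# Made-up sample data (no real SSNs) in an in-memory filehandle.
my $sample = qq{"a","b","AOID1","d","123-45-6789","f"\n}
           . qq{"a","b","AOID2","d","987654321","f"\n};
open my $HRDATA, '<', \$sample or die "Cannot open sample data: $!";
my $ssnhash = load_ssn_hash($HRDATA);
close $HRDATA;

print "$_ => $ssnhash->{$_}\n" for sort keys %$ssnhash;
```

Splitting on `","` leaves a stray `"` on the first and last fields of each line, so this only yields clean values for middle fields such as indexes 2 and 4.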
    I am not sure of the best way to handle this "sometimes field 2 vs 4" without seeing a few example lines of these databases. Don't post any real SSNs!
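Reading the first line to decide, as suggested in the question, could be sketched as follows. Everything about the layout here is an assumption: `is_header` treats any line without an SSN-shaped value as a header, a file with a header is taken to hold the SSN at index 2 and one without a header at index 4, and the sub names and sample data are hypothetical. When there is no header, the code rewinds with `seek` so the first data line is still processed; that requires a seekable handle (a regular file, not a pipe or STDIN).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: a header line contains no SSN-shaped value
# (nine digits, with or without hyphens). Adjust to the real header.
sub is_header {
    my ($line) = @_;
    return $line !~ /(?<!\d)\d{3}-?\d{2}-?\d{4}(?!\d)/;
}

# Peek at the first line to pick the SSN field index.
# Hypothetical layout: header present => 2, no header => 4.
sub detect_ssn_index {
    my ($fh) = @_;
    my $first = <$fh>;
    return 4 unless defined $first;    # empty file: value does not matter
    return 2 if is_header($first);     # header consumed; keep reading after it
    seek( $fh, 0, 0 ) or die "Cannot rewind: $!";    # first line is data
    return 4;
}

# Return the hyphen-free SSNs from every data line in $fh.
sub read_ssns {
    my ($fh) = @_;
    my $idx = detect_ssn_index($fh);
    my @ssns;
    while ( my $line = <$fh> ) {
        chomp $line;
        my $ssn = ( split /","/, $line )[$idx];
        next unless defined $ssn;
        $ssn =~ tr/-//d;
        push @ssns, $ssn;
    }
    return @ssns;
}

# Two made-up files (no real SSNs), one per assumed layout.
my %files = (
    with_header => qq{"x","y","SSN","z"\n"x","y","123-45-6789","z"\n},
    no_header   => qq{"a","b","AOID1","d","987654321","f"\n},
);

my %found;
for my $name ( sort keys %files ) {
    open my $fh, '<', \$files{$name} or die "Cannot open $name: $!";
    $found{$name} = [ read_ssns($fh) ];
    close $fh;
    print "$name: @{ $found{$name} }\n";
}
```

If the real header starts with a fixed label, testing for that label in `is_header` would be more reliable than the digit pattern used here.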

    As mentioned before, your HUGE performance gain will come from processing each of the two files only once. Process file 2 first to build an in-memory structure (the hash), then process file 1 line by line, looking each SSN up in that hash instead of searching file 2 again.
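A rough sketch of that read-each-file-once structure is below. File 2 is assumed to use the quoted layout from the question (SSN at index 4, AOID at index 2). The thread never shows the layout of file 1, so here it is a hypothetical plain comma-separated line with the SSN as the first field; the sub names and data are made up, and in-memory filehandles stand in for the real files.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pass 1: read file 2 once and build SSN => AOID in memory.
# Assumed layout: "..."-quoted fields, SSN at index 4, AOID at index 2.
sub build_lookup {
    my ($fh) = @_;
    my %aoid_for;
    while ( my $line = <$fh> ) {
        chomp $line;
        my ( $ssn, $aoid ) = ( split /","/, $line )[ 4, 2 ];
        next unless defined $ssn;
        $ssn =~ tr/-//d;    # normalize: digits only
        $aoid_for{$ssn} = $aoid;
    }
    return \%aoid_for;
}

# Pass 2: stream file 1 once; each lookup is a hash access,
# not a rescan of file 2. Hypothetical layout: SSN is the first field.
sub match_lines {
    my ( $fh, $aoid_for ) = @_;
    my @out;
    while ( my $line = <$fh> ) {
        chomp $line;
        my ($ssn) = split /,/, $line;
        next unless defined $ssn;
        $ssn =~ tr/-//d;
        my $aoid = exists $aoid_for->{$ssn} ? $aoid_for->{$ssn} : 'NOT FOUND';
        push @out, "$line,$aoid";
    }
    return @out;
}

# Made-up stand-ins for the two files (no real SSNs).
my $file2 = qq{"a","b","AOID1","d","123-45-6789","f"\n}
          . qq{"a","b","AOID2","d","987654321","f"\n};
my $file1 = "987-65-4321,Smith\n111223333,Jones\n";

open my $fh2, '<', \$file2 or die "Cannot open file 2: $!";
my $aoid_for = build_lookup($fh2);
close $fh2;

open my $fh1, '<', \$file1 or die "Cannot open file 1: $!";
my @results = match_lines( $fh1, $aoid_for );
close $fh1;

print "$_\n" for @results;
```

With real files, replace each in-memory handle with `open my $fh, '<', $path or die ...`. Both sides strip hyphens, so an SSN written with hyphens in one file still matches the same SSN written without them in the other.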