Re^4: Best way to search file

Thanks Sundial. A couple of questions.

I was able to get the hash created and working properly. Now I need to take care of some details. Depending on the type of file that I am using, the SSN may or may not have hyphens in it. How would you strip the hyphens while loading the hash? This is what I have now:

while (<$HRDATA>) {
my ($ssn,$aoid) = (split(/","/))[4,2];
$ssnhash{$ssn} = $aoid;
}

Basic, I am sure, but I am just learning.

Secondly, again depending on file type, the SSN may be in field 2 or field 4 of file 2. The file type where the SSN is in field 2 has a header line at the top. The only way I can see to know programmatically which is which is to check the first line of the file. Once I know that, I can tweak my code to load the SSN into the hash from the proper field. Does that make sense? Any thoughts on a better way?

Thanks!!


Replies are listed 'Best First'.
Re^5: Best way to search file
by Marshall (Canon) on Apr 16, 2015 at 21:02 UTC

One way to strip out the "-" characters is like this:

#!/usr/bin/perl
use strict;
use warnings;

# qw() needs its own parentheses when used as a foreach list (Perl 5.18+)
foreach my $ssn (qw(123-45-6789
                    987654321))
{
   my $digits = $ssn;    # copy, so the original value is kept for printing
   $digits =~ s/-//g;    # remove every hyphen
   print "$ssn   \t$digits\n";
}
__END__
prints:
123-45-6789     123456789
987654321       987654321

I am not sure of the best way to handle this "sometimes field 2 vs 4" without seeing a few example lines of these databases. Don't post any real SSNs!
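
Applied to the loop from the question, the same hyphen removal can run on each SSN just before it is used as a hash key. What follows is only a sketch: the sample records, the AOID values, and the helper name load_ssn_hash are made up for illustration, and it assumes the quoted, comma-separated layout implied by the split on "," with the SSN at index 4 and the AOID at index 2.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up sample records; layout and values are assumptions, not real data.
my $sample = <<'END';
"rec1","x","AO100","y","123-45-6789","z"
"rec2","x","AO200","y","987654321","z"
END

# Hypothetical helper: build SSN => AOID with every SSN stored as bare digits.
sub load_ssn_hash {
    my ($fh) = @_;
    my %ssnhash;
    while (my $line = <$fh>) {
        chomp $line;
        my ($ssn, $aoid) = (split /","/, $line)[4, 2];
        next unless defined $ssn && defined $aoid;   # skip short lines
        $ssn =~ tr/-//d;                             # delete hyphens; s/-//g also works
        $ssnhash{$ssn} = $aoid;
    }
    return %ssnhash;
}

# Read from an in-memory filehandle so the sketch runs without any files.
open my $HRDATA, '<', \$sample or die "Cannot open sample data: $!";
my %ssnhash = load_ssn_hash($HRDATA);
close $HRDATA;

print "$_ => $ssnhash{$_}\n" for sort keys %ssnhash;
```

Both sample records end up keyed by nine plain digits, so a later lookup works whether or not the other file writes its SSNs with hyphens, provided the lookup key is normalized the same way.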

As mentioned before, your HUGE performance gain will come by processing each of the 2 files only once. Process file 2 first to make a memory structure, then process file 1 line by line. Each file only should be read once.
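
A rough sketch of that two-pass idea, under stated assumptions: the layout of file 1 is not shown in this thread, so the sketch assumes a simple "SSN,name" line, and all records, names, and the normalize_ssn helper are hypothetical. File 2 uses the same layout assumed by the split in the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up stand-ins for the two files; both layouts are assumptions.
my $file2 = <<'END';
"rec1","x","AO100","y","123-45-6789","z"
"rec2","x","AO200","y","987654321","z"
END
my $file1 = <<'END';
123456789,Smith
555-00-1111,Jones
987-65-4321,Brown
END

# Hypothetical helper: reduce an SSN to bare digits so both files share one key form.
sub normalize_ssn {
    my ($ssn) = @_;
    $ssn =~ tr/-//d;
    return $ssn;
}

# Pass 1: read file 2 once and build the in-memory lookup.
my %ssnhash;
open my $fh2, '<', \$file2 or die "Cannot open file 2 sample: $!";
while (my $line = <$fh2>) {
    chomp $line;
    my ($ssn, $aoid) = (split /","/, $line)[4, 2];
    next unless defined $ssn && defined $aoid;
    $ssnhash{ normalize_ssn($ssn) } = $aoid;
}
close $fh2;

# Pass 2: read file 1 once; each lookup is a hash access, not a rescan of file 2.
my @report;
open my $fh1, '<', \$file1 or die "Cannot open file 1 sample: $!";
while (my $line = <$fh1>) {
    chomp $line;
    my ($ssn, $name) = split /,/, $line;
    my $aoid = $ssnhash{ normalize_ssn($ssn) };
    push @report, defined $aoid ? "$name: $aoid" : "$name: no match";
}
close $fh1;

print "$_\n" for @report;
```

With this sample data the report lists Smith with AO100, Jones with no match, and Brown with AO200. Neither file is read more than once, which is where the performance gain comes from.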