in reply to best way to use grep
print LOG "\$xrefvalue is $xrefvalue \n";
$x=grep /$xrefvalue/, @xreflines;
I don't see where $xrefvalue is being declared and initialized. I assume it's being built from $DISTID1 and $CUST1 in something like the way that Laurent_R shows here:
my $xrefvalue = "$DISTID1$CUST1";
If so, there's a potential problem because two records like
3696693;5308;;BJS BREWHOUSE;2631 EDMONDSON RD;...
369669;35308;;HORSESHOE ROAD INN;12 3RD ST;...
will both produce the same $xrefvalue, "36966935308", unless there is some unstated rule that tells you this can never happen.
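To make the collision concrete, here is a minimal sketch. It assumes $xrefvalue really is built by plain concatenation, which is only my guess, and it uses just the first four fields of the two sample records. %seen and @records are names I made up for the illustration.

```perl
use strict;
use warnings;

# The two sample records from this post, trailing fields left off.
my @records = (
    '3696693;5308;;BJS BREWHOUSE',
    '369669;35308;;HORSESHOE ROAD INN',
);

# Group records by the concatenated key (assumed construction, no separator).
my %seen;
for my $record (@records) {
    my ($DISTID1, $CUST1) = split /;/, $record;
    my $xrefvalue = "$DISTID1$CUST1";
    push @{ $seen{$xrefvalue} }, $record;
}

# Report how many records landed on each key.
for my $key (sort keys %seen) {
    printf "%s is shared by %d record(s)\n", $key, scalar @{ $seen{$key} };
}
```

Both records end up under the single key 36966935308, so a lookup on that key cannot tell them apart.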
Better, IMHO, to use a non-numeric separator to guarantee unambiguous cross-ref values:
my $separator = ';';
...
my $xrefvalue = "$DISTID1$separator$CUST1";
A semicolon seems a natural choice: it is already the CSV field separator, so it should not turn up inside either field.
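A quick sketch of the difference the separator makes, using the same two distributor/customer pairs as above. @pairs and @keys are hypothetical names for the illustration only.

```perl
use strict;
use warnings;

my $separator = ';';

# The distributor/customer pairs from the two colliding sample records.
my @pairs = (
    [ '3696693', '5308'  ],
    [ '369669',  '35308' ],
);

# Build each key with the separator between the two fields.
my @keys = map { join $separator, @$_ } @pairs;

print "$_\n" for @keys;
print $keys[0] eq $keys[1] ? "still ambiguous\n" : "keys are distinct\n";
```

With the separator in place the keys come out as 3696693;5308 and 369669;35308, which no longer collide.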
The advice to build a cross-ref lookup hash seems very, very good. I imagine the rest of the code might look a lot like the code in Laurent_R's post except the split statement could be
my ($dist, $cust) = (split $separator, $line)[0,1];
or
my ($dist, $cust) = split $separator, $line, 3;
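For what it's worth, here is a rough sketch of how the lookup hash might fit together. This is not Laurent_R's code, just my guess at the general shape. %xref is a hypothetical name, and @xreflines is filled with truncated sample records here, whereas in the real program it would presumably be read from the cross-reference file.

```perl
use strict;
use warnings;

my $separator = ';';

# Stand-in for the real cross-reference lines (sample data, trailing fields left off).
my @xreflines = (
    '3696693;5308;;BJS BREWHOUSE',
    '369669;35308;;HORSESHOE ROAD INN',
);

# Build the lookup hash once: key is "dist;cust", value is the whole line.
my %xref;
for my $line (@xreflines) {
    my ($dist, $cust) = (split $separator, $line)[0,1];
    $xref{"$dist$separator$cust"} = $line;
}

# Each later lookup is a single hash access instead of a grep over the whole array.
my $xrefvalue = join $separator, '369669', '35308';
if (exists $xref{$xrefvalue}) {
    print "found: $xref{$xrefvalue}\n";
}
else {
    print "not found: $xrefvalue\n";
}
```

The hash is built in one pass over the lines, and after that every lookup costs about the same regardless of how many cross-references there are, which is the point of preferring it to grep here.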
You don't say how big your database is, but a hash could accommodate tens of millions of cross references in system memory for very fast lookup; much more than that and you're looking at a database. (Other approaches, like using Text::CSV_XS or a regex field extractor, or perhaps emulated multidimensional hash keys might be better (or sexier), but let's just take one step at a time!)
Give a man a fish: <%-{-{-{-<