I'd have to hear more about your data structure to know the best way to optimize things, but here are some possibilities based on my secret-decoder-ring knowledge of DBD::CSV (I am its maintainer, and although what's below is in the docs, it may not be evident):
- If you have enough memory to hold two of the tables in memory at once, use DBD::CSV's explicit joins to search the tables. The joins are hash-based, so using them gains some of the speed mentioned in the other hash-based responses to your question. Something like SELECT baz.foo,qux.bar FROM baz NATURAL INNER JOIN qux ... will automatically create hashes of the two tables and avoid multiple lookups (see the first sketch after this list).
- Or, if you are going to be searching on only a single unique id field, try using DBI::SQL::Nano as DBD::CSV's SQL engine. Nano comes with recent versions of DBI and, although much more limited in the SQL it supports, can be *much* faster for single-field queries. See the Nano docs for how to use it with DBD::CSV (the second sketch below shows one way to switch it on).
- As others have pointed out, you definitely want to avoid looping through the files more than once. That can be accomplished with DBD::CSV even without either of the two tricks above. To know what to recommend, though, I'd need to see your actual table structure and have a better idea of what you're trying to do (for example, does the list of clients come from a query on the first table, or do you have the list of clients in advance?). Feel free to post more details if you want more specific suggestions.
- Even if you go with a hand-rolled hash solution, you may want to use DBD::CSV to build the hashes, using DBI's selectall_hashref($stmt, $key_field) or selectall_arrayref($stmt, { Slice => {} }) methods (third sketch below).
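
Here's a minimal sketch of the join trick from the first bullet. The f_dir path, the .csv extension, and the baz/qux table layout are all assumptions; substitute your own files and columns:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use DBI;

    # Each CSV file in f_dir becomes a table; f_ext maps table
    # names to *.csv files. Both values here are assumptions.
    my $dbh = DBI->connect( 'dbi:CSV:', undef, undef, {
        f_dir      => '/path/to/csvs',
        f_ext      => '.csv/r',
        RaiseError => 1,
    } ) or die $DBI::errstr;

    # The NATURAL INNER JOIN hashes both tables once, so matching
    # rows are found in memory instead of by rescanning the files.
    my $sth = $dbh->prepare(
        'SELECT baz.foo, qux.bar FROM baz NATURAL INNER JOIN qux'
    );
    $sth->execute;
    while ( my $row = $sth->fetchrow_hashref ) {
        print "$row->{foo} => $row->{bar}\n";
    }
    $dbh->disconnect;

Each file is read once to build the hashes; everything after that happens in memory.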
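
For the Nano option, the switch is an environment variable that has to be set before DBI picks its SQL engine. The clients table and id column below are hypothetical:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Force the lightweight DBI::SQL::Nano engine even if
    # SQL::Statement is installed; must happen before DBI loads it.
    BEGIN { $ENV{DBI_SQL_NANO} = 1 }

    use DBI;

    my $dbh = DBI->connect( 'dbi:CSV:f_dir=/path/to/csvs', undef, undef,
        { RaiseError => 1 } );

    # Nano only understands very simple statements, but a
    # single-field lookup like this is exactly where it shines.
    my $sth = $dbh->prepare('SELECT * FROM clients WHERE id = ?');
    $sth->execute(42);
    while ( my $row = $sth->fetchrow_hashref ) {
        print join( ' ', map { "$_=$row->{$_}" } sort keys %$row ), "\n";
    }
    $dbh->disconnect;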
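
And the last bullet, letting DBI build the hash for your hand-rolled solution (clients and client_id are again made-up names):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect( 'dbi:CSV:f_dir=/path/to/csvs', undef, undef,
        { RaiseError => 1 } );

    # One pass over the file, keyed by a unique column, then
    # constant-time lookups from the resulting hash of hashes.
    my $by_id = $dbh->selectall_hashref( 'SELECT * FROM clients',
        'client_id' );
    print $by_id->{1001}{name}, "\n" if exists $by_id->{1001};

    # Or an array of row hashrefs if there is no unique key.
    my $rows = $dbh->selectall_arrayref( 'SELECT * FROM clients',
        { Slice => {} } );
    print scalar(@$rows), " rows loaded\n";

    $dbh->disconnect;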