in reply to Moving from hashing to tie-ing.

FWIW, I think the following approach will help you while preserving (most of) the existing code base. As I understand it, the main bottleneck is looking up a record by its key, in our case the customer pin, and doing so efficiently.

NOTE: The proposed approach depends on the data being stable, i.e. it won't work if the dataset is modified in the middle of processing, because the stored offsets would no longer point at the right records.

First, add a preprocessing step that runs when the user data arrives and builds a simple index file mapping 'pin' => 'record-offset' with a single linear scan of the file. You can use any available DBM storage for it. Even if your dataset is huge, the resulting index file should be small enough to allow quick processing, since it holds only one pin and one offset per record (further improvements are possible).
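A rough sketch of that preprocessing step, in Python purely for illustration (the same idea carries over to any language with a DBM binding). It assumes one record per line with the pin as the first comma-separated field; the file names, the sample records, and the function name build_index are all made up for the example.

```python
import dbm
import os
import tempfile


def build_index(data_path, index_path):
    """Scan data_path once and store pin -> byte offset in a DBM file.

    Assumes one record per line with the pin as the first
    comma-separated field (an assumption made for this sketch).
    """
    count = 0
    # Binary mode, so tell() reports true byte offsets.
    with open(data_path, "rb") as data, dbm.open(index_path, "n") as index:
        while True:
            offset = data.tell()      # where the next record starts
            line = data.readline()
            if not line:              # end of file
                break
            pin = line.split(b",", 1)[0].strip()
            if pin:
                # If a pin repeats, the last occurrence wins.
                index[pin] = str(offset).encode("ascii")
                count += 1
    return count


# Hypothetical sample data, just to exercise the function.
workdir = tempfile.mkdtemp()
data_path = os.path.join(workdir, "customers.csv")
index_path = os.path.join(workdir, "pin_index")
with open(data_path, "wb") as f:
    f.write(b"1001,Alice,Springfield\n"
            b"1002,Bob,Shelbyville\n"
            b"1003,Carol,Ogdenville\n")

indexed = build_index(data_path, index_path)

# Read the index back to see what was stored.
with dbm.open(index_path, "r") as index:
    offsets = {pin.decode(): int(index[pin]) for pin in index.keys()}
```

The index holds only a short key and a short number per record, which is why it stays small relative to the dataset. It has to be rebuilt whenever the data file changes.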

Second, modify your original scripts to use this index file to look up records in the original dataset: fetch the file offset for the pin from the index, then do a simple seek to that offset and read the record.
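The lookup side could look roughly like this, again in Python only as an illustration. It assumes an index of pin => byte offset already exists in a DBM file and that each record is a single line; the small index built inline here, the sample records, and the name fetch_record are hypothetical and exist only so the sketch runs on its own.

```python
import dbm
import os
import tempfile


def fetch_record(data, index, pin):
    """Return the record line for pin, or None if the pin is not indexed.

    data  -- the dataset, opened in binary mode
    index -- an open DBM mapping of pin -> byte offset
    """
    offset = index.get(pin.encode("ascii"))
    if offset is None:
        return None
    data.seek(int(offset))            # jump straight to the record
    return data.readline().rstrip(b"\r\n").decode("utf-8")


# Hypothetical sample data and a tiny index, built inline for the demo.
workdir = tempfile.mkdtemp()
data_path = os.path.join(workdir, "customers.csv")
index_path = os.path.join(workdir, "pin_index")
records = [b"1001,Alice,Springfield\n",
           b"1002,Bob,Shelbyville\n",
           b"1003,Carol,Ogdenville\n"]

with open(data_path, "wb") as f, dbm.open(index_path, "n") as index:
    for rec in records:
        pin = rec.split(b",", 1)[0]
        index[pin] = str(f.tell()).encode("ascii")  # offset before writing
        f.write(rec)

# The part an existing script would gain: one index hit, one seek.
with open(data_path, "rb") as data, dbm.open(index_path, "r") as index:
    found = fetch_record(data, index, "1002")
    missing = fetch_record(data, index, "9999")
```

Each lookup costs one DBM read plus one seek and one line read, regardless of where the record sits in the file, so the rest of the script can stay as it is.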

Hope this helps.