in reply to Merge two huge datasets by ID
The first approach would be to load the file that maps IDs to Names into a hash, then read each ID from the second file and extract the associated name.
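A minimal sketch of that hash-based approach, assuming the first file holds whitespace-separated "ID Name" pairs and the second holds one ID per line (the same layout the pack example below expects):

use strict;
use warnings;

my ($file1, $file2) = @ARGV;

# Load the ID => Name mapping into a hash
my %name_of;
open my $fh1, "<", $file1 or die $!;
while (<$fh1>) {
    chomp;
    my ($id, $name) = split /\s+/;
    $name_of{$id} = $name;
}
close $fh1;

# Look up each ID from the second file
open my $fh2, "<", $file2 or die $!;
while (<$fh2>) {
    chomp;
    my $name = defined $name_of{$_} ? $name_of{$_} : "NOT FOUND";
    print "$_ => $name\n";
}
close $fh2;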
If that fails (because the "index" file is really huge and the hash doesn't fit in memory), you can build a database from your index file (DBD::SQLite), or, depending on your data, you can try pack and unpack.
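If you go the database route, here is a rough sketch with DBI and DBD::SQLite, under the same assumptions about the file layouts (the database file "index.db" and table name "id_map" are just placeholders):

use strict;
use warnings;
use DBI;

my ($file1, $file2) = @ARGV;

my $dbh = DBI->connect("dbi:SQLite:dbname=index.db", "", "",
                       { RaiseError => 1, AutoCommit => 0 });
$dbh->do("CREATE TABLE IF NOT EXISTS id_map (id TEXT PRIMARY KEY, name TEXT)");

# Bulk-load the index file inside one transaction
my $ins = $dbh->prepare("INSERT OR REPLACE INTO id_map (id, name) VALUES (?, ?)");
open my $fh1, "<", $file1 or die $!;
while (<$fh1>) {
    chomp;
    my ($id, $name) = split /\s+/;
    $ins->execute($id, $name);
}
close $fh1;
$dbh->commit;

# Resolve each ID from the second file
my $sel = $dbh->prepare("SELECT name FROM id_map WHERE id = ?");
open my $fh2, "<", $file2 or die $!;
while (<$fh2>) {
    chomp;
    $sel->execute($_);
    my ($name) = $sel->fetchrow_array;
    print "$_ => ", (defined $name ? $name : "NOT FOUND"), "\n";
}
close $fh2;
$dbh->disconnect;

Doing all the inserts in a single transaction (AutoCommit => 0 plus one commit at the end) makes a large difference for bulk loading into SQLite, and the PRIMARY KEY gives you an index on id for free.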
If your IDs are numbers and not too sparse and your names are of similar length, you can try something like:
use strict;
use warnings;

my ($file1, $file2) = @ARGV;

my $maxL = 40;        # max name length
my $ids  = 10000000;  # last id

my $bin = "\0" x ($maxL * $ids);

# Create the index
open my $fh1, "<", $file1 or die $!;
while (<$fh1>) {
    chomp;
    my ($id, $name) = split /\s+/;
    substr($bin, $id * $maxL, $maxL, pack("A$maxL", $name));
}
close $fh1;

# Search $ids from file 2
open my $fh2, "<", $file2 or die $!;
while (<$fh2>) {
    chomp;
    my $binval  = substr($bin, $_ * $maxL, $maxL);
    my $valback = unpack("A$maxL", $binval);
    print "$_ => $valback\n";
}
close $fh2;
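Note that with the values above the preallocated string occupies $maxL * $ids = 40 * 10,000,000 bytes, roughly 400 MB, regardless of how many IDs actually occur. In exchange, the fixed-width records let the ID act directly as an offset, so each lookup is a single substr with none of the per-key overhead of a hash.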
If this doesn't help, give us a little more information about your input data (type of "IDs" and "Names", number of rows, etc.).
citromatik