Morning everyone! I need some help indexing two large text files, thanks in advance!
What I am trying to do: I have two large text files (file1 is 150M and file2 is 350M), and I want to use the key in file1 to find the associated value in file2. The formats of file1 and file2 look like this (fields are delimited by "*"):
file1(150M):
key1*field2*field3
key2*field2*field3
...
keyn*field2*field3
file2 (350M):
key1*field2
key2*field2
...
keyn*field2
For each line of file1, use the key as the index to find the associated value in file2. If the key is found in file2, update field2 in file1. My current solution is:
    open my $if1, '<', $input_f1 or die "Can't open $input_f1: $!\n";
    open my $if2, '<', $input_f2 or die "Can't open $input_f2: $!\n";

    while (my $line = <$if1>) {        # Read each line of file1
        chomp $line;
        my ($key1, $vf1, $vf2) = split /\*/, $line;
        seek $if2, 0, 0;               # Rewind file2 to the beginning
        $vf1 = ' ';                    # Default when the key is not found
        while (my $line2 = <$if2>) {   # Scan every line of file2
            chomp $line2;
            my ($key2, $value) = split /\*/, $line2;
            if ($key1 eq $key2) {
                $vf1 = $value;
                last;                  # Stop scanning once the key matches
            }
        }
    }
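For comparison, here is a minimal sketch of a merge-join approach that avoids rescanning file2 for every key. It assumes both files have first been sorted on the key field (e.g. with the system sort: sort -t'*' -k1,1 file1 > file1.sorted, and likewise for file2); the filenames and the merge_join helper are hypothetical, not part of the original post. With sorted input, one simultaneous pass over both files replaces the nested loops.

```perl
use strict;
use warnings;

# Hedged sketch: assumes both input streams are already sorted by key.
# Reads file1 and file2 in lockstep; for each file1 line, advances file2
# only as far as needed, so each file is read exactly once.
sub merge_join {
    my ($if1, $if2) = @_;
    my @out;
    my $line2 = <$if2>;
    while (my $line1 = <$if1>) {
        chomp $line1;
        my ($key1, $vf1, $vf2) = split /\*/, $line1;
        my $value = ' ';                    # default when key is missing
        while (defined $line2) {
            chomp $line2;
            my ($key2, $v2) = split /\*/, $line2;
            last if $key2 gt $key1;         # key not present in file2
            if ($key2 eq $key1) { $value = $v2; last; }
            $line2 = <$if2>;                # $key2 lt $key1: advance file2
        }
        push @out, join '*', $key1, $value, $vf2;
    }
    return @out;
}
```

To use it, open filehandles on the two sorted files and print the returned lines. This turns the O(n*m) nested scan into a single O(n+m) pass, at the cost of one external sort per file up front.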
Due to the size of the two files, I can not load either file1 or file2 into a hash; I have to process each file line by line, and this is taking too much time to run.
Any suggestions?
In reply to Indexing two large text files by never_more