The following achieves what you want with just one pass over file1.txt and two passes over file2.txt.
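    #!/usr/bin/env perl

    use strict;
    use warnings;
    use autodie;

    my ($ref_file, $data_file) = qw{pm_1146340_file1.txt pm_1146340_file2.txt};
    my (%ref_left, %ref_right, @output);

    # One pass over file1.txt: record each reference key.
    # A value of undef means "not yet seen in file2.txt".
    open my $ref_fh, '<', $ref_file;
    while (<$ref_fh>) {
        chomp;
        undef $ref_left{$_};
    }
    close $ref_fh;

    # First pass over file2.txt: on the first occurrence of each
    # reference key, remember the string that follows it.
    open my $data_fh, '<', $data_file;
    while (<$data_fh>) {
        my ($left, $right) = split ' ', $_, 2;
        next unless exists $ref_left{$left} and not defined $ref_left{$left};
        ++$ref_left{$left};
        ++$ref_right{$right};
    }

    # Second pass over file2.txt: rewind and keep every line
    # whose string was remembered in the first pass.
    seek $data_fh, 0, 0;
    while (<$data_fh>) {
        my ($left, $right) = split ' ', $_, 2;
        next unless $ref_right{$right};
        push @output, $_;
    }
    close $data_fh;

    print for @output;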
Output:
    123 string 1
    111 string 1
    222 string 1
    333 string 1
    456 string 2
    444 string 2
    555 string 2
    666 string 2
    789 string 3
    777 string 3
    888 string 3
    999 string 3
If the data in file2.txt is always ordered as shown, i.e. references to file1.txt data always appear first, such as
    123 string 1
    111 string 1
and never as
    111 string 1
    123 string 1
you'll only need one pass over file2.txt.
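Here is a minimal sketch of that single-pass variant, assuming the ordering guarantee above really does hold. The helper name filter_one_pass, the in-memory sample lines, and the guard against blank lines are my own additions for illustration; they are not part of the script above.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

# Hypothetical helper: takes an arrayref of reference keys (the file1.txt
# lines, chomped) and an arrayref of data lines (the file2.txt lines),
# and returns the data lines to keep.
# ASSUMPTION: each reference line precedes every other line that shares
# its string; otherwise earlier lines are silently missed.
sub filter_one_pass {
    my ($ref_keys, $data_lines) = @_;
    my (%ref_left, %ref_right, @output);

    undef $ref_left{$_} for @$ref_keys;

    for my $line (@$data_lines) {
        my ($left, $right) = split ' ', $line, 2;
        next unless defined $right;    # skip blank or one-field lines

        # First occurrence of a reference key: remember its string.
        if (exists $ref_left{$left} and not defined $ref_left{$left}) {
            ++$ref_left{$left};
            ++$ref_right{$right};
        }

        # Keep the line if its string has already been remembered.
        push @output, $line if $ref_right{$right};
    }

    return @output;
}

my @kept = filter_one_pass(
    [qw{123 456}],
    [
        "123 string 1\n", "111 string 1\n", "999 string 9\n",
        "456 string 2\n", "444 string 2\n",
        "123 string 4\n", "111 string 4\n",
    ],
);

print @kept;
```

The marking and the filtering happen in the same loop, so there is no need to seek back to the start of the file. The price is that it depends entirely on the ordering assumption.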
To more fully test your code, I'd completely jumble up file2.txt and then add additional records, such as
123 string 4 111 string 4
The output should be the same with no instances of "string 4" appearing at all.
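That test could be sketched roughly as follows. I'm assuming the reference keys in file1.txt are 123, 456 and 789, and I've used the twelve output lines shown above as the sample data; filter_two_pass is a hypothetical in-memory rewrite of the two-pass logic, not the script itself. The shuffle function comes from the core List::Util module.

```perl
#!/usr/bin/env perl

use strict;
use warnings;
use List::Util qw{shuffle};

# Hypothetical in-memory version of the two-pass approach.
sub filter_two_pass {
    my ($ref_keys, $data_lines) = @_;
    my (%ref_left, %ref_right);

    undef $ref_left{$_} for @$ref_keys;

    # Pass 1: remember the string on the first occurrence of each key.
    for my $line (@$data_lines) {
        my ($left, $right) = split ' ', $line, 2;
        next unless defined $right;
        next unless exists $ref_left{$left} and not defined $ref_left{$left};
        ++$ref_left{$left};
        ++$ref_right{$right};
    }

    # Pass 2: keep every line whose string was remembered.
    return grep {
        my (undef, $right) = split ' ', $_, 2;
        defined $right and $ref_right{$right};
    } @$data_lines;
}

# ASSUMPTION: sample data taken from the output listed above.
my @original = map { "$_\n" } (
    '123 string 1', '111 string 1', '222 string 1', '333 string 1',
    '456 string 2', '444 string 2', '555 string 2', '666 string 2',
    '789 string 3', '777 string 3', '888 string 3', '999 string 3',
);

# Jumble the original records, then append the extra "string 4" records.
my @jumbled = (shuffle(@original), "123 string 4\n", "111 string 4\n");

my @kept = filter_two_pass([qw{123 456 789}], \@jumbled);

# Same records as before (in jumbled order), and no "string 4" at all.
my $same = join('', sort @kept) eq join('', sort @original);
print $same ? "PASS\n" : "FAIL\n";
```

The extra records are appended after the shuffle, so "123 string 1" always precedes "123 string 4". If the extras were shuffled in as well, which string gets kept for key 123 would depend on which line happened to come first.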
Update: I took my own advice (re "To more fully test your code, ...") and found a problem. I have fixed this by making changes to the first and second while loops. The original code is in the spoiler below.
— Ken
In reply to Re^3: Getting data from second file, based on first files contents by kcott, in thread UPDATED - Getting data from second file, based on first files contents by james28909