The following achieves what you want with just one pass over file1.txt and two passes over file2.txt.
#!/usr/bin/env perl

use strict;
use warnings;
use autodie;

my ($ref_file, $data_file) = qw{pm_1146340_file1.txt pm_1146340_file2.txt};
my (%ref_left, %ref_right, @output);

open my $ref_fh, '<', $ref_file;
while (<$ref_fh>) {
    chomp;
    undef $ref_left{$_};
}
close $ref_fh;

open my $data_fh, '<', $data_file;
while (<$data_fh>) {
    my ($left, $right) = split ' ', $_, 2;
    next unless exists $ref_left{$left} and not defined $ref_left{$left};
    ++$ref_left{$left};
    ++$ref_right{$right};
}

seek $data_fh, 0, 0;
while (<$data_fh>) {
    my ($left, $right) = split ' ', $_, 2;
    next unless $ref_right{$right};
    push @output, $_;
}
close $data_fh;

print for @output;
Output:
123 string 1
111 string 1
222 string 1
333 string 1
456 string 2
444 string 2
555 string 2
666 string 2
789 string 3
777 string 3
888 string 3
999 string 3
If the data in file2.txt is always ordered as shown, i.e. references to file1.txt data always appear first, such as
123 string 1
111 string 1
and never as
111 string 1
123 string 1
you'll only need one pass over file2.txt.
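As a rough illustration of that one-pass case, here is a minimal sketch. It only works under the ordering assumption just described (the record whose left field is a file1.txt key comes before the other records sharing its string). The sub name `filter_single_pass` and the in-memory sample data are hypothetical, used here so the example runs without any files.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

# Hypothetical helper: one pass over the data lines.
# Assumes each reference record precedes the other records with the same string.
sub filter_single_pass {
    my ($ref_keys, $data_lines) = @_;

    my %unused = map { $_ => 1 } @$ref_keys;   # reference keys not yet seen
    my (%wanted, @output);

    for my $line (@$data_lines) {
        my ($left, $right) = split ' ', $line, 2;
        next unless defined $right;            # skip blank lines

        # Only the first sighting of a reference key marks its string as wanted.
        $wanted{$right} = 1 if delete $unused{$left};

        push @output, $line if $wanted{$right};
    }

    return @output;
}

# Sample data (hypothetical), including the extra "string 4" records.
my @data = map { "$_\n" } (
    '123 string 1', '111 string 1', '222 string 1',
    '456 string 2', '444 string 2',
    '123 string 4', '111 string 4',
);

# Prints only the "string 1" and "string 2" records.
print for filter_single_pass([qw{123 456}], \@data);
```

If the ordering assumption does not hold, records that appear before their reference record are silently dropped, which is why the general solution needs the second pass.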
To more fully test your code, I'd completely jumble up file2.txt and then add additional records, such as
123 string 4
111 string 4
The output should be the same with no instances of "string 4" appearing at all.
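One way to build such a jumbled test file is sketched below, using `shuffle` from the core List::Util module. The sub name `jumbled_test_data` and the sample records are hypothetical; in practice you would read the lines from file2.txt and write the result to a separate test file.

```perl
#!/usr/bin/env perl

use strict;
use warnings;
use List::Util 'shuffle';

# Hypothetical helper: append the extra records to the original data lines
# and return everything in random order.
sub jumbled_test_data {
    my ($data_lines, $extra_lines) = @_;
    return shuffle(@$data_lines, @$extra_lines);
}

# Sample records (hypothetical); the "string 4" lines are the extra records
# that should never show up in the filtered output.
my @data  = map { "$_\n" } ('123 string 1', '111 string 1', '456 string 2', '444 string 2');
my @extra = map { "$_\n" } ('123 string 4', '111 string 4');

print for jumbled_test_data(\@data, \@extra);
```

Because the order is random, compare sorted output between runs rather than relying on line order.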
Update: I took my own advice (re "To more fully test your code, ...") and found a problem. I have fixed this by making changes to the first and second while loops. The original code is in the spoiler below.
while (<$ref_fh>) {
    chomp;
    ++$ref_left{$_};
}
...
while (<$data_fh>) {
    my ($left, $right) = split ' ', $_, 2;
    next unless $ref_left{$left} or $ref_right{$right};
    ++$ref_right{$right};
}
— Ken