Hi, here is one way to do it:
- use a regular expression to extract the meaningful part of the value in the given cells
- use the CSV-parsing library Text::CSV_XS to read in the first file and create a hash containing a key for each value that should not be duplicated
- use the library again to read in the second file
- use the filter option (as recently shown by choroba) to exclude rows with seen values
- use map to reduce the wanted rows to just the values
(Note: I am using Inline::Files here to embed the CSV files in this demo script. In real life, replace "\*FILE1" and "\*FILE2" with the real file names.)
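use v5.014;
use Text::CSV_XS 'csv';

# Captures the ID (e.g. G123456) from "first last(G...)" or "first.last(G...)"
my $re = qr/ \A first (?:\s|\.) last \( (G[0-9]+) \) \Z /x;

# Keys are the IDs found in the fourth column of the first file
my %hash = map { ($_->[3] =~ s/$re/$1/r) => 1 } @{
    csv( in => \*FILE1, headers => 'skip' ),
};

# Keep only rows of the second file whose first-column ID was not seen;
# the s/// in the filter also leaves the bare ID in that field
my @result = map { $_->[0] } @{
    csv( in => \*FILE2, headers => 'skip', filter => {
        1 => sub { $_ =~ s/$re/$1/ and not exists $hash{$_} },
    }),
};

say "found $_ " for @result;

use Inline::Files;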
__FILE1__
"A","B","C","D","E"
"blah","blah","blah","first last(G123456)","blah"
"blah","blah","blah","first last(G123457)","blah"
"blah","blah","blah","first last(G123459)","blah"
__FILE2__
"A","B","C","D"
"first.last(G123456)","blah","blah","blah"
"first.last(G123457)","blah","blah","blah"
"first.last(G123458)","blah","blah","blah"
"first.last(G123459)","blah","blah","blah"
Output:
$ perl 1226797.pl
found G123458
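
If you want to test the regex step on its own before involving the CSV files, a minimal sketch with core Perl only could look like this. The helper name extract_id and the sample strings are my own for illustration (they are not part of the script above); the pattern is the same one.

```perl
use v5.14;
use warnings;

# Same pattern as in the script above
my $re = qr/ \A first (?:\s|\.) last \( (G[0-9]+) \) \Z /x;

# Hypothetical helper: returns the captured ID, or undef if the cell does not match
sub extract_id {
    my ($value) = @_;
    return $value =~ $re ? $1 : undef;
}

# IDs that should not be reported again (stand-in for the first file)
my %seen = map { ( extract_id($_) => 1 ) }
    'first last(G123456)', 'first last(G123457)';

# Cells to check (stand-in for the second file)
for my $cell ( 'first.last(G123456)', 'first.last(G123458)', 'no match here' ) {
    my $id = extract_id($cell);
    next unless defined $id;          # skip cells the pattern does not recognise
    say "found $id" unless exists $seen{$id};
}
```

Cells that do not match the pattern are skipped here rather than reported; whether that is what you want depends on your data.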
Hope this helps!
The way forward always starts with a minimal test.
In reply to Re: 2 files by 1nickt, in thread 2 files by Deicide