CSV Pattern Matching

PerlNewbRP has asked for the wisdom of the Perl Monks concerning the following question:

I have two text files:


File1.txt is:

a, b, c, d, e, f1234567890qwertyuiopasdfghjkldfdf
a, b, c, c, e, fopasdfghjklfdhdkjdffgfksofbfkdndkfbfbfkf
a, b, c, d, e, fdsffdsqwertyuiopasdfghjklfdcfdfd
a, b, c, d, e, fqwertyuiopasdfghjkl90reyuebvcd

File2.txt is:

d,90
k,70
p,450

I would like a Perl script that reads File2.txt (column[0] and column[1]) and finds every line of File1.txt that matches: column[0] of File2.txt against column[3] of File1.txt, and column[1] of File2.txt against column[5] of File1.txt. The filenames/filepaths need to be hardcoded, as I don't want the user to input the files.

I would like the output to be written to another file. For example, the expected result would be:

Output.txt is:

a, b, c, d, e, f1234567890qwertyuiopasdfghjkldfdf
a, b, c, d, e, fqwertyuiopasdfghjkl90reyuebvcd

Thanks in advance

Replies are listed 'Best First'.
Re: CSV Pattern Matching by nemesdani (Friar) on Apr 13, 2012 at 12:59 UTC
It is considered polite to let people know if you already received an answer on StackOverflow: http://stackoverflow.com/questions/10137638/perl-need-a-script-read-two-different-columns-in-two-csv-files-and-output-line. I'm too lazy to be proud of being impatient.
Re: CSV Pattern Matching by Tux (Canon) on Apr 13, 2012 at 08:48 UTC
Looks like you want to use Text::CSV (or Text::CSV_XS). Maybe even in combination with Spreadsheet::Read. Enjoy, Have FUN! H.Merijn
Re^2: CSV Pattern Matching by davido (Cardinal) on Apr 13, 2012 at 15:37 UTC
Text::CSV is the more general solution. If Text::CSV_XS is installed on the user's system, Text::CSV will make use of it automatically. In fact, Text::CSV is just a wrapper around Text::CSV_PP (pure Perl), which comes with the Text::CSV distribution, and Text::CSV_XS, which should be installed separately. Dave
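As a minimal sketch of the Text::CSV suggestion above: the sample line is taken from the question, while the `allow_whitespace` option and the in-memory filehandle are choices made here for illustration, not something the posters specified. Text::CSV is a CPAN module and has to be installed first.

```perl
use strict;
use warnings;
use Text::CSV;    # CPAN module; uses Text::CSV_XS automatically when available

# allow_whitespace strips the blank after each comma ("a, b" gives "a", "b").
my $csv = Text::CSV->new({ binary => 1, allow_whitespace => 1 })
    or die "Cannot use Text::CSV: " . Text::CSV->error_diag;

# One line from File1.txt in the question, read through an in-memory
# filehandle so the sketch runs without any files on disk.
my $sample = "a, b, c, d, e, f1234567890qwertyuiopasdfghjkldfdf\n";
open my $fh, '<', \$sample or die "Cannot open in-memory sample: $!";

my @rows;
while (my $row = $csv->getline($fh)) {    # $row is an array reference of fields
    push @rows, $row;
}
close $fh;

printf "column[3]=%s column[5]=%s\n", $rows[0][3], $rows[0][5];
```

For the real task, the in-memory handle would be replaced by `open my $fh, '<', 'File1.txt'`, and column[3] and column[5] of each row compared against the pairs read from File2.txt.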
Re: CSV Pattern Matching by JavaFan (Canon) on Apr 13, 2012 at 09:59 UTC
Care to explain why the output should not be the following?

a, b, c, d, e, f1234567890qwertyuiopasdfghjkldfdf
a, b, c, d, e, fdsffdsqwertyuiopasdfghjklfdcfdfd
a, b, c, d, e, fqwertyuiopasdfghjkl90reyuebvcd
Re^2: CSV Pattern Matching by Animator (Hermit) on Apr 13, 2012 at 11:00 UTC
Looking at the description: a, b, c, d, e, fdsffdsqwertyuiopasdfghjklfdcfdfd should not match because 'fdsffdsqwertyuiopasdfghjklfdcfdfd' does not contain '90'.
Re: CSV Pattern Matching by traceyfreitas (Sexton) on Apr 13, 2012 at 10:40 UTC
Here you go:

    #!/usr/local/bin/perl
    use strict;
    use warnings;

    # NOTE: This program loads all of file1 and file2 into memory,
    # so if they're huge, this might not work.

    my $infile1 = "file1.txt";
    my $infile2 = "file2.txt";

    #--------------------------------------------------------------------
    my %f2 = ();    # KEY=col1, VAL=col2 (in file2)
    # 'or', not '||': '||' would bind to the filename, so die would never run.
    open my $INFILE2, '<', $infile2 or die "Cannot open \"$infile2\": $!\n";
    LINE: while (my $line = <$INFILE2>) {
        chomp $line;
        next LINE if ($line eq "");    # Skip empties
        my @input = split(/,/, $line);
        $f2{ $input[0] } = $input[1];  # "d" => 90
    } #LINE
    close $INFILE2;

    #--------------------------------------------------------------------
    my %f1 = ();
    open my $INFILE1, '<', $infile1 or die "Cannot open \"$infile1\": $!\n";
    LINE: while (my $line = <$INFILE1>) {
        chomp $line;
        next LINE if ($line eq "");    # Skip empties
        # Col3=$2, Col5=$3
        if ($line =~ m/(\S+, \S+, \S+, (\S+), \S+, (\S+))/) {
            $f1{$2}->{$3} = $1;
        }
        else {
            next LINE;
        }
    } #LINE
    close $INFILE1;

    #--------------------------------------------------------------------
    my $outfilename = "output.txt";
    open my $OUTFILE, '>', $outfilename or die "Cannot open \"$outfilename\": $!\n";
    FILE2COL: foreach my $col1_f2 (keys %f2) {
        next FILE2COL unless (exists $f1{$col1_f2});
        while (my ($col5_f1, $line_f1) = each %{ $f1{$col1_f2} }) {
            my $to_match = $f2{$col1_f2};
            # Test column 5 only, and treat the value as literal text.
            print $OUTFILE $line_f1."\n" if ($col5_f1 =~ m/\Q$to_match\E/);
        }
    }
    close $OUTFILE;
Re^2: CSV Pattern Matching by aaron_baugher (Curate) on Apr 13, 2012 at 12:39 UTC
There's no need to pull the first file into memory. This is a fairly standard "load the filtering file into a hash and check the other file against it line-by-line" problem, with the one extra twist that, if the first field from the filtering file is a match, another field needs to be checked against the second field.

    #!/usr/bin/env perl
    use Modern::Perl;

    my %k;
    open my $fd2, '<', 'file2.txt' or die $!;
    while(<$fd2>){
        chomp;
        if( /([a-z]),(\d+)/ ){    # one lowercase character, a comma, and digits
            $k{$1} = $2;
        }
    }
    close $fd2;

    open my $fd1, '<', 'file1.txt' or die $!;
    while(<$fd1>){
        my @w = split /, /;
        if( $k{$w[3]} and $w[5] =~ /$k{$w[3]}/ ){
            print;
        }
    }
    close $fd1;

Aaron B. My Woefully Neglected Blog, where I occasionally mention Perl.
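The "hash plus line-by-line check" approach described above can also be sketched with core Perl only and with the output going to a filehandle rather than STDOUT. The sub names `load_rules` and `filter_lines` are hypothetical, the sample data is copied from the question, and the in-memory filehandles are an assumption made so the sketch runs without files on disk.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Read "key,value" pairs (the File2.txt layout) into a hash reference.
sub load_rules {
    my ($fh) = @_;
    my %rule;
    while (my $line = <$fh>) {
        $line =~ s/\s+\z//;    # drop the newline and any trailing blanks
        my ($key, $value) = split /\s*,\s*/, $line, 2;
        next unless defined $value && length $value;    # skip empty or partial lines
        $rule{$key} = $value;
    }
    return \%rule;
}

# Copy to $out every line whose column[3] is a rule key and whose
# column[5] contains that rule's value as a literal substring.
sub filter_lines {
    my ($rule, $in, $out) = @_;
    while (my $line = <$in>) {
        my @col = split /\s*,\s*/, $line;
        next unless @col >= 6;                 # ignore short or blank lines
        my $want = $rule->{ $col[3] };
        next unless defined $want;
        print {$out} $line if index($col[5], $want) >= 0;
    }
    return;
}

# Sample data from the question, held in memory (an assumption for this sketch).
my $file2 = "d,90\nk,70\np,450\n";
my $file1 = join '', map { "$_\n" } (
    'a, b, c, d, e, f1234567890qwertyuiopasdfghjkldfdf',
    'a, b, c, c, e, fopasdfghjklfdhdkjdffgfksofbfkdndkfbfbfkf',
    'a, b, c, d, e, fdsffdsqwertyuiopasdfghjklfdcfdfd',
    'a, b, c, d, e, fqwertyuiopasdfghjkl90reyuebvcd',
);

open my $rules_fh, '<', \$file2 or die "Cannot read rules: $!";
my $rule = load_rules($rules_fh);
close $rules_fh;

my $output = '';
open my $in,  '<', \$file1  or die "Cannot read data: $!";
open my $out, '>', \$output or die "Cannot open output buffer: $!";
filter_lines($rule, $in, $out);
close $in;
close $out;

print $output;    # the two lines listed under Output.txt in the question
```

To work on the real files with hardcoded names, the three scalar references would be replaced by 'File2.txt', 'File1.txt' and 'Output.txt' in the `open` calls. Using `index` instead of a regex match keeps the value from File2.txt literal, so a value containing regex metacharacters would not be misread as a pattern.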