match a portion of a string from a file

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks! I got some help a while back but need a little more. I have two CSV files and I'm trying to see if any data matches. I started by matching whole lines and got that to work, then got individual comma-separated fields in a CSV line to match. But in these files the data is never a 1:1 match: I need to match a phrase like "blue" against something like "the blue water". My data sources have hundreds of lines, with several comma-separated entries on each line. I tried messing around with index on the match but couldn't get anything to work. Please find the code below, and thanks in advance for the assistance!

#!/usr/bin/env perl

use strict;
use warnings;
use autodie;

use Text::CSV;

my ($f1, $f2) = qw{1.csv 2.csv};
my %f1_values;
my $csv = Text::CSV::->new;

get_f1_data($f1, $csv, \%f1_values);
parse_f2_data($f2, $csv, \%f1_values);

sub get_f1_data {
    my ($file, $csv_obj, $f1_values) = @_;

    open my $fh, '<', $file;

    while (my $row = $csv_obj->getline($fh) ) {
        # Use the hash reference passed in, not the file-scoped %f1_values.
        push @{$f1_values->{$_}}, $row for @$row;
    }

    return;
}

sub parse_f2_data {
    my ($file, $csv_obj, $f1_values) = @_;

    open my $fh, '<', $file;

    while (my $row = $csv_obj->getline($fh) ) {
        my $matches = 0;

        print 'In line: ';
        $csv_obj->say(\*STDOUT, $row);

        for my $value (@$row) {
            next unless exists $f1_values->{$value};
            ++$matches;
            print "  $value found in:\n";

            for my $line (@{$f1_values->{$value}}) {
                print '    ';
                $csv_obj->say(\*STDOUT, $line);
            }
        }

        print "  No matches found\n" unless $matches;
    }

    return;
}
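One way to get partial matches out of the code above, sketched under the assumption that %f1_values keeps its current layout (cell text => list of rows), is to replace the exact `exists` lookup with a scan over every stored cell, testing each one with `index`. `index` returns -1 when the substring is absent, so `>= 0` means "contains". The helper name `find_partial` and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the stored file-1 cells that contain
# $term as a substring, ignoring case. $f1_values has the same layout
# as %f1_values in the script above: cell text => [ rows ].
sub find_partial {
    my ($term, $f1_values) = @_;
    my $needle = lc $term;
    # index() returns -1 when $needle does not occur in the cell.
    return grep { index(lc $_, $needle) >= 0 } sort keys %$f1_values;
}

# Stand-in for what get_f1_data() builds from 1.csv (made-up data).
my @row = ('the blue water', 'red sky');
my %f1_values = map { $_ => [ \@row ] } @row;

# In parse_f2_data() this loop would run over the cells of each 2.csv row.
for my $value ('blue', 'green') {
    my @hits = find_partial($value, \%f1_values);
    if (@hits) {
        print "  $value found in: $_\n" for @hits;
    }
    else {
        print "  No matches found for $value\n";
    }
}
```

Each hit is still a key of %f1_values, so `@{ $f1_values->{$hit} }` gives the rows it came from and the existing printing loop can stay as it is. The price is a scan of every stored cell per lookup, which should be acceptable for files of a few hundred lines.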

Re: match a portion of a string from a file by GrandFather (Saint) on Feb 11, 2018 at 23:55 UTC
Build a hash of keys keyed by cell number from the match file, then run through the search file a line at a time looking for matches:

#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;

my $csv_keys = <<CSV1;
blue,lagoon,moon
red,banana,sun
CSV1

my $csv_phrases = <<CSV2;
blue lagoon under a summer moon,still water,east of the sun
hundreds of lines,several entries separated by commas,thanks in advance
CSV2

my %keys;
my $csv = Text::CSV->new();

open my $keysIn, '<', \$csv_keys;
while (my $row = $csv->getline($keysIn)) {
    for my $cellNum (1 .. @$row) {
        $keys{$cellNum}{$row->[$cellNum - 1]} = $.;
    }
}
close $keysIn;

open my $phrasesIn, '<', \$csv_phrases;
while (my $row = $csv->getline($phrasesIn)) {
    for my $cellNum (1 .. @$row) {
        for my $key (keys %{$keys{$cellNum}}) {
            next if $row->[$cellNum - 1] !~ /\Q$key\E/;
            print <<MATCH;
Matched '$key' in cell $cellNum from keys line $keys{$cellNum}{$key} to phrases line $.
MATCH
        }
    }
}
close $phrasesIn;

Prints:

Matched 'blue' in cell 1 from keys line 1 to phrases line 1
Matched 'sun' in cell 3 from keys line 2 to phrases line 1
Matched 'red' in cell 1 from keys line 2 to phrases line 2

It's assumed that the entire match string needs to match and that it should only be matched in the same "cell".

Premature optimization is the root of all job security
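The `\Q...\E` in the match above quotes regex metacharacters in the key, so a key such as `C++` is matched literally. Note also that the output matched 'red' inside "hundreds of lines": a plain regex match is a substring test. A small sketch of the difference, with a word-boundary variant that is my own assumption rather than part of GrandFather's code; both sub names are hypothetical.

```perl
use strict;
use warnings;

# Literal substring test, as in the reply above: \Q...\E escapes regex
# metacharacters ('+', '.', '(' ...) in $key.
sub contains_literal {
    my ($text, $key) = @_;
    return $text =~ /\Q$key\E/ ? 1 : 0;
}

# Hypothetical stricter variant: the key must also sit between word
# boundaries. \b only behaves as expected when the key starts and ends
# with word characters (a key like 'C++' will never match this way).
sub contains_word {
    my ($text, $key) = @_;
    return $text =~ /\b\Q$key\E\b/ ? 1 : 0;
}

print contains_literal('hundreds of lines', 'red'), "\n";  # 1
print contains_word('hundreds of lines', 'red'), "\n";     # 0
print contains_word('east of the sun', 'sun'), "\n";       # 1
print contains_literal('learn C++ today', 'C++'), "\n";    # 1
```

Which behaviour is right depends on the data; for phrases like "Reno dermatologist" the word-boundary form avoids accidental hits inside longer words.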
Re: match a portion of a string from a file by choroba (Cardinal) on Feb 11, 2018 at 22:22 UTC
Can you provide samples of the input files? See Short, Self Contained, Correct Example on why we need them to help you.

($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
Re^2: match a portion of a string from a file by Anonymous Monk on Feb 11, 2018 at 22:36 UTC
Sure, here are two example files:

file 1:
`Reno dermatologist,Melanoma treatment in Reno,Basal Cell treatment in Reno,Carcinoma treatment in Reno`

file 2:
`dermatologist, 1945 Bluelake Lane dentist, 3 stars, office space`

Both files above are CSV files.
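In the samples above, the fields in file 2 have a space after each comma, so a naive comparison would use ' 3 stars' rather than '3 stars'. Text::CSV's `allow_whitespace` attribute strips that whitespace while parsing; the sketch below instead hard-codes the two sample lines, splits them naively on commas (acceptable here only because no field is quoted), trims each field, and checks every file-2 term against every file-1 cell with a case-insensitive `index`.

```perl
use strict;
use warnings;

# Sample lines as posted. Naive split on commas; a real script should
# parse the files with Text::CSV instead.
my @f1_cells = split /,/, 'Reno dermatologist,Melanoma treatment in Reno,'
    . 'Basal Cell treatment in Reno,Carcinoma treatment in Reno';
my @f2_cells = split /,/,
    'dermatologist, 1945 Bluelake Lane dentist, 3 stars, office space';

# Strip leading/trailing whitespace so ' 3 stars' becomes '3 stars'.
s/^\s+|\s+$//g for @f1_cells, @f2_cells;

# Collect every (term, cell) pair where the file-2 term occurs inside
# the file-1 cell, ignoring case. Empty terms are skipped because an
# empty string is found in every cell.
my @pairs;
for my $term (grep { length } @f2_cells) {
    for my $cell (@f1_cells) {
        push @pairs, [ $term, $cell ] if index(lc $cell, lc $term) >= 0;
    }
}

print "'$_->[0]' found in '$_->[1]'\n" for @pairs;
```

With these two lines the only pair reported is 'dermatologist' found in 'Reno dermatologist'.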
Re^3: match a portion of a string from a file by choroba (Cardinal) on Feb 11, 2018 at 23:06 UTC
It seems you want to search for terms from file2 in file1. I'm not sure why you are building the hash, but if you just want to search for all the elements of file2 in all the elements of file1, you can build a large regex from the second CSV and use it to search the first one:

#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
use autodie;

use Text::CSV;

sub build_regex {
    my ($filename) = @_;
    my @regexes;
    my $csv = 'Text::CSV'->new;
    open my $fh, '<', $filename;
    while (my $row = $csv->getline($fh)) {
        push @regexes, join '|', map quotemeta, @$row;
    }
    return join '|', @regexes
}

sub find_matches {
    my ($filename, $regex) = @_;
    my $csv = 'Text::CSV'->new;
    open my $fh, '<', $filename;
    while (my $row = $csv->getline($fh)) {
        /$regex/ and say for @$row;
    }
}

my ($f1, $f2) = qw( 1.csv 2.csv );
find_matches($f1, build_regex($f2));
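Two things can trip up the big-alternation approach above with data like the sample files: an empty cell (from a trailing comma, say) adds an empty alternative that matches every cell, and fields written with a space after the comma keep that leading space. A sketch of a variant that trims and drops empty terms, matches case-insensitively, and captures which term matched. `build_term_regex` is a hypothetical name, it takes already-parsed terms rather than a file name, and the `s///r` modifier it uses needs Perl 5.14 or later.

```perl
use strict;
use warnings;

# Hypothetical variant of build_regex(): trims each term (s///r needs
# Perl 5.14+), drops empty ones (an empty alternative would match every
# cell), and compiles one case-insensitive regex with a capture group.
sub build_term_regex {
    my @terms = grep { length } map { s/^\s+|\s+$//gr } @_;
    return unless @terms;
    # Longer terms first, so 'blue water' is preferred over 'blue'
    # when both could match at the same position.
    my $alt = join '|',
        map { quotemeta } sort { length($b) <=> length($a) } @terms;
    return qr/($alt)/i;
}

my $re = build_term_regex('dermatologist', ' 3 stars', '');

for my $cell ('Reno dermatologist', 'Melanoma treatment in Reno') {
    if ($cell =~ $re) {
        print "'$1' found in '$cell'\n";
    }
}
```

Compiling the pattern once with `qr//` also avoids re-interpolating a long string for every cell.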
Re^4: match a portion of a string from a file by AnomalousMonk (Archbishop) on Feb 12, 2018 at 00:49 UTC
Re: match a portion of a string from a file by AnomalousMonk (Archbishop) on Feb 11, 2018 at 22:43 UTC
"I got some help a while back ..."

Can you please provide links to these interactions? See What shortcuts can I use for linking to other information? This will help us gauge your level of proficiency with Perl and perhaps give some further insight into your problem. Of course, since you posted anonymously, you can't update your OP, but perhaps something in a main-thread reply...

Give a man a fish: `<%-{-{-{-<`
Re^2: match a portion of a string from a file by poj (Abbot) on Feb 12, 2018 at 12:53 UTC
The code comes from this post: Re: compare 2 CSV files for string return lines in both if match, but now the requirement seems to be for a 'fuzzier' match.

poj