Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks who are always smarter than me! I am trying to compare two CSV files for names and/or company names that the two have in common, and to return the complete lines from both files when there is a match. I have a script that matches whole lines across the two files and returns them, but I want to match at the string level rather than the line level. Here's my script; any guidance on how to modify it is greatly appreciated!

use strict;
use warnings;
use autodie;

my $f1 = shift || "/opt/test.txt";
my $f2 = shift || "/opt/test1.txt";
my %results;

open my $file1, '<', $f1;
while (my $line = <$file1>) {
    $results{$line} = 1;
}

open my $file2, '<', $f2;
while (my $line = <$file2>) {
    $results{$line}++;
}

foreach my $line (sort { $results{$b} <=> $results{$a} } keys %results) {
    print "$results{$line}:", $line if $results{$line} > 1;
}

Re: compare 2 CSV files for string return lines in both if match
by kcott (Archbishop) on Feb 03, 2018 at 10:23 UTC

    Jamming what should be blocks of code into a single line helps nobody, including yourself: it makes the code hard to read and is error-prone. Posting masses of irrelevant data, with records containing hundreds of characters, is not at all useful. Please read "How do I post a question effectively?" and "Short, Self-Contained, Correct Example".

    Whenever you're dealing with CSV data, you should use Text::CSV. If you also have Text::CSV_XS installed, it will run faster.
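To illustrate why a CSV parser matters for this kind of data, here is a minimal sketch. The sample line is made up for illustration (it is not taken from either of your files), and the sketch assumes Text::CSV is installed. It compares a naive split on commas with Text::CSV's parse and fields methods on a line whose quoted field contains a comma.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

use Text::CSV;

# Hypothetical sample line: the second field holds a comma inside quotes.
my $line = 'Dr. Toy Story,"Clinical Data Manager, Statistical Center",not provided';

# Naive approach: also splits inside the quoted field.
my @naive = split /,/, $line;

# Text::CSV honours the quoting rules.
my $csv = Text::CSV->new({ binary => 1 })
    or die 'Cannot use Text::CSV: ' . Text::CSV->error_diag;
$csv->parse($line)
    or die 'Parse failed: ' . $csv->error_diag;
my @fields = $csv->fields;

printf "split gave %d fields; Text::CSV gave %d fields\n",
    scalar @naive, scalar @fields;
print "  [$_]\n" for @fields;
```

With this sample line, split yields four pieces (the quoted field is cut in two and keeps its quote characters), whereas Text::CSV yields the three intended fields.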

    It's very unclear what you're actually trying to achieve. The following is intended to provide you with some techniques that I think may be useful. You'll need to adapt this to your requirements.

    Given these dummy CSV files:

    $ cat pm_1208339_1.csv
    A,B,C
    D,E,F
    G,H,I
    A,D,G
    $ cat pm_1208339_2.csv
    B,C,D
    X,Y,X
    I,J,K

    This script:

    #!/usr/bin/env perl

    use strict;
    use warnings;
    use autodie;

    use Text::CSV;

    my ($f1, $f2) = qw{pm_1208339_1.csv pm_1208339_2.csv};
    my %f1_values;
    my $csv = Text::CSV::->new;

    get_f1_data($f1, $csv, \%f1_values);
    parse_f2_data($f2, $csv, \%f1_values);

    # Index every field value in file 1 by the rows it appears in.
    sub get_f1_data {
        my ($file, $csv_obj, $f1_values) = @_;

        open my $fh, '<', $file;
        while (my $row = $csv_obj->getline($fh)) {
            # Use the hashref argument, not the file-scoped %f1_values.
            push @{$f1_values->{$_}}, $row for @$row;
        }

        return;
    }

    # For each row of file 2, report the file 1 rows that share a field value.
    sub parse_f2_data {
        my ($file, $csv_obj, $f1_values) = @_;

        open my $fh, '<', $file;
        while (my $row = $csv_obj->getline($fh)) {
            my $matches = 0;
            print 'In line: ';
            $csv_obj->say(\*STDOUT, $row);

            for my $value (@$row) {
                next unless exists $f1_values->{$value};
                ++$matches;
                print " $value found in:\n";
                for my $line (@{$f1_values->{$value}}) {
                    print ' ';
                    $csv_obj->say(\*STDOUT, $line);
                }
            }

            print " No matches found\n" unless $matches;
        }

        return;
    }

    Produces this output:

    In line: B,C,D
     B found in:
     A,B,C
     C found in:
     A,B,C
     D found in:
     D,E,F
     A,D,G
    In line: X,Y,X
     No matches found
    In line: I,J,K
     I found in:
     G,H,I

    That seems to be the type of thing you're after but, as I said, I'm really not sure. Hopefully there's something that you'll find useful. If you have any further questions, please follow the guidelines I linked to at the start.

    — Ken

Re: compare 2 CSV files for string return lines in both if match
by Laurent_R (Canon) on Feb 02, 2018 at 20:19 UTC
    Please can you show what your input lines look like?

      Here are two example CSV files. The hard part is that the matching data could be in any column.

      csv1:

      3.0.425146689842197.html,https://www.yelp.com/c/seattle/oncologist,"https://www.yelp.com/c/seattle/oncologist, Yelp,recommendation,San Francisco, bay area, local,business,review,friend,restaurant,dentist,doctor,salon,spa,shopping,store,share,community,massage,sushi,pizza,nails,New York,Los Angeles",,"The Best 10 Oncologist in Seattle, WA - Last Updated January 2018 - Yelp",,"Best Oncologist in Seattle, WA - Dr. Toy Story, Seattle Integrative Oncology, Cancer Treatment Navigator, Sherry Hu, MD, PhD, Wong Matthew L, MD, Pacific Northwest Integrative Medicine, Michael A Hunter, MD, Rapha Integrative Family Clinic,_&#132;_",,Dr. Hunter is an amazingly empathetic and incredibly,, Hunter is an amazingly empathetic and incredibly,Dr. Chang is knowledgeable &amp; gentle. She's a mother,, Chang is knowledgeable &amp; gentle. She's a mother
      3.0.525511480497915.html,https://www.yelp.com/c/portland/health,"https://www.yelp.com/c/portland/health, Yelp,recommendation,San Francisco, bay area, local,business,review,friend,restaurant,dentist,doctor,salon,spa,shopping,store,share,community,massage,sushi,pizza,nails,New York,Los Angeles",,Health & Medical in Portland - Yelp,,"The Best Health & Medical in Portland on Yelp. Read about places like: Precision Healing, Mudra Massage, Therapydia Portland, Skin by Lovely Portland, Farma, Eyes On Broadway, Laurelwood Dental, Myoptic Optometry + Modern Eyewear...",,Dr. Barreto</span> was a,, Barreto</span> was a,Dr.Phillips for a couple years now and she is simply,,Phillips for a couple years now and she is simply
      3.0.744123631398576.html,https://www.providence.org/doctors/profile.aspx?name=miklos++simon&id=157134,"https://www.providence.org/doctors/profile.aspx?name=miklos++simon&id=157134, HematologyMedical Oncology, Miklos Simon, MD, Portland,OR",,"Miklos Simon, MD | Portland,OR, ",,"Miklos Simon, MD is a specialist in HematologyMedical Oncology who has an office at 5050 Northeast Hoyt Street in Portland, OR and can be reached at 503-239-7767.",,Dr. Simon's practice is focused in the field of,, Simon's practice is focused in the field of,"Dr. Simon was born in Budapest, Hungary.",," Simon was born in Budapest, Hungary."

      csv2:

      Dr. Toy Story,"Clinical Data Manager, Statistical Center - Fred Hutch","Cancer Care Alliance,; Centre for Addiction and Mental Health Translational Addiction Research Laboratory,; SKMTranscription and ProScript Medical ... Obtaining, abstracting, coding and recording complex data into databases, study-specific electronic and paper-based data-capture systems..."
      Leon Smith MD,"Executive Medical Director, Clinical Lead, Research and Development - Seattle Genetics"," medical oncologist, joins Genetics with a vast experience in Medical Affairs, R&D and in the Tech Industry. In his role, Global Medical ... Project DataSphere is a universal platform to responsibility share datasets to revolutionalize cancer research. It is designed to..."
      Dr. Donna Lapmaker,not provided," medical oncologist, joins Genetics with a vast experience in Medical Affairs, R&D and in the Tech Industry. In his role, Global Medical ... Project DataSphere is a universal platform to responsibility share datasets to revolutionalize cancer research. It is designed to..."
        It looks quite messy. How are you supposed to identify the names and company names that you want to compare?
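
Until that question is answered, one possible reading can be sketched under stated assumptions: the names to look for sit in the first column of the second file, and a "match" means the name occurs as a literal, case-sensitive substring of any field in the first file. Both assumptions are guesses about the requirement, and the data lines below are shortened, hypothetical stand-ins for the samples above. The sketch assumes Text::CSV is installed.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

use Text::CSV;

my $csv = Text::CSV->new({ binary => 1 })
    or die 'Cannot use Text::CSV: ' . Text::CSV->error_diag;

# Hypothetical, shortened stand-ins for the two files.
my @file1_lines = (
    '1.html,"Best Oncologist in Seattle, WA - Dr. Toy Story, Sherry Hu, MD",Dr. Hunter is empathetic',
    '2.html,"Miklos Simon, MD | Portland,OR",Dr. Simon was born in Budapest',
);
my @file2_lines = (
    'Dr. Toy Story,"Clinical Data Manager, Statistical Center",not provided',
    'Leon Smith MD,"Executive Medical Director",not provided',
);

# Parse one CSV line into an array reference of fields.
sub parse_line {
    my ($line) = @_;
    $csv->parse($line) or die 'Parse failed: ' . $csv->error_diag;
    return [ $csv->fields ];
}

my @rows1 = map { parse_line($_) } @file1_lines;
my @rows2 = map { parse_line($_) } @file2_lines;

# Collect [name, file 1 row, file 2 row] for every substring hit.
my @matches;
for my $row2 (@rows2) {
    my $name = $row2->[0];    # assumption: the name is in column 1
    next unless defined $name && length $name;

    for my $row1 (@rows1) {
        # index() performs a literal, case-sensitive substring test.
        next unless grep { defined($_) && index($_, $name) >= 0 } @$row1;
        push @matches, [ $name, $row1, $row2 ];
    }
}

for my $match (@matches) {
    my ($name, $row1, $row2) = @$match;
    print "'$name' found:\n";
    for my $pair ([ 'file 1', $row1 ], [ 'file 2', $row2 ]) {
        $csv->combine(@{ $pair->[1] }) or die 'Combine failed';
        print "  $pair->[0]: ", $csv->string, "\n";
    }
}
```

This compares every name against every row, so it scales with the product of the two file sizes. That is fine for small files, but large inputs would need an index or a different matching strategy.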