in reply to compare 2 CSV files for string return lines in both if match

Please can you show what your input lines look like?

Re^2: compare 2 CSV files for string return lines in both if match
by Anonymous Monk on Feb 02, 2018 at 20:33 UTC

    Here are two example CSV files. The hard part is that the matching data could be in any column.

    csv1:

    3.0.425146689842197.html,https://www.yelp.com/c/seattle/oncologist,"https://www.yelp.com/c/seattle/oncologist, Yelp,recommendation,San Francisco, bay area, local,business,review,friend,restaurant,dentist,doctor,salon,spa,shopping,store,share,community,massage,sushi,pizza,nails,New York,Los Angeles",,"The Best 10 Oncologist in Seattle, WA - Last Updated January 2018 - Yelp",,"Best Oncologist in Seattle, WA - Dr. Toy Story, Seattle Integrative Oncology, Cancer Treatment Navigator, Sherry Hu, MD, PhD, Wong Matthew L, MD, Pacific Northwest Integrative Medicine, Michael A Hunter, MD, Rapha Integrative Family Clinic,_&#132;_",,Dr. Hunter is an amazingly empathetic and incredibly,, Hunter is an amazingly empathetic and incredibly,Dr. Chang is knowledgeable &amp; gentle. She's a mother,, Chang is knowledgeable &amp; gentle. She's a mother
    3.0.525511480497915.html,https://www.yelp.com/c/portland/health,"https://www.yelp.com/c/portland/health, Yelp,recommendation,San Francisco, bay area, local,business,review,friend,restaurant,dentist,doctor,salon,spa,shopping,store,share,community,massage,sushi,pizza,nails,New York,Los Angeles",,Health & Medical in Portland - Yelp,,"The Best Health & Medical in Portland on Yelp. Read about places like: Precision Healing, Mudra Massage, Therapydia Portland, Skin by Lovely Portland, Farma, Eyes On Broadway, Laurelwood Dental, Myoptic Optometry + Modern Eyewear...",,Dr. Barreto</span> was a,, Barreto</span> was a,Dr.Phillips for a couple years now and she is simply,,Phillips for a couple years now and she is simply
    3.0.744123631398576.html,https://www.providence.org/doctors/profile.aspx?name=miklos++simon&id=157134,"https://www.providence.org/doctors/profile.aspx?name=miklos++simon&id=157134, HematologyMedical Oncology, Miklos Simon, MD, Portland,OR",,"Miklos Simon, MD | Portland,OR, ",,"Miklos Simon, MD is a specialist in HematologyMedical Oncology who has an office at 5050 Northeast Hoyt Street in Portland, OR and can be reached at 503-239-7767.",,Dr. Simon's practice is focused in the field of,, Simon's practice is focused in the field of,"Dr. Simon was born in Budapest, Hungary.",," Simon was born in Budapest, Hungary."

    csv2:

    Dr. Toy Story,"Clinical Data Manager, Statistical Center - Fred Hutch","Cancer Care Alliance,; Centre for Addiction and Mental Health Translational Addiction Research Laboratory,; SKMTranscription and ProScript Medical ... Obtaining, abstracting, coding and recording complex data into databases, study-specific electronic and paper-based data-capture systems..."
    Leon Smith MD,"Executive Medical Director, Clinical Lead, Research and Development - Seattle Genetics"," medical oncologist, joins Genetics with a vast experience in Medical Affairs, R&D and in the Tech Industry. In his role, Global Medical ... Project DataSphere is a universal platform to responsibility share datasets to revolutionalize cancer research. It is designed to..."
    Dr. Donna Lapmaker,not provided," medical oncologist, joins Genetics with a vast experience in Medical Affairs, R&D and in the Tech Industry. In his role, Global Medical ... Project DataSphere is a universal platform to responsibility share datasets to revolutionalize cancer research. It is designed to..."
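Because the match can sit in any column, and several csv1 fields are quoted lists that contain commas of their own, it helps to see what "each string between commas" actually yields for these records. The following is a minimal Python sketch (the thread names no language, so Python is an assumption) run on shortened excerpts of the sample records above, with some columns omitted; the helper name `pieces` is hypothetical. It parses a line with the standard `csv` module, which strips the quotes, then splits every field again on its inner commas.

```python
import csv

def pieces(line):
    """Return every non-empty comma-separated piece of one CSV line, from any column."""
    result = []
    for row in csv.reader([line]):          # csv honours the quoted fields and drops the quotes
        for field in row:
            for part in field.split(","):   # split again on commas inside a quoted field
                part = part.strip()
                if part:                    # skip empty columns such as ",,"
                    result.append(part)
    return result

# Shortened excerpts of the first record in each sample file.
line1 = ('3.0.425146689842197.html,"Best Oncologist in Seattle, WA - Dr. Toy Story, '
         'Seattle Integrative Oncology",,Dr. Hunter is an amazingly empathetic and incredibly')
line2 = 'Dr. Toy Story,"Clinical Data Manager, Statistical Center - Fred Hutch"'

print(pieces(line1))
# ['3.0.425146689842197.html', 'Best Oncologist in Seattle', 'WA - Dr. Toy Story',
#  'Seattle Integrative Oncology', 'Dr. Hunter is an amazingly empathetic and incredibly']
print(pieces(line2))
# ['Dr. Toy Story', 'Clinical Data Manager', 'Statistical Center - Fred Hutch']
```

With this data the csv1 piece is `WA - Dr. Toy Story`, not `Dr. Toy Story`, so comparing pieces for exact equality would still miss the match described in the thread; a substring test is one way around that.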
      It looks quite messy. How are you supposed to identify the names and company names that you want to compare?

        In this example I would hope to match "Dr. Toy Story" in csv1 to "Dr. Toy Story" in csv2. This example doesn't contain an institute match. I need to compare each string between commas in the first file with each string between commas in the second file, since that is CSV too. When there is a match, I'd like to return the line containing the match from each file and write them to a new file.
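A minimal Python sketch of the task as described, under stated assumptions: each record fits on one line, both files are UTF-8, and a csv2 piece counts as a match when it occurs inside any piece of a csv1 line. The substring test is a swapped-in choice, made because exact equality misses `WA - Dr. Toy Story`. The function names, the `min_len` cut-off and the file names in the usage line are all hypothetical. Each matching pair is written to the output file, csv1 line first.

```python
import csv

def pieces(line):
    """Return all non-empty comma-separated pieces of one CSV line, from any column."""
    out = []
    for row in csv.reader([line]):          # csv handles the quoted fields
        for field in row:
            for part in field.split(","):   # then split on commas inside fields too
                part = part.strip()
                if part:
                    out.append(part)
    return out

def find_matches(lines1, lines2, min_len=5):
    """Return (line1, line2) pairs where some csv2 piece occurs inside a csv1 piece.

    min_len is a hypothetical cut-off that ignores very short csv2 pieces such as "MD".
    """
    keyed2 = [(l2, [p for p in pieces(l2) if len(p) >= min_len]) for l2 in lines2]
    matches = []
    for l1 in lines1:
        parts1 = pieces(l1)
        for l2, keys in keyed2:
            # substring test: "Dr. Toy Story" is found in "WA - Dr. Toy Story"
            if any(k in p for k in keys for p in parts1):
                matches.append((l1, l2))
    return matches

def match_files(path1, path2, out_path, min_len=5):
    """Read both files (one record per line assumed) and write each matching pair."""
    def read_lines(path):
        with open(path, newline="", encoding="utf-8") as f:
            return [line.rstrip("\r\n") for line in f if line.strip()]

    pairs = find_matches(read_lines(path1), read_lines(path2), min_len)
    with open(out_path, "w", encoding="utf-8") as out:
        for l1, l2 in pairs:
            out.write(l1 + "\n")
            out.write(l2 + "\n")
    return len(pairs)
```

Usage would be something like `match_files("csv1.csv", "csv2.csv", "matches.txt")`, which returns the number of pairs written. This compares every csv1 line with every csv2 line, which is simple but slow for large files. Substring matching is also loose: short or generic pieces can produce false hits, which is what `min_len` tries to limit.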