G'day Ppeoc,

Whenever you're working with CSV data (or similar, e.g. tab-separated), reach for Text::CSV. (If you also have Text::CSV_XS installed, Text::CSV will run faster; note, though, that the XS module isn't a requirement for Text::CSV.)

It looks like you're having to parse some fairly <insert expletive> data. Obviously, Name, Age, etc. would be better as headers (and not repeated with every value). I'll assume you've inherited this; however, if you created it this way, consider reformatting.

The first thing I did was to create a regex from @terms. You can see how I did this programmatically (in the code below). That will work for your 20-odd parameters but, for the three you provided, will look like this:

(Name|Age|Gender) \s+ = \s+ ( \S+ )

When matched, the parameter will be in $1 and the value in $2.
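To see the two capture groups in isolation, here's a minimal sketch that applies the same regex to a single field, without Text::CSV. The $field value is a made-up example of what one array element might look like after the CSV split; it isn't taken from your real data.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my @terms = qw{Name Age Gender};

# Same construction as in the post: an alternation of the wanted terms,
# then '=', then a run of non-whitespace characters as the value.
my $re = '(' . join('|', @terms) . ') \s+ = \s+ ( \S+ )';

# Hypothetical single field (assumption: one element of a parsed row).
my $field = ' Name = Lydia';

my @pair;
if ($field =~ /$re/x) {
    # $1 holds the parameter, $2 holds the value.
    @pair = ($1, $2);
    print "$pair[0]: $pair[1]\n";
}
```

The /x modifier is what lets the pattern contain spaces for readability; without it, those spaces would have to match literally.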

Then, using Text::CSV to parse the data, it was an easy task to check each array element for matches and print the results.

Here's my test:

#!/usr/bin/env perl -l

use strict;
use warnings;

use Text::CSV;

my @terms = qw{Name Age Gender};
my $re = '(' . join('|', @terms) . ') \s+ = \s+ ( \S+ )';

my $csv = Text::CSV->new();

while (my $row = $csv->getline(\*DATA)) {
    print "*** $row->[0] ***";
    for (@$row) {
        print "$1: $2" while /$re/gx;
    }
}

__DATA__
Person1, Name = Lydia, Age = 20, Gender = F
Person2, Name = Carol, Age = 54, Profession = Student, Gender = F, Height = 4'8
Person3, Name = Andy, Age = 37, Location = USA, Gender = M, Weight = 117
Person4, Name = Nick, Age = 28, Gender = M

Output:

*** Person1 ***
Name: Lydia
Age: 20
Gender: F
*** Person2 ***
Name: Carol
Age: 54
Gender: F
*** Person3 ***
Name: Andy
Age: 37
Gender: M
*** Person4 ***
Name: Nick
Age: 28
Gender: M

Note that this works with the sample data you posted. If it isn't representative of the real data, you may need to make changes (probably just to the regex) to what I have here.
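As one example of the kind of regex change I mean: \S+ stops at the first whitespace, so a value such as "New York" would be captured as just "New". Here's a rough sketch of one alternative, assuming each value runs to the end of its CSV field. The @fields list is invented for illustration and the anchored pattern is only one possible approach, so adjust it to whatever your real data looks like.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my @terms = qw{Name Location};

# quotemeta guards against terms that contain regex metacharacters.
my $alt = join '|', map { quotemeta } @terms;

# Anchor to the whole field and take everything after '=' as the value,
# trimming surrounding whitespace.
my $re = qr/^ \s* ($alt) \s* = \s* (.*?) \s* $/x;

# Hypothetical fields (assumption: one row as already split by Text::CSV).
my @fields = ('Person5', ' Name = Mary Ann', ' Location = New York ', ' Age = 41');

my %found;
for my $field (@fields) {
    next unless $field =~ $re;
    $found{$1} = $2;
    print "$1: $2\n";
}
```

Fields whose parameter isn't in @terms (Age, in this sketch) are simply skipped, as in the original code.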

— Ken


In reply to Re: Comparing an array with a regex array of strings? by kcott
in thread Comparing an array with a regex array of strings? by Ppeoc
