jjohhn has asked for the wisdom of the Perl Monks concerning the following question:

A record has 13 bar-delimited fields. I want to collect a list of the unique values in field 6 when field 10 has a certain value, i.e.
SELECT Field6 WHERE Field10 ='RIGHTRECORD'
Test data:
16 records total, I should get 2 uniques, out of 8 records that have the right value in Field10.
my %cons;
my $text;
my $count;
while(<>){
    $text = m/\|(.*?\|){5}(.*?)\|(.*?\|){3}(SNOMEDCT)/;
    if ($text){
        print "$_\n";
    }
    $cons{$2}++;
}
my $count = keys %cons;
print "$count";

My result is one line, "1".

Re: unique fields in bar delimited record
by friedo (Prior) on May 16, 2005 at 12:23 UTC
    If you want a list of captured items returned by a regex, you need to put it in list context. But it's probably far easier to use split.

    my %cons;
    while(<>) {
        my @vals = split /\|/;
        if( $vals[9] eq 'RIGHTRECORD' ) {
            $cons{$vals[5]}++;
        }
    }
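    To illustrate the point about context, here is a minimal sketch. The sample string and variable names are made up for the demonstration and are not from the original data.

```perl
use strict;
use warnings;

# Hypothetical sample string, not from the original data.
my $line = 'a|b|c';

# Scalar context: the match only reports success (1) or failure.
my $scalar = $line =~ /\|(.*?)\|/;

# List context: the match returns the captured substrings ($1, $2, ...).
my ($list) = $line =~ /\|(.*?)\|/;

print "scalar context: $scalar\n";   # prints "scalar context: 1"
print "list context:   $list\n";     # prints "list context:   b"
```

    The parentheses around `$list` on the left-hand side are what force list context.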
      Perfect. Exactly what I needed to know and understand.

        friedo and jhourcle are right about this being a job for split, but the problem of capturing matches from a regex is common enough that I think a brief elaboration on friedo's answer would be worthwhile.

        Even if you had written

        my ( $text ) = m/\|(.*?\|){5}(.*?)\|(.*?\|){3}(SNOMEDCT)/;
        thereby evaluating the RHS in list context, you would still have ended up with the wrong text in $text, namely whatever was matched by the last repetition of the first (.*?\|) group. So if you didn't change the regex, you'd have to capture all the submatches in an array and figure out which slot in the array holds the submatch of interest.
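        As a rough sketch of what "capture everything and find the right slot" looks like, the following runs the original regex against a made-up record. The record is hypothetical and has a leading bar so that the regex as written lines up with it.

```perl
use strict;
use warnings;

# Hypothetical record with a leading bar, invented for this demonstration.
my $record = '|a|b|c|d|e|WANTED|g|h|i|SNOMEDCT|k|l';

# List context: every capturing group contributes one slot.
# A repeated group such as (.*?\|){5} keeps only its last repetition.
my @captures = $record =~ m/\|(.*?\|){5}(.*?)\|(.*?\|){3}(SNOMEDCT)/;

printf "slot %d: %s\n", $_, $captures[$_] for 0 .. $#captures;
# slot 0: e|
# slot 1: WANTED
# slot 2: i|
# slot 3: SNOMEDCT
```

        The value of interest sits in slot 1, so `my ( $text ) = ...` would have picked up slot 0 (`e|`) instead.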

        I think it is simpler to use ?: to disable capture in all but those parens for which you actually want to capture something (this is a good programming habit too, because it minimizes the effect that an edit to the regexp will have on the indices of the captured matches).

        For example, in your regex only the second set of parens captures something of interest, so capturing can be disabled in all the others. In fact, the last set of parens (around SNOMEDCT) is required neither for grouping nor for capturing, so you can eliminate it altogether:

        my ( $text ) = m/\|(?:.*?\|){5}(.*?)\|(?:.*?\|){3}SNOMEDCT/;
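        Here is a small sketch of how that regex might be used to count the unique values. The records are invented for the demonstration (again with a leading bar to suit the regex), and the variable names are my own.

```perl
use strict;
use warnings;

# Hypothetical records; only the first three contain SNOMEDCT.
my @records = (
    '|a|b|c|d|e|X|g|h|i|SNOMEDCT|k|l',
    '|a|b|c|d|e|Y|g|h|i|SNOMEDCT|k|l',
    '|a|b|c|d|e|X|g|h|i|SNOMEDCT|k|l',
    '|a|b|c|d|e|Z|g|h|i|OTHER|k|l',
);

my %cons;
for my $record (@records) {
    # Only one capturing group remains, so it is the only value returned.
    my ($field6) = $record =~ m/\|(?:.*?\|){5}(.*?)\|(?:.*?\|){3}SNOMEDCT/;

    # A failed match returns an empty list, leaving $field6 undefined.
    next unless defined $field6;
    $cons{$field6}++;
}

my $unique = keys %cons;   # hash in scalar context gives the key count
print "$unique unique value(s): ", join(', ', sort keys %cons), "\n";
# prints "2 unique value(s): X, Y"
```

        Note that the hash is only updated when the match succeeds; counting outside that check would add a bogus entry for non-matching records.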

        the lowliest monk

Re: unique fields in bar delimited record
by jhourcle (Prior) on May 16, 2005 at 12:25 UTC
    my %items;
    while (my $line = <>) {
        my @fields = split (/\|/, $line);
        next unless ($fields[9] eq 'RIGHTRECORD');
        $items{$fields[5]}++;
    }
    printf "%3i : %s\n", $items{$_}, $_ foreach keys %items;
    printf "\nTotal Unique: %i\n", scalar keys %items;
Re: unique fields in bar delimited record
by Fletch (Bishop) on May 16, 2005 at 14:10 UTC

    Not that the previous solutions aren't the perliest way to do it, but you could use DBD::AnyData and actually use your SQL (with a little tweaking).