kayj has asked for the wisdom of the Perl Monks concerning the following question:

I have patient data (a tab-delimited file with a header) that I read into an array. What I want to do is extract from the array the patients with specific characteristics, for example a female with body mass index > 40 and blood pressure > 135. Instead of using an if statement to do that, is there another way to extract this information into an array? Also, to do this I have to know the location of these variables in my file (the column numbers for sex, body mass index and blood pressure). Is there a way to use the header of the file to point to the variable of interest instead of counting the columns?

Your help is greatly appreciated

Replies are listed 'Best First'.
Re: how to extract data from an array using a condition
by wfsp (Abbot) on Jun 18, 2011 at 16:47 UTC
    It may be worth having a look at Text::CSV (which can work with tabs too). The column_names method shows how you might get the names from the header (i.e. assuming it is the first line). Subsequent getline_hr calls will then return a hash ref keyed on column names.

    It would be fairly straightforward to identify which records you need.

    If you need help with that, post a shortish/simplified example of your data.

    Good luck!

Re: how to extract data from an array using a condition
by CountZero (Bishop) on Jun 18, 2011 at 16:54 UTC
    Or use DBD::CSV, then you can use standard SQL to extract the data from your file.

    Update: Fixed a typo. Thanks davido

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

Re: how to extract data from an array using a condition
by kcott (Archbishop) on Jun 18, 2011 at 16:38 UTC

    The grep() function extracts data from an array using a condition.
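    A minimal sketch of grep on a plain array, using invented readings rather than the poster's data:

```perl
use strict;
use warnings;

# Hypothetical blood pressure readings, invented for illustration only.
my @bp = (120, 136, 200, 110, 140);

# grep runs the block once per element, with the element in $_,
# and returns the elements for which the block evaluated to true.
my @high = grep { $_ > 135 } @bp;

print "@high\n";    # prints: 136 200 140
```

    The original array is left untouched; grep builds a new list.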

    -- Ken

Re: how to extract data from an array using a condition
by xyzzy (Pilgrim) on Jun 18, 2011 at 16:49 UTC

    if you want to use the names of the fields instead of the position, why not use an array of hash-references, one for each row? simply modify your code to split the header row into an array of keys, then read each row of data and map the values to the keys. if you do everything correctly, you would be able to extract what you want using something like

    my @data = grep {$_->{bmi} > 40 && $_->{bpi} > 135 && $_->{sex} eq 'F'} @patients;

    $,=qq.\n.;print q.\/\/____\/.,q./\ \ / / \\.,q.    /_/__.,q..
    Happy, sober, smart: pick two.
Re: how to extract data from an array using a condition
by bart (Canon) on Jun 18, 2011 at 21:23 UTC
    Because this is not a CSV file but plain tab delimited, and thus has no quotes, I'm not getting out the big guns. I'll solve the problem with plain Perl, just to show how simple it can be.

    Step 1: pull out the header.

    $_ = <IN>; chomp; my @column = split /\t/;

    Step 2: read each line and convert it to a hash with the column names as keys:

    my @data;
    while(<IN>) {
        chomp;
        my %row;
        @row{@column} = split /\t/;
        push @data, \%row;
    }

    That's it: the whole file is read into @data as an array of hashes. I think you probably need more code when using Text::CSV_XS.
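    Put together, the two steps might look like the sketch below. It reads from an in-memory string instead of the IN file handle so that it runs on its own; the sample rows and column names are made up.

```perl
use strict;
use warnings;

# Made-up sample data; the column names are assumptions for illustration.
my $sample = join "\n",
    join("\t", 'name', 'sex', 'body mass index', 'blood pressure'),
    join("\t", 'Jane', 'F', 50, 200),
    join("\t", 'Bob',  'M', 25, 120),
    join("\t", 'Ann',  'F', 42, 130),
    '';

# Open a read handle on the string, standing in for the real file.
open my $in, '<', \$sample or die "Cannot open in-memory file: $!";

# Step 1: the header line supplies the column names.
my $header = <$in>;
chomp $header;
my @column = split /\t/, $header;

# Step 2: every remaining line becomes a hash keyed on those names.
my @data;
while (my $line = <$in>) {
    chomp $line;
    my %row;
    @row{@column} = split /\t/, $line;    # hash slice assignment
    push @data, \%row;                    # store a reference to this row
}
close $in;

print scalar(@data), " rows; first name: $data[0]{name}\n";
```

    Because %row is declared inside the loop, each pass pushes a reference to a fresh hash.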

    As for your final request, the filtering: you can either filter from @data using grep, or test before pushing the current row onto @data. Which is better depends on whether you want to use the same source data for something else as well, and on whether the data is huge (a pretty meaningless concern nowadays, as several MB of data is now considered "small").

    Assuming the condition can be written as:

    $row{'sex'} eq 'F' and $row{'body mass index'} > 40 and $row{'blood pressure'} > 135
    you can do:
    push @data, \%row if $row{'sex'} eq 'F' and $row{'body mass index'} > 40 and $row{'blood pressure'} > 135;
    or
    @filtered = grep { $_->{'sex'} eq 'F' and $_->{'body mass index'} > 40 and $_->{'blood pressure'} > 135 } @data;
    Note that for the latter a row is a hash ref in $_, while in the former, it's a plain hash in %row.
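    A small sketch showing that both routes select the same rows. The rows and names are invented, and @filtered_early and @filtered_late are names chosen here for illustration:

```perl
use strict;
use warnings;

# Invented rows, already split into lines; columns as in the condition above.
my @column = ('name', 'sex', 'body mass index', 'blood pressure');
my @lines  = (
    "Jane\tF\t50\t200",
    "Bob\tM\t45\t150",
    "Ann\tF\t42\t130",
    "Eve\tF\t41\t136",
);

my (@data, @filtered_early);
for my $line (@lines) {
    my %row;
    @row{@column} = split /\t/, $line;
    push @data, \%row;

    # Route 1: test the plain hash %row while reading.
    push @filtered_early, \%row
        if $row{'sex'} eq 'F'
        and $row{'body mass index'} > 40
        and $row{'blood pressure'} > 135;
}

# Route 2: filter the finished array; each row is a hash ref in $_.
my @filtered_late = grep {
    $_->{'sex'} eq 'F'
        and $_->{'body mass index'} > 40
        and $_->{'blood pressure'} > 135
} @data;

print join(', ', map { $_->{name} } @filtered_late), "\n";    # prints: Jane, Eve
```

    Filtering while reading avoids holding unwanted rows in memory; filtering afterwards keeps @data complete for other queries.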

    Perl is one of the very few languages that makes a distinction between the two in syntax, and although it has its advantages (flattening lists is very easy in Perl), the different syntax in both cases is rather annoying, IMHO.

      Thanks for your reply, it was very helpful. I am not very familiar with arrays of hashes. How do you access elements from @data using the header names? I tried several ways but with no success. Thank you all for your replies.

        You can get to grips with the basics at: but I'll quickly describe the concepts here.

        An array of hashes is a plain, one-dimensional array, where the items are references to hashes. Now in Perl, in contrast with other languages like PHP and Javascript, a hash ref is not the same as a hash. A hash is a data structure; a hash ref is a reference, a scalar, a single value, which points to a hash. As a result, there are rather subtle differences in syntax. If %hash is a hash, then after $ref = \%hash; $ref is a reference to that hash, "hash ref" for short. I'll stress that it's the same hash, and not a copy. That means if you change a value through one, you'll see the same change through the other too. They're just different ways to access the same data.
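        A quick sketch of the "same hash, not a copy" point, with made-up keys and values. The %copy hash is added here only for contrast:

```perl
use strict;
use warnings;

# Made-up values for illustration.
my %hash = (sex => 'F', bmi => 41);
my $ref  = \%hash;              # $ref points at %hash itself

$ref->{bmi} = 45;               # change made through the reference ...
print "$hash{bmi}\n";           # ... is visible in the hash: prints 45

my %copy = %hash;               # assigning the hash makes a separate copy
$copy{bmi} = 30;
print "$hash{bmi}\n";           # the original is unaffected: still prints 45
```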

        The basic syntax is:
                      hash                  hash ref
        reference     \%hash                $ref
        hash          %hash                 %$ref
        element       $hash{'key'}          $ref->{'key'} or ${$ref}{$key} or $$ref{'key'}
        hash slice    @hash{'one','two'}    @{$ref}{'one','two'}

        So you need an arrow to access an item through a hash ref. That arrow is optional only between index levels (subscripts in either square brackets or curly braces): $deep[0]->{'key'} is the same as $deep[0]{'key'}.

        The block around the reference for dereferencing (which is what we call accessing content in the data structure the reference points to) is not always necessary, but when you have a precedence problem, it's advisable to use one. (Thus: curly braces, not parentheses!)
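        The equivalences above can be checked with a short sketch. The @deep structure and its contents are invented for illustration:

```perl
use strict;
use warnings;

# Invented one-row array of hashes.
my @deep = ( { key => 'value', other => 'thing' } );

# Between two subscripts the arrow may be left out.
my $with_arrow    = $deep[0]->{'key'};
my $without_arrow = $deep[0]{'key'};

# With a scalar holding the reference, these three forms are equivalent.
my $ref   = $deep[0];
my @forms = ( $ref->{'key'}, ${$ref}{'key'}, $$ref{'key'} );

# Here the block is needed: without the curly braces the sigil would
# bind to a scalar $deep rather than to the element $deep[0].
my @keys = sort keys %{ $deep[0] };

print "$with_arrow $without_arrow @forms @keys\n";
```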

        You can now choose to access, for example, the 'sex' of a single data row directly, as

        $data[0]{'sex'}
        or, via an explicit reference in a loop:
        foreach my $row (@data) { print $row->{'sex'}; }

        Oh, I forgot. Note that grep (and map) is actually a loop in a different syntax, where in the (loop) block you can access each item in turn via $_ (instead of $row). grep is a good way to filter a list: if the last expression evaluated in the block is true, then the current value of $_ is pushed onto the result list that it returns. map is similar, except that it pushes whatever the last expression evaluates to (as a list), regardless of whether that is true or false.
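        To make the difference concrete, here is a small sketch with invented rows: grep keeps or drops each hash ref, map turns each one into something else.

```perl
use strict;
use warnings;

# Invented rows for illustration.
my @data = (
    { name => 'Jane', sex => 'F' },
    { name => 'Bob',  sex => 'M' },
    { name => 'Ann',  sex => 'F' },
);

# grep: the block is a test; the result holds the matching hash refs.
my @women = grep { $_->{sex} eq 'F' } @data;

# map: the block is a transformation; the result holds its values.
my @names = map { $_->{name} } @data;

# The two chain naturally: filter first, then extract one field.
my @women_names = map { $_->{name} } grep { $_->{sex} eq 'F' } @data;

print scalar(@women), " women: @women_names (of @names)\n";
```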

Re: how to extract data from an array using a condition
by Marshall (Canon) on Jun 19, 2011 at 01:06 UTC
    The DBI code is also fairly simple. You have to set the field separator to be a tab instead of the default comma. The easiest way is to treat each file as a separate table; below, the file "patients" is the table with the data. There is a fair amount to learn about SQL and the DBI if these are completely new topics. This approach is the most extensible, but it is also the most work.
    #!/usr/bin/perl -w
    use strict;
    use DBI;

    # csv_sep_char=\t means tab separated, default is a comma of course
    my $dbh = DBI->connect("DBI:CSV:csv_sep_char=\t;RaiseError=1")
        or die "Cannot connect: " . $DBI::errstr;

    my $sth = $dbh->prepare("SELECT * FROM patients WHERE bmi>24 AND bp>135 AND age<30");
    $sth->execute();
    while (my $row = $sth->fetchrow_arrayref())
    {
        print "@$row\n";
    }
    __END__
    Prints:
    Jane 50 23 200
    Joe 30 22 140

    File:patients contains tab separation in real file,
    my editor converts them to spaces here:
    name bmi age bp
    Jane 50 23 200
    Bob 25 55 120
    Norm 28 30 136
    Joe 30 22 140
    Ben 24 85 110
    Now of course, if this is a "one off" thing, Excel is capable of importing this tab delimited file and the query tool would get you a result set too.