in reply to how to extract data from an array using a condition

Because this is not a CSV file but plain tab-delimited text, and thus has no quotes, I'm not getting out the big guns: I'll solve the problem with plain Perl, just to show how simple it can be.

Step 1: pull out the header.

$_ = <IN>; chomp; my @column = split /\t/;

Step 2: read each line and convert it to a hash with the column names as keys:

my @data;
while (<IN>) {
    chomp;
    my %row;
    @row{@column} = split /\t/;
    push @data, \%row;
}

That's it: the whole file is read into @data as an array of hashes. I think you'd probably need more code when using Text::CSV_XS.
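For anyone who wants to try the two steps without a data file at hand, here is a minimal, self-contained sketch. The sample rows, the in-memory filehandle and the lexical handle $in (instead of IN) are my own assumptions for illustration. So is the -1 limit on split, which I added so that empty trailing columns are not silently dropped.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the real input file.
my $text = join '',
    "name\tsex\tbody mass index\tblood pressure\n",
    "Ann\tF\t42\t140\n",
    "Bob\tM\t31\t120\n";

# Open an in-memory filehandle so the sketch runs without a file on disk.
open my $in, '<', \$text or die "Cannot open in-memory file: $!";

# Step 1: the header line gives the column names.
my $header = <$in>;
chomp $header;
my @column = split /\t/, $header;

# Step 2: each remaining line becomes a hash keyed by column name.
my @data;
while (my $line = <$in>) {
    chomp $line;
    my %row;
    @row{@column} = split /\t/, $line, -1;   # -1 keeps trailing empty fields
    push @data, \%row;
}
close $in;

printf "%d rows, first name: %s\n", scalar @data, $data[0]{name};
```

With the sample data above this prints "2 rows, first name: Ann".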

As for your final request, the filtering: you can either filter from @data using grep, or test before pushing the current row onto @data. Which one to pick depends on whether you want to use the same source data for something else as well, and on whether the data is huge (which is pretty rare nowadays, as several MB of data is now considered "small").

Assuming the condition can be written as:

$row{'sex'} eq 'F' and $row{'body mass index'} > 40 and $row{'blood pressure'} > 135
you can do:
push @data, \%row if $row{'sex'} eq 'F' and $row{'body mass index'} > 40 and $row{'blood pressure'} > 135;
or
my @filtered = grep { $_->{'sex'} eq 'F' and $_->{'body mass index'} > 40 and $_->{'blood pressure'} > 135 } @data;
Note that for the latter a row is a hash ref in $_, while in the former, it's a plain hash in %row.
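To see both variants side by side, here is a small sketch under the assumption that the rows look like the hypothetical ones below. The names @kept and $ref are mine, and the second variant copies each row into %row only to mimic what the read loop would have at that point.

```perl
use strict;
use warnings;

# Hypothetical rows, shaped like the hashes built from the file.
my @data = (
    { name => 'Ann', sex => 'F', 'body mass index' => 42, 'blood pressure' => 140 },
    { name => 'Bob', sex => 'M', 'body mass index' => 45, 'blood pressure' => 150 },
    { name => 'Cat', sex => 'F', 'body mass index' => 28, 'blood pressure' => 138 },
    { name => 'Dee', sex => 'F', 'body mass index' => 41, 'blood pressure' => 136 },
);

# Variant 1: filter afterwards; each row is a hash ref in $_.
my @filtered = grep {
        $_->{'sex'} eq 'F'
    and $_->{'body mass index'} > 40
    and $_->{'blood pressure'} > 135
} @data;

# Variant 2: test before pushing; the row is a plain hash here,
# as it would be inside the read loop.
my @kept;
for my $ref (@data) {
    my %row = %$ref;    # copy into a plain hash to mimic %row in the loop
    push @kept, \%row
        if $row{'sex'} eq 'F'
       and $row{'body mass index'} > 40
       and $row{'blood pressure'} > 135;
}

print join(', ', map { $_->{name} } @filtered), "\n";
```

Both variants select the same two rows (Ann and Dee) from this sample.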

Perl is one of the very few languages that makes a distinction between the two in syntax, and although it has its advantages (flattening lists is very easy in Perl), the different syntax in both cases is rather annoying, IMHO.

Re^2: how to extract data from an array using a condition
by kayj (Novice) on Jun 20, 2011 at 17:47 UTC

    Thanks for your reply, it was very helpful. I am not very familiar with arrays of hashes. How do you access elements from @data using the header names? I tried several ways but with no success. Thank you all for your replies.

      You can get to grips with the basics in the Perl documentation on references, but I'll quickly describe the concepts here.

      An array of hashes is a plain, one-dimensional array, where the items are references to hashes. Now in Perl, in contrast with other languages like PHP and JavaScript, a hash ref is not the same as a hash. A hash is a data structure; a hash ref is a reference, a scalar, a single value, which points to a hash. As a result, there are rather subtle differences in syntax. If %hash is a hash, then after $ref = \%hash; $ref is a reference to that hash, "hash ref" for short. I'll stress that it's the same hash, and not a copy. That means if you change a value in one, you'll see the same change in the other too. They're just different ways to access the same data.
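      A quick sketch of the "same hash, not a copy" point, with made-up keys and values; the %copy at the end is only there for contrast:

```perl
use strict;
use warnings;

my %hash = (sex => 'F', age => 30);   # hypothetical sample values
my $ref  = \%hash;                    # reference to the same hash, not a copy

$ref->{age} = 31;                     # change through the reference...
print "$hash{age}\n";                 # ...and the hash itself shows it

$hash{sex} = 'M';                     # change the hash directly...
print "$ref->{sex}\n";                # ...and the reference sees it

my %copy = %hash;                     # by contrast, this makes a (shallow) copy
$copy{age} = 99;
print "$hash{age}\n";                 # the original is unchanged by the copy
```

      This prints 31, M and 31.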

      The basic syntax is:

                    hash                  hash ref
      reference     \%hash                $ref
      hash          %hash                 %$ref
      element       $hash{'key'}          $ref->{'key'} or ${$ref}{'key'} or $$ref{'key'}
      hash slice    @hash{'one','two'}    @{$ref}{'one','two'}
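      If you want to convince yourself that the hash and hash ref spellings really are equivalent, a small sketch like this should do. The hash contents and variable names are made up for the occasion.

```perl
use strict;
use warnings;

my %hash = (one => 1, two => 2, key => 'value');  # hypothetical contents
my $ref  = \%hash;

# Element access: all four spell the same lookup.
my @same = ($hash{'key'}, $ref->{'key'}, ${$ref}{'key'}, $$ref{'key'});

# Whole hash: %hash and %$ref are the same hash, so the key counts match.
my $count_direct  = keys %hash;
my $count_via_ref = keys %$ref;

# Hash slice: several values at once.
my @direct  = @hash{'one', 'two'};
my @via_ref = @{$ref}{'one', 'two'};

print "@same\n";
print "$count_direct $count_via_ref\n";
print "@direct / @via_ref\n";
```

      The output is "value value value value", then "3 3", then "1 2 / 1 2".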

      So you need an arrow to access an item in a hash through a reference. That arrow is optional only between subscripts (whether in square brackets or curly braces): $deep[0]->{'key'} is the same as $deep[0]{'key'}.

      The block around the reference for dereferencing (which is what we call accessing content in the data structure the reference points to) is not always necessary, but when you have a precedence problem, it's advisable to use one. (Thus: curly braces, not parentheses!)
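      To make the precedence remark a bit more concrete, here is a sketch with a hypothetical nested structure. The 'list' key and its array ref are my own invention, purely to have something to dereference.

```perl
use strict;
use warnings;

# Hypothetical nested structure: an array of hashes, one holding an array ref.
my @deep = ( { key => 'value', list => [10, 20, 30] } );

# Between subscripts the arrow is optional: both spellings reach the same item.
my $with_arrow    = $deep[0]->{'key'};
my $without_arrow = $deep[0]{'key'};

# Simple case: the reference is a plain scalar, so no block is needed.
my $row  = $deep[0];
my @keys = sort keys %$row;

# Precedence case: the reference is a whole expression, so wrap it in a block.
# Written as @$deep[0]{'list'}, the @$ would bind to $deep alone,
# which is a different (and here undeclared) scalar variable.
my @list = @{ $deep[0]{'list'} };

print "$with_arrow $without_arrow\n";
print "@keys\n";
print scalar(@list), "\n";
```

      This prints "value value", "key list" and 3.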

      You can now choose to access, for example, the 'sex' of a single data row directly, as

      $data[0]{'sex'}
      or, via an explicit reference in a loop:
      foreach my $row (@data) { print $row->{'sex'}; }

      Oh, I forgot. Note that grep (and map) is actually a loop in a different syntax, where in the (loop) block you can access each item in turn via $_ (instead of $row). grep is a good way to filter a list: if the last expression evaluated in the block is true, then the current value of $_ is pushed onto the result list that it returns. map is similar, except that it pushes whatever the last expression returns (as a list), irrespective of whether that is true or false.
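      A short sketch of grep and map next to the equivalent explicit loop, again with made-up rows; the names @women, @women_loop, @names and @flags are just for illustration:

```perl
use strict;
use warnings;

# Hypothetical rows.
my @data = (
    { name => 'Ann', sex => 'F' },
    { name => 'Bob', sex => 'M' },
    { name => 'Cat', sex => 'F' },
);

# grep keeps the items for which the block's last expression is true.
my @women = grep { $_->{'sex'} eq 'F' } @data;

# The same filter spelled as an explicit loop.
my @women_loop;
foreach my $row (@data) {
    push @women_loop, $row if $row->{'sex'} eq 'F';
}

# map collects whatever the block returns, true or not.
my @names = map { $_->{'name'} } @data;
my @flags = map { $_->{'sex'} eq 'F' ? 1 : 0 } @data;

print scalar(@women), " ", scalar(@women_loop), "\n";
print "@names\n";
print "@flags\n";
```

      This prints "2 2", "Ann Bob Cat" and "1 0 1": grep dropped Bob, while map returned one value per row, including the false one.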