in reply to parsing metadata

This works for the small snippet of your input:
use warnings;
use strict;

my %atts;
my $att;
while (<DATA>) {
    chomp;
    $att = $1 if /^attribute:\s+(.+)/;
    $atts{$att} = $1 if /^value:\s+(.+)/;
}

use Data::Dumper;
$Data::Dumper::Sortkeys=1;
print Dumper(\%atts);

__DATA__
AVUs defined for dataObj 3a/73/2c/metadata.csv:
attribute: dcterms:created
value: 2014-08-13T00:00:10
units:
----
attribute: control
value: 0
units:
----
attribute: md5
value: 3a732cd0fddaa80fa65fcf28664eaf6d
units:
----

Outputs:

$VAR1 = {
          'control' => '0',
          'dcterms:created' => '2014-08-13T00:00:10',
          'md5' => '3a732cd0fddaa80fa65fcf28664eaf6d'
        };

Your input probably has several records, in which case you can look at perldsc.

Of course, if this is some standard format, there's probably a better solution on CPAN.

Re^2: parsing metadata
by AppleFritter (Vicar) on Sep 05, 2014 at 15:18 UTC

    Alternatively, you could write the loop like this:

    my @interesting = ();
    while(<DATA>) {
        chomp;
        push @interesting, $1 if m/^(?:attribute|value): (.*)$/;
    }
    my %attributes = @interesting; # magic

    It's a kind of magic. :) You could even turn this into a oneliner, using e.g. grep and map, and employing some more dirty tricks along the way:

    my %attributes = grep { length } map { m/^(?:attribute|value): (.*)$/ and $1 } <DATA>;
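
    The "magic" is ordinary list assignment: a hash assigned a flat list takes consecutive elements as key/value pairs. Here is a minimal sketch of that, assuming a made-up @lines array in place of <DATA>. A missing attribute: or value: line would shift the pairs, so the sketch warns when the element count is odd:

    ```perl
    use strict;
    use warnings;

    # Hypothetical sample lines standing in for <DATA>.
    my @lines = (
        "attribute: control\n",
        "value: 0\n",
        "units:\n",
        "attribute: md5\n",
        "value: 3a732cd0fddaa80fa65fcf28664eaf6d\n",
    );

    # Collect attribute names and values in the order they appear.
    my @interesting;
    for my $line (@lines) {
        push @interesting, $1 if $line =~ m/^(?:attribute|value): (.*)$/;
    }

    # Assigning the flat list to a hash pairs elements up: (key, value, key, value, ...).
    my %attributes = @interesting;

    # Caveat: an odd number of elements means a pair is incomplete.
    warn "unpaired attribute/value line\n" if @interesting % 2;
    ```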

    Anyhow, going back to the while loop version, adding support for parsing the attributes of several objects at the same time is also fairly straightforward. E.g.:

    my %dataObjs = ();
    my @interesting = ();
    my $dataObj;
    while(<DATA>) {
        chomp;
        if(m/^AVUs defined for dataObj (.*):$/) {
            defined $dataObj and $dataObjs{$dataObj} = { @interesting };
            @interesting = (); # start collecting afresh for the next object
            $dataObj = $1;
        }
        push @interesting, $1 if m/^(?:attribute|value): (.*)$/;
    }
    $dataObjs{$dataObj} = { @interesting };

    Though in this case I'd use your solution instead, since it leads to much simpler code:

    my %dataObjs = ();
    my $dataObj;
    my $att;
    while(<DATA>) {
        chomp;
        $dataObj = $1 if m/^AVUs defined for dataObj (.*):$/;
        $att = $1 if m/^attribute:\s+(.+)/;
        $dataObjs{$dataObj}->{$att} = $1 if /^value:\s+(.+)/;
    }

    No magic, alas. :)
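
    For what it's worth, a hash of hashes built this way can be walked with two nested loops. A rough sketch, assuming a hypothetical @lines array in place of <DATA> (the second object path is invented for illustration):

    ```perl
    use strict;
    use warnings;

    # Hypothetical lines for two data objects; the second path is made up.
    my @lines = (
        "AVUs defined for dataObj 3a/73/2c/metadata.csv:",
        "attribute: control",
        "value: 0",
        "AVUs defined for dataObj other/file.csv:",
        "attribute: control",
        "value: 1",
    );

    my (%dataObjs, $dataObj, $att);
    for (@lines) {
        $dataObj = $1 if m/^AVUs defined for dataObj (.*):$/;
        $att     = $1 if m/^attribute:\s+(.+)/;
        $dataObjs{$dataObj}{$att} = $1 if m/^value:\s+(.+)/;
    }

    # Walk the hash of hashes: outer keys are data objects, inner keys are attributes.
    my @report;
    for my $obj (sort keys %dataObjs) {
        for my $name (sort keys %{ $dataObjs{$obj} }) {
            push @report, "$obj: $name = $dataObjs{$obj}{$name}";
        }
    }
    print "$_\n" for @report;
    ```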

    Side note - according to my favorite WWW search engine, this sort of data is generated by iRODS. CPAN doesn't have any related modules, so here's a good opportunity to contribute to the Perl ecosystem for anyone who works with that system.

      Hi both, thanks!

      Yes, this is iRODS metadata. But it is printed on screen, and I don't want to push it to a file and create more overhead. I was trying something along these lines:

      foreach my $re(@results){
          next if $re =~/^AVUs defined/;
          warn $re;
          my $attribute= $1 if($re =~ /^attribute:\s+(.*)/);
          my $value = $1 if($re =~ /^value: (.*)/); | warn $attribute,$value;

      And I get

      attribute: md5
      EVAL_ERROR: Use of uninitialized value $value in warn at Access.pl line 125.
      A problem occurred at /nfs/users/nfs_a/aj6/CGP/Fluidigm/perl/scripts/LoadGenotypingResults.pl line 61.

        There's no need to push the results to a file; we're only reading from DATA here since it's convenient for self-contained example scripts. Just use backticks or the qx operator - as you already have, in fact.
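
        To illustrate reading a command's output directly: in list context, qx returns one element per output line, and a pipe open lets you loop over the lines as they arrive. Below is a minimal sketch using a list-form pipe open, which is a close relative of backticks that skips the shell. The command is a stand-in (the running perl binary, $^X, printing two sample lines), not the real iRODS call:

        ```perl
        use strict;
        use warnings;

        # Stand-in command (an assumption for this sketch): the current perl
        # binary prints a few sample lines. Replace with the real command.
        my @cmd = ($^X, '-e', 'print "attribute: md5\nvalue: 3a73\nunits:\n----\n"');

        # List-form pipe open: read the command's output without a shell or a temp file.
        open(my $fh, '-|', @cmd) or die "cannot run command: $!";

        my (%atts, $att);
        while (my $line = <$fh>) {
            chomp $line;
            $att = $1 if $line =~ /^attribute:\s+(.+)/;
            $atts{$att} = $1 if defined $att && $line =~ /^value:\s+(.+)/;
        }
        close($fh) or die "command failed: $! $?";
        ```

        Either way, nothing is written to disk; the parsing loop stays the same as in the DATA version.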

        As for the error message you're getting, it doesn't seem to be related to the snippet you shared. In fact, the line

        my $value = $1 if($re =~ /^value: (.*)/); | warn $attribute,$value;

        That said - you're declaring $attribute and $value inside the loop here, meaning that they will go out of scope and be created anew with each iteration, so at any given time, at most one of them is going to have a defined value.

        The solution is to move the declarations (my $attribute; and my $value;) out of the loop. Even then, keep in mind that in each iteration they both just represent the last attribute and value seen, meaning that a) they'll be uninitialized until an attribute: and a value: line have been encountered, and b) they'll go out of sync if you have seen a new attribute: line but not its corresponding value: line yet, so be careful what you do with them at which time.
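
        Here is a rough sketch of one way to keep the two in sync, assuming a hypothetical @results array (the "orphan" attribute without a value: line is invented to show the out-of-sync case). The attribute name is declared outside the loop, stored only when its value: line arrives, and then cleared:

        ```perl
        use strict;
        use warnings;

        # Hypothetical command output; note the attribute: line with no value: after it.
        my @results = (
            "AVUs defined for dataObj 3a/73/2c/metadata.csv:\n",
            "attribute: control\n",
            "value: 0\n",
            "attribute: orphan\n",
            "attribute: md5\n",
            "value: 3a732cd0fddaa80fa65fcf28664eaf6d\n",
        );

        my %atts;
        my $attribute;    # declared outside the loop so it survives between iterations

        foreach my $re (@results) {
            chomp $re;
            next if $re =~ /^AVUs defined/;

            if ($re =~ /^attribute:\s+(.*)/) {
                $attribute = $1;    # remember the name until its value: line arrives
            }
            elsif ($re =~ /^value:\s+(.*)/ and defined $attribute) {
                $atts{$attribute} = $1;
                undef $attribute;   # consumed; a stray value: line will now be skipped
            }
        }
        ```

        This also sidesteps the my $x = ... if ...; construct, whose behaviour perlsyn documents as undefined.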

        Sorry! My mistake: I didn't declare the attribute outside the loop. Solved. Thanks!