anadem has asked for the wisdom of the Perl Monks concerning the following question:

Please advise me on how to structure my data. It's an embarrassingly basic Perl question, but my hash of hashes isn't enough to hold the data I need to handle, and I don't know how to add the missing piece. The data structure needs to look like this:

    component => {
        version => "1.23.4.567",
        sources => {
            "filename1" => "checksumfile1",
            "filename2" => "checksumfile2",
            "filename3" => "checksumfile3",
        },
    }

The data comes from a plain text file supplied in this format (no choice), one data item per line. (Filenames and checksum filenames are long URLs, so they can't be conveniently placed on the same line, but the source file URL line is immediately followed by the line with the URL for its checksum file):

    component=HF
    version=NULL
    sourcefile=filename1
    sourcesum=checksumfile1
    sourcefile=filename2
    sourcesum=checksumfile2
    sourcefile=filename3
    sourcesum=checksumfile3
    component=SVM
    version=10.0.70.102
    sourcefile=filename4
    sourcesum=checksumfile4
My attempted code is like this:
    use strict;
    use Data::Dump qw(dump);

    my %HoH = ();
    my $rec;
    my $component;
    while ( <DATA> ) {
        chomp;
        next unless m/^\S+/;
        if( m/^component=(.*)/ ) {
            $component = $1;
            $rec = {};
            $HoH{$component} = $rec;
        }
        elsif( m/^version=(.*)/ ) {
            $rec->{"version"} = $1;
        }
        else {
            my ($key, $value) = split /=/;
            $rec->{$key} = $value;
        }
    }
    for my $component ( keys %HoH ) {
        print "$component: ";
        for my $key ( keys %{ $HoH{$component} } ) {
            print "$key=$HoH{$component}{$key} ";
        }
        print "\n";
    }
    dump %HoH;

    __DATA__
    component=HF
    version=NULL
    sourcefile=filename1
    sourcesum=checksum1
    sourcefile=filename2
    sourcesum=checksum2
    sourcefile=filename3
    sourcesum=checksum3
    component=SVM
    version=10.0.70.102
    sourcefile=filename4
    sourcesum=checksum4

Unfortunately (of course) the data for sources 1 and 2 is overwritten by source 3, so I think I need an array of source hashes, but I can't figure out how to code it; hence this plea for help, kind monks.

Replies are listed 'Best First'.
Re: how to include an array of hashes in a hash of hashes?
by tangent (Parson) on Sep 16, 2013 at 23:09 UTC
    I have had to make quite a few changes to your logic in order to get the desired result, but this works for the given input, and assumes that "the source file URL line is immediately followed by the line with the URL for its checksum file" is always the case.
        use strict;
        use Data::Dumper;

        my %HoH = ();
        my $component;
        while ( my $line = <DATA> ) {
            chomp $line;
            next unless $line =~ m/^\S+/;
            if ( $line =~ m/^component=(.*)/ ) {
                $component = $1;
                next;
            }
            next unless $component;
            if ( $line =~ m/^version=(.*)/ ) {
                $HoH{$component}{"version"} = $1;
                next;
            }
            if ( $line =~ m/^sourcefile/ ) {
                my ($k, $key) = split(/=/,$line);
                # READ THE NEXT LINE
                $line = <DATA>;
                chomp $line;
                if ( $line =~ m/^sourcesum/ ) {
                    my ($v, $value) = split(/=/,$line);
                    $HoH{$component}{'sources'}{$key} = $value;
                }
            }
        }
        for my $component ( keys %HoH ) {
            my $hash = $HoH{$component};
            my $version = $hash->{'version'} || '';
            my $sources = $hash->{'sources'} || {};
            print "$component: \n";
            print "version = $version\n";
            for my $key ( keys %{ $sources } ) {
                print "$key = $sources->{$key}\n";
            }
            print "\n";
        }
        print Dumper(\%HoH);

    Output:
        HF:
        version = NULL
        filename1 = checksum1
        filename3 = checksum3
        filename2 = checksum2

        SVM:
        version = 10.0.70.102
        filename4 = checksum4

        'HF' => {
            'version' => 'NULL'
            'sources' => {
                'filename1' => 'checksum1',
                'filename3' => 'checksum3',
                'filename2' => 'checksum2'
            },
        },
        'SVM' => {
            'version' => '10.0.70.102'
            'sources' => {
                'filename4' => 'checksum4'
            },
        }
    Note: this method does not keep the order of the sources. If you need to keep the order then save them as an array of arrays like so:
        REPLACE
            $HoH{$component}{'sources'}{$key} = $value;
        WITH
            push( @{ $HoH{$component}{'sources'} }, [$key,$value] );
        THEN ACCESS
            for my $source ( @{ $sources } ) {
                print "$source->[0] = $source->[1]\n";
            }
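
For illustration, here is a minimal self-contained sketch of the array-of-arrays variant described above. The sample lines and the `$pending_file` variable are assumptions made up for this example, not part of the script above. It also passes a limit of 2 to `split`, on the assumption that a real URL may itself contain `=`.

```perl
use strict;
use warnings;

# Hypothetical sample input, mirroring the format in the question.
my @lines = (
    'component=HF',
    'version=NULL',
    'sourcefile=file10',
    'sourcesum=checksum10',
    'sourcefile=file2',
    'sourcesum=checksum2',
);

my ( %HoH, $component, $pending_file );
for my $line (@lines) {
    # Limit 2 keeps a value that contains '=' (e.g. a URL query string) intact.
    my ( $key, $value ) = split /=/, $line, 2;
    if    ( $key eq 'component' )  { $component = $value }
    elsif ( $key eq 'version' )    { $HoH{$component}{version} = $value }
    elsif ( $key eq 'sourcefile' ) { $pending_file = $value }
    elsif ( $key eq 'sourcesum' ) {
        # Store each source as a [ filename, checksum ] pair, in input order.
        push @{ $HoH{$component}{sources} }, [ $pending_file, $value ];
    }
}

# The array preserves the order in which the pairs were read.
for my $source ( @{ $HoH{HF}{sources} } ) {
    print "$source->[0] = $source->[1]\n";
}
```

The trade-off is that looking up one checksum by filename now means scanning the array, so this shape suits data that is mostly read back in sequence.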
      thanks! I like the extra read, to validate correct order in the config spec. I'll use that along with Ken's data structuring.
Re: how to include an array of hashes in a hash of hashes?
by kcott (Archbishop) on Sep 17, 2013 at 03:19 UTC

    G'day anadem,

    "Please advise me on how to structure my data."

    I don't think there's anything wrong with your current structure, except that you need one per component=XXX with each being held in another hash whose keys are the XXX values.

    Here's how I produced that:

        $ perl -Mstrict -Mwarnings -e '
            use Data::Dumper;
            my @data = qw{
                component=HF
                version=NULL
                sourcefile=filename1
                sourcesum=checksumfile1
                sourcefile=filename2
                sourcesum=checksumfile2
                sourcefile=filename3
                sourcesum=checksumfile3
                component=SVM
                version=10.0.70.102
                sourcefile=filename4
                sourcesum=checksumfile4
            };
            my (%compdata, $comp, $file);
            for (@data) {
                my ($key, $value) = split /=/;
                for ($key) {
                    /component/ && do { $comp = $value; last };
                    /version/ && do { $compdata{$comp}{version} = $value; last };
                    /sourcefile/ && do { $file = $value; last };
                    /sourcesum/ && do { $compdata{$comp}{sources}{$file} = $value };
                }
            }
            print Dumper \%compdata;
        '
        $VAR1 = {
                  'SVM' => {
                             'version' => '10.0.70.102',
                             'sources' => {
                                            'filename4' => 'checksumfile4'
                                          }
                           },
                  'HF' => {
                            'version' => 'NULL',
                            'sources' => {
                                           'filename3' => 'checksumfile3',
                                           'filename2' => 'checksumfile2',
                                           'filename1' => 'checksumfile1'
                                         }
                          }
                };

    -- Ken

      Thanks so much for a very elegant solution! Much appreciated. (Downside is that it makes my perl code look like COBOL ;-)
      Please help with more syntax! It turns out that I need an array as well, to retain the order of the source files. Something like the "<<WRONG" line here:

          my (%compdata, $comp, $file, @srcfiles);
          for (@data) {
              my ($key, $value) = split /=/;
              for ($key) {
                  /component/ && do { $comp = $value; last };
                  /version/ && do { $compdata{$comp}{version} = $value; last };
                  /sourcefile/ && do {
                      $file = $value;
                      push (@($compdata{$comp}{$srcfiles}), $value); # <<WRONG
                      last
                  };
                  /sourcesum/ && do { $compdata{$comp}{sources}{$file} = $value };
              }
          }

      For example, when the data is

          component=HF
          version=NULL
          sourcefile=file10
          sourcesum=checksum10
          sourcefile=file2
          sourcesum=checksum2
          sourcefile=file3
          sourcesum=checksum3

      When using only the hash, the output is

          'sources' => {
              'file3' => 'checksum3',
              'file2' => 'checksum2',
              'file10' => 'checksum10'

      but I need to be able to get the same order as the provided data (by using foreach @srcfiles { get hashvalue } (which is also syntax I don't know))

          'sources' => {
              'file10' => 'checksum10'
              'file3' => 'checksum3',
              'file2' => 'checksum2',

        Flagging something as "WRONG", without providing a reason, is not particularly useful. "push @{ ARRAYREF }, LIST" was probably the syntax you wanted; however, that says nothing about whether the logic or functionality is "RIGHT" or "WRONG", nor what those terms mean in either context.

        Hashes have no inherent ordering. Stating "When using only the hash, the output is ..." is meaningless. If you need to keep track of the order of hash keys, you have to do it yourself and an array is usually the best tool for this task. See keys for a discussion of this; there's a lot more detail in the Hash Algorithm section of "perlsec: Algorithmic Complexity Attacks".
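
As a rough sketch of that idea (not a complete solution), the snippet below records each filename in an array alongside the hash, using the `push @{ ARRAYREF }, LIST` form mentioned earlier. The `srcfiles` key is borrowed from the question's attempt; the sample pairs and the `@ordered` variable are assumptions made up for this example.

```perl
use strict;
use warnings;

my %compdata;
my $comp = 'HF';    # hypothetical component name

# Hypothetical (filename, checksum) pairs in the order they were supplied.
my @pairs = (
    [ file10 => 'checksum10' ],
    [ file2  => 'checksum2' ],
    [ file3  => 'checksum3' ],
);

for my $pair (@pairs) {
    my ( $file, $sum ) = @$pair;
    # The array remembers input order: push @{ ARRAYREF }, LIST
    push @{ $compdata{$comp}{srcfiles} }, $file;
    # The hash gives direct lookup by filename.
    $compdata{$comp}{sources}{$file} = $sum;
}

# Walk the array for order; use the hash only to fetch each value.
my @ordered;
for my $file ( @{ $compdata{$comp}{srcfiles} } ) {
    push @ordered, "$file=$compdata{$comp}{sources}{$file}";
}
print "$_\n" for @ordered;
```

Iterating over `keys %{ $compdata{$comp}{sources} }` instead could return the same three entries in any order, which is why the loop goes through the array.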

        Before writing any more code, I'd recommend that you look at "perlreftut - Mark's very short tutorial about references", "perldsc - Perl Data Structures Cookbook" and "perllol - Manipulating Arrays of Arrays in Perl".

        Next consider what data you need to collect and how you're going to use it. That should provide insight into how to store and subsequently access the data. Storing the data without sufficient forethought, and then finding you have to go back to square one and start all over again, is a waste of time, effort and money (not to mention just plain, old boring).

        Here's a possible approach you could take:

            #!/usr/bin/env perl

            use strict;
            use warnings;
            use autodie;

            use Data::Dumper;

            my @out;

            open my $in_fh, '<', 'pm_1054339_data.txt';

            while (<$in_fh>) {
                chomp;
                my ($key, $value) = split /=/;

                for ($key) {
                    /component/ && do { push @out, { $key => $value }; last };
                    /version/ && do { $out[-1]{$key} = $value; last };
                    /sourcefile/ && do { push @{$out[-1]{order}}, $value; last };
                    /sourcesum/ && do { $out[-1]{sources}{$out[-1]{order}[-1]} = $value };
                }
            }

            close $in_fh;

            print Dumper \@out;

        With this input:

            $ cat pm_1054339_data.txt
            component=HF
            version=NULL
            sourcefile=filename1
            sourcesum=checksumfile1
            sourcefile=filename2
            sourcesum=checksumfile2
            sourcefile=filename3
            sourcesum=checksumfile3
            component=SVM
            version=10.0.70.102
            sourcefile=filename4
            sourcesum=checksumfile4
            component=HF_2
            version=NULL
            sourcefile=file10
            sourcesum=checksum10
            sourcefile=file2
            sourcesum=checksum2
            sourcefile=file3
            sourcesum=checksum3

        You get this output:

            $ pm_1054339_data.pl
            $VAR1 = [
                      {
                        'order' => [
                                     'filename1',
                                     'filename2',
                                     'filename3'
                                   ],
                        'sources' => {
                                       'filename3' => 'checksumfile3',
                                       'filename1' => 'checksumfile1',
                                       'filename2' => 'checksumfile2'
                                     },
                        'version' => 'NULL',
                        'component' => 'HF'
                      },
                      {
                        'version' => '10.0.70.102',
                        'order' => [
                                     'filename4'
                                   ],
                        'sources' => {
                                       'filename4' => 'checksumfile4'
                                     },
                        'component' => 'SVM'
                      },
                      {
                        'version' => 'NULL',
                        'sources' => {
                                       'file10' => 'checksum10',
                                       'file3' => 'checksum3',
                                       'file2' => 'checksum2'
                                     },
                        'order' => [
                                     'file10',
                                     'file2',
                                     'file3'
                                   ],
                        'component' => 'HF_2'
                      }
                    ];
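
Reading the data back in the original order could then look something like the sketch below. It hard-codes one record shaped like the HF_2 element of the output above so that it runs on its own; the `@report` variable and the report format are assumptions made up for this example.

```perl
use strict;
use warnings;

# One record copied by hand from the shape of the dumped structure above.
my @out = (
    {
        component => 'HF_2',
        version   => 'NULL',
        order     => [ 'file10', 'file2', 'file3' ],
        sources   => {
            file10 => 'checksum10',
            file2  => 'checksum2',
            file3  => 'checksum3',
        },
    },
);

my @report;
for my $rec (@out) {
    push @report, "$rec->{component} (version $rec->{version})";
    # 'order' is absent when a component listed no source files.
    for my $file ( @{ $rec->{order} || [] } ) {
        push @report, "  $file => $rec->{sources}{$file}";
    }
}
print "$_\n" for @report;
```

Because `@out` is itself an array, the components also come back in the order they appeared in the input file.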

        -- Ken