Wilderness has asked for the wisdom of the Perl Monks concerning the following question:

I have many users writing Perl data structures to multiple files simultaneously. In file.cfg:

{
    alpha => {
        beta => {
            gamma => theta,
            delta => lambda,
        },
        beta => {
            gamma => zeta,
        },
    },
},
If I just `do` this file, it will keep only one value for alpha->beta, since a Perl hash can hold a given key only once. But I want to be able to parse this data structure and report that the file contains duplicate keys. These structures might span multiple files. Is there a way to eval the file one block at a time (in this example, eval just the first alpha->beta->gamma->theta branch, store it in a local hash, and then eventually eval alpha->beta->gamma->zeta and flag the collision)? I know I could use arrays instead and then iterate over them to find copies, but I want to keep the intuitive structure intact while still being able to flag any duplicates. Any other thoughts, or suggestions on creating the files differently, are welcome.

Re: Storing/parsing perl data structure in/from a file (YAML)
by LanX (Saint) on Jun 13, 2013 at 01:01 UTC
    This prevents information loss, but results in an array of arrays:

    use strict;
    use warnings;
    use Data::Dump qw/pp/;

    my $data_str = do { local $/; <DATA> };

    # braces -> brackets: hashes become arrays, so duplicate keys survive
    $data_str =~ tr/{}/[]/;
    print $data_str;

    my $h;
    eval "\$h = $data_str";
    pp $h;

    __DATA__
    {
        alpha => {
            beta => {
                gamma => "theta",
                delta => "lambda",
            },
            beta => {
                gamma => "zeta",
            },
        },
    }

    Output:

    [
        alpha => [
            beta => [
                gamma => "theta",
                delta => "lambda",
            ],
            beta => [
                gamma => "zeta",
            ],
        ],
    ]
    [
      "alpha",
      [
        "beta",
        ["gamma", "theta", "delta", "lambda"],
        "beta",
        ["gamma", "zeta"],
      ],
    ]

    Now you can parse the data and transform it into the structure you want; a rough sketch follows below. Note that I needed to quote the barewords in your original example!
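    Here is one way such a transformation could look. This is only a sketch (the helper name aoa_to_hash is mine): it walks each array pairwise, warns on key collisions, and assumes every level holds an even number of key/value elements:

    use strict;
    use warnings;

    sub aoa_to_hash {
        my ( $aref, $path ) = @_;
        $path = '' unless defined $path;

        # leaves (plain scalars) are returned unchanged
        return $aref unless ref $aref eq 'ARRAY';

        my %h;
        for ( my $i = 0; $i < @$aref; $i += 2 ) {
            my ( $key, $value ) = @{$aref}[ $i, $i + 1 ];

            # later values overwrite earlier ones; the warn flags the loss
            warn "duplicate key at $path$key\n" if exists $h{$key};
            $h{$key} = aoa_to_hash( $value, "$path$key/" );
        }
        return \%h;
    }

    my $aoa = [
        alpha => [
            beta => [ gamma => "theta", delta => "lambda" ],
            beta => [ gamma => "zeta" ],
        ],
    ];
    my $h = aoa_to_hash($aoa);    # warns: duplicate key at alpha/beta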

    Cheers Rolf

    ( addicted to the Perl Programming Language)

    UPDATE

    If you just need a compact format, why don't you use YAML?

    These are equivalent:

    #YAML
    ---
    - alpha:
        - beta:
            - gamma:
                - theta
            - delta:
                - lambda
        - beta:
            - gamma:
                - zeta

    # PERL AoH³
    [
        {
            alpha => [
                { beta => [ { gamma => ["theta"] }, { delta => ["lambda"] } ] },
                { beta => [ { gamma => ["zeta"] } ] },
            ],
        },
    ]
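    For completeness, a minimal sketch (not part of the original reply) of round-tripping that structure with the YAML module's Dump and Load:

    use strict;
    use warnings;
    use YAML qw(Dump Load);

    # the AoH³ from above: duplicate 'beta' entries survive as separate maps
    my $aoh = [
        {
            alpha => [
                { beta => [ { gamma => ["theta"] }, { delta => ["lambda"] } ] },
                { beta => [ { gamma => ["zeta"] } ] },
            ],
        },
    ];

    my $yaml = Dump($aoh);    # serialize to the YAML shown above
    print $yaml;

    my $back = Load($yaml);   # deserialize: the structure round-trips intact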
Re: Storing/parsing perl data structure in/from a file
by hdb (Monsignor) on Jun 12, 2013 at 21:27 UTC

    Have a look at this Re^2: splitting data. If you replace parentheses with braces, you get a parser for nested structures. If you add code that stores the data in hashes and checks for the prior existence of each key, this could work; the check itself is sketched below.
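    In isolation, that collision check is just an exists test before each insert, something like this (the example data is mine):

    use strict;
    use warnings;

    my %h;
    for my $pair ( [ beta => 'one' ], [ beta => 'two' ] ) {
        my ( $key, $value ) = @$pair;
        warn "duplicate key '$key'\n" if exists $h{$key};    # flag the collision
        $h{$key} = $value;
    }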

    UPDATE: I spent some time following my own advice but could not make it work. Apologies if you wasted time trying it. Please post the result should you have succeeded.
Re: Storing/parsing perl data structure in/from a file
by frozenwithjoy (Priest) on Jun 12, 2013 at 21:46 UTC

    If I'm understanding what you are going for, I'd use a hash of arrays of hashes. That way you keep all of the data and can easily find duplicate keys:

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use feature 'say';
    use Data::Printer;

    my %hash = (
        alpha1 => [
            { beta => { gamma => 'theta', delta => 'lambda', } },
            { beta => { gamma => 'zeta', } },
        ],
        alpha2 => [
            { beta => { gamma => 'theta', delta => 'lambda', } },
        ],
    );

    say "## Original state:";
    check_duplication();

    push @{ $hash{alpha2} }, { beta => { gamma => 'theta', } };

    say "## After pushing another element onto 'alpha2':";
    check_duplication();

    sub check_duplication {
        for ( keys %hash ) {
            if ( scalar @{ $hash{$_} } > 1 ) {
                say "$_ has a duplicate.";
            }
            else {
                say "$_ is unique.";
            }
        }
    }

    p %hash;

    OUTPUT:

    {
        alpha1   [
            [0] {
                beta   {
                    delta   "lambda",
                    gamma   "theta"
                }
            },
            [1] {
                beta   {
                    gamma   "zeta"
                }
            }
        ],
        alpha2   [
            [0] {
                beta   {
                    delta   "lambda",
                    gamma   "theta"
                }
            },
            [1] {
                beta   {
                    gamma   "theta"
                }
            }
        ]
    }
    ## Original state:
    alpha2 is unique.
    alpha1 has a duplicate.
    ## After pushing another element onto 'alpha2':
    alpha2 has a duplicate.
    alpha1 has a duplicate.
      Thanks for your answer! My data structure is actually 6-7 levels deep and I got it to work with arrays of hashes of arrays of ... hashes. I want to know if it's possible to `do` a file partially, or open a file, manually parse each line, and then `do` whatever part I want to. Anyway, I will work with arrays for now.
        Please note: the data you posted is not a valid serialization of a HoH!

        The only way I see is to assign the data to a tied hash which automatically reacts to collisions... of course, all nested hashes would need to be tied too.

        something like

        my $datastring = slurp $file;    # pseudocode: slurp the file
        tie %h, "CollisionHash";
        eval "\%h = $datastring"
        Not sure if this works, and I don't wanna try to implement it for such a "unique" requirement... but a minimal sketch of the tie class itself follows below.
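        Such a CollisionHash class (untested, my assumption of how it could look) might subclass Tie::StdHash and warn from STORE; note it only catches collisions at the level that is actually tied:

        package CollisionHash;
        use strict;
        use warnings;
        use Tie::Hash;
        our @ISA = ('Tie::StdHash');

        # warn whenever a key that already exists is stored again
        sub STORE {
            my ( $self, $key, $value ) = @_;
            warn "duplicate key: $key\n" if exists $self->{$key};
            $self->{$key} = $value;
        }

        package main;
        tie my %h, 'CollisionHash';
        %h = ( beta => 1, beta => 2 );    # warns: duplicate key: beta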

        Cheers Rolf

        ( addicted to the Perl Programming Language)

        You're welcome. What exactly do you mean when you say "`do` a file"?
Re: Storing/parsing perl data structure in/from a file
by hdb (Monsignor) on Jun 13, 2013 at 09:46 UTC

    Based on LanX's advice in Re: Storing/parsing perl data structure in/from a file (YAML), the code below turns your data into an array of arrays, adding quotes in a simplistic way, and then traverses the resulting structure to turn it into a hash of hashes. It appends underscores to colliding keys to avoid duplicates. You might want to change that code to add a warning message.

    use strict;
    use warnings;
    use Data::Dumper;

    my $data;
    { local $/; $data = <DATA> }
    $data =~ tr/}{/][/;                                  # hashes become arrays
    $data =~ s/\s*(.*?)\s*=>\s*(.*?)\s*,/"$1","$2",/g;   # warning: simplistic quoting

    my $aref;
    eval "\$aref = $data";

    sub array2hash {
        my $aref = shift;
        return $aref if "ARRAY" ne ref $aref;
        die "Cannot turn array with odd number of elements into hash.\n" if @$aref % 2;
        my $href = {};
        for ( 0 .. @$aref/2 - 1 ) {
            # append underscores until the key is unique
            $aref->[2*$_] .= "_" while exists $href->{ $aref->[2*$_] };
            $href->{ $aref->[2*$_] } = array2hash( $aref->[2*$_+1] );
        }
        return $href;
    }

    my $href = array2hash $aref;
    print Dumper $href;

    __DATA__
    {
        alpha => {
            beta => {
                gamma => theta,
                delta => lambda,
            },
            beta => {
                gamma => zeta,
            },
        },
    },

    If you want the contents of several files in one hash, I would think the best strategy is to first read the contents of all files into one string, then turn that into an array of arrays, and then into a hash of hashes; a sketch of the first step follows below.
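    For example, the slurping step might look like this (the glob pattern is hypothetical):

    use strict;
    use warnings;

    my $data = '';
    for my $file ( glob '*.cfg' ) {    # hypothetical file pattern
        open my $fh, '<', $file or die "Cannot open $file: $!";
        local $/;                      # slurp mode
        $data .= <$fh>;
    }

    # $data now holds all structures and can be fed through
    # the tr/substitution/eval pipeline shown above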