in reply to Get all hash value into array

I suggest a different approach than the replies so far. Dealing with complex multi-level hash structures is complicated and often not necessary. I suggest "flattening" your hierarchical structure into a flat table. Searching a table like that is simple, but it does cost in performance because you have to examine each row for every search.

Below I show the code to transform what you have now into an Array of Hash. This is similar to the C concept of an Array of Struct. An Array of Array representation is also possible, but there are a few complications with that, like getting null or default values into the "unused columns".

In general, do not make a hierarchical data structure unless there is a clear reason to organize the data that way. The main hash keys should be extremely important and used in almost all queries. I could not figure out what '99155' or '26134' meant, although they did look like American Zip Codes to me. These numbers did not figure prominently in your example queries, which is a clue that the data structure is not quite right. It could be that organizing by state abbreviation as the key would make more sense?
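For example, if the state abbreviation did turn out to be the more important key, the same data could be regrouped something like this (a rough, untested sketch against the $VAR1 structure shown further below):

# Hypothetical regrouping: key by state abbreviation instead of zip.
my %by_state;
foreach my $zip (keys %$VAR1) {
    foreach my $state (keys %{ $VAR1->{$zip} }) {
        # keep the zip with each record so no information is lost
        push @{ $by_state{$state} }, { zip => $zip, fields => $VAR1->{$zip}{$state} };
    }
}
# e.g. all records for Washington: @{ $by_state{WA} }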

However, absent any new information, I would go with a flat table. This is very fast for 10K or even 100K entries. Whether that performance is acceptable or not depends upon how often you do it! At a row count of 1 million, I would put this into a real DB and use SQL to access it.

Example conversion and access code follows:

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

my $VAR1 = {
    '99155' => {
        'PR' => [
            'state_name=Puerto Rico',
            'county_names_all=Adjuntas|Utuado',
        ],
        'AK' => [
            'state_name=Alaska',
            'county_names_all=Ketchikan Gateway|Prince of Wales-Hyder',
        ],
        'WA' => [
            'state_name=Washington',
            'county_names_all=Pend Oreille|Spokane|Lincoln|Adams',
            'comments=America/Los_Angeles',
        ],
    },
    '26134' => {
        'WV' => [
            'state_name=West Virginia',
            'county_names_all=Wirt|Wood|Jackson|Ritchie|Calhoun',
            'comments=America/New_York',
        ],
    },
};

# Flatten this out to a row structure. One line per unique combination of stuff.
# Each row is represented by an anonymous hash.
# An Array of Hash is similar to the C concept of an Array of Struct.
# An Array of Array representation is also possible.
# This db structure is easily adaptable to an SQL DB.

my @rows;
foreach my $zip (keys %$VAR1)
{
    foreach my $twoLetters (keys %{ $VAR1->{$zip} })
    {
        my %fieldHash;
        foreach my $field ( @{ $VAR1->{$zip}->{$twoLetters} } )
        {
            my ($detail_name, $detail_value) = split (/=/, $field);
            $fieldHash{$detail_name} = $detail_value;
        }
        push @rows, { zip => $zip, state => $twoLetters, %fieldHash };
    }
}
print Dumper \@rows;

=header Prints
$VAR1 = [
          {
            'state_name' => 'Alaska',
            'county_names_all' => 'Ketchikan Gateway|Prince of Wales-Hyder',
            'zip' => '99155',
            'state' => 'AK'
          },
          {
            'comments' => 'America/Los_Angeles',
            'county_names_all' => 'Pend Oreille|Spokane|Lincoln|Adams',
            'zip' => '99155',
            'state' => 'WA',
            'state_name' => 'Washington'
          },
          {
            'county_names_all' => 'Adjuntas|Utuado',
            'state' => 'PR',
            'zip' => '99155',
            'state_name' => 'Puerto Rico'
          },
          {
            'comments' => 'America/New_York',
            'county_names_all' => 'Wirt|Wood|Jackson|Ritchie|Calhoun',
            'state' => 'WV',
            'zip' => '26134',
            'state_name' => 'West Virginia'
          }
        ];
=cut

# print state names:
my @state_names = map { $_->{state_name} } @rows;
print join ",", @state_names, "\n";   # Alaska,Puerto Rico,Washington,West Virginia,

# print only state names that have a comment:
my @comment_state_names = map { ($_->{comments}) ? $_->{state_name} : () } @rows;
print join ",", @comment_state_names, "\n";   # West Virginia,Washington,
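As an example of how a search against this flat table looks (a minimal sketch; it simply scans every row):

# Find all rows for a given zip -- a linear scan of @rows.
my @matches = grep { $_->{zip} eq '99155' } @rows;
print "$_->{state} $_->{state_name}\n" for @matches;
# prints the AK, WA and PR rows (order depends on hash ordering)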
I would also add that in the above example, the state_names were unique. If that were not true, then I would recommend List::Util's uniq to filter out duplicates.
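A sketch of that, in case duplicates ever do show up:

use List::Util qw(uniq);    # uniq is available in List::Util 1.45+
my @unique_state_names = uniq map { $_->{state_name} } @rows;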

Revision:
A more complex query could look like this:

# Print each state and the counties that have a 'w' in them
foreach my $row_ref (@rows)
{
    my @counties = grep { /w/i } split /\|/, $row_ref->{county_names_all};
    foreach my $county (@counties)
    {
        print "$row_ref->{state} $county\n";
    }
}

=prints
WV Wirt
WV Wood
AK Ketchikan Gateway
AK Prince of Wales-Hyder
=cut

Re^2: Get all hash value into array
by LanX (Saint) on Feb 02, 2020 at 13:24 UTC
    > I suggest "flattening" your hierarchical structure to a flat table.

    Your approach implies that the data is essentially analogous to a DB table. But you lose the possibility of indexing the data by "zip" (?) or "state" with a hash lookup.

    > Searching a table like that is simple,

    I don't see why using 3 nested while/each loops is more complicated. It's pretty generic and keeps all data available. (Though you have to take care not to mess up the each iterator.)
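    For reference, a minimal sketch of that kind of traversal over the OP's structure (untested, and assuming Perl 5.12+ for each on arrays):

    while ( my ($zip, $states) = each %$VAR1 ) {
        while ( my ($state, $fields) = each %$states ) {
            while ( my ($i, $field) = each @$fields ) {
                my ($name, $value) = split /=/, $field, 2;
                print "$zip $state $name=$value\n";
            }
        }
    }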

    > but does cost in performance because you have to examine each row for every search.

    Hmm, if I wanted to represent a DB table I'd use an AoA with associated index hashes.

    • At least one for the columns, aka fields.
    • Then one for each "unique" field, holding the row indices.
    I'm pretty sure this can already be found° on CPAN.

    Probably as object or tied array.
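    Something along those lines, perhaps (a hand-rolled sketch with made-up sample data, not a CPAN module):

    # Table as an AoA, plus index hashes for fast lookup.
    my @col = qw(zip state state_name);                 # column order
    my %col = map { $col[$_] => $_ } 0 .. $#col;        # field name -> column index

    my @table = (
        [ '99155', 'AK', 'Alaska'        ],
        [ '99155', 'WA', 'Washington'    ],
        [ '26134', 'WV', 'West Virginia' ],
    );

    # one index hash per field we want to look up by
    my (%by_zip, %by_state);
    for my $i (0 .. $#table) {
        push @{ $by_zip{   $table[$i][ $col{zip}   ] } }, $i;
        push @{ $by_state{ $table[$i][ $col{state} ] } }, $i;
    }

    # constant-time lookup of all rows for zip 99155:
    my @hits = @table[ @{ $by_zip{'99155'} } ];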

    I'm not very experienced with NoSQL, but this might go in the same direction.

    Cheers Rolf
    (addicted to the Perl Programming Language :)

    °) A cursory look revealed Data::Table; not sure if that's a good example, though.

      Hi Rolf!

      > Your approach implies that the data is essentially analogous to a DB table.
      > But you lose the possibility of indexing the data by "zip" (?) or "state" with a hash lookup.

      My approach not only implies a "flat", "de-normalized" DB table; that is what it is. This will work great for a few tens of thousands of lines. I am not "losing the possibility to index by 'zip'". When you get to, say, 100,000+ lines, then I would recommend a DB like SQLite. Let the DB take care of indexing. There are, to be sure, a lot of "ifs, ands, and buts" with a DB. However, the OP's data structure does not appear to me to be efficient.
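      If it did get that big, loading the same @rows into SQLite could look roughly like this (a sketch using DBI and DBD::SQLite; the column names are just the ones from the example above):

      use DBI;
      my $dbh = DBI->connect( "dbi:SQLite:dbname=zips.db", "", "", { RaiseError => 1 } );
      $dbh->do( "CREATE TABLE IF NOT EXISTS zips
                 (zip TEXT, state TEXT, state_name TEXT, county_names_all TEXT, comments TEXT)" );
      my $insert = $dbh->prepare( "INSERT INTO zips VALUES (?,?,?,?,?)" );
      $insert->execute( @{$_}{qw(zip state state_name county_names_all comments)} ) for @rows;

      # then let the DB take care of indexing and searching
      $dbh->do( "CREATE INDEX IF NOT EXISTS idx_state ON zips(state)" );
      my $names = $dbh->selectcol_arrayref(
          "SELECT state_name FROM zips WHERE state = ?", undef, 'WA' );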

      From what I can tell, the use of 'zip' as a primary key doesn't make any sense, and the OP's hash structure is hard to search and inefficient. Yes, I do think that 1 loop is easier to understand than 3 loops.
      Cheers, Marshall