rwx--- has asked for the wisdom of the Perl Monks concerning the following question:

I'm stuck on visualizing a data structure for my table. I'm writing a Perl program that will compare aspects of different cities. I have a PDF file for each city, which contains a couple of data tables. I have to scan the data and store it in some data structure so that the user can later choose which aspect of the cities to compare. Here's an example table to better visualize this:

CITY A

Population by Age and Gender
Age Group    Male    Female    Total
0-9          220     180       400
10-19        175     142       317
20-29        260     265       525

Family Households
Family Type    Households    % of Total
Married        145           10
Divorced       60            7
Single         162           15

My code goes through the file line by line and recognizes which part of the table each line is (header, data, etc.). My problem is how to store all this data. There are a couple of important points. First, each table can have a different number of rows and columns, but each table will have at least two columns and two rows. Second, the user will later specify a table (e.g. Population by Age and Gender) and a row and column (e.g. Age Group 10-19 and Male) so that the extracted value can be compared across all the other cities. I'm fairly new to complex data structures in Perl. I'm not sure whether I should use a Hash of Hashes, a Hash of Arrays, or a Hash of Arrays of Hashes. I've been stuck on this part for some time and can't visualize the proper data structure.

I just need a visualization of a data structure that I should use, no code necessary. The most important part is that later, it will be easy to display available tables, and rows and columns for the user to pick.

I came up with the data structure below, but it will not make it easy to show the user the available rows and columns later. Thanks.

    {
        "Population by Age and Gender" => {
            "Age Group" => ["0-9", "10-19", "20-29"],
            Male        => [220, 175, 260],
            Female      => [180, 142, 265],
        },
        "Family Households" => {
            "Family Type" => ["Married", "Divorced", "Single"],
            Households    => [145, 60, 162],
        },
    }

Replies are listed 'Best First'.
Re: Building data structure from multi-row/column table
by Athanasius (Archbishop) on Apr 03, 2015 at 03:39 UTC

    Hello rwx---, and welcome to the Monastery!

    You could reorganise the HoHoA into a HoHoH:

        {
            "Population by Age and Gender" => {
                "0-9"   => { Male => 220, Female => 180 },
                "10-19" => { Male => 175, Female => 142 },
                "20-29" => { Male => 260, Female => 265 },
            },
            "Family Households" => {
                Married  => { Households => 145 },
                Divorced => { Households => 60 },
                Single   => { Households => 162 },
            },
        }

    but this is still going to become unwieldy quickly. I strongly suggest you consider using a database instead. For example, the module DBD::SQLite can be used to create a database which occupies a separate file but requires no software outside of your Perl script.
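    As a rough illustration of the database route, here is a minimal sketch that stores one record per table cell and reads back the list of tables and a single cell. It assumes DBI and DBD::SQLite are installed; the table name "cell" and its column names are made up for this example, and only a few of CITY A's values are loaded.

```perl
use strict;
use warnings;
use DBI;    # needs the DBD::SQLite driver

# In-memory database for this sketch; "dbname=cities.db" would use a file.
my $dbh = DBI->connect("dbi:SQLite:dbname=:memory:", "", "",
    { RaiseError => 1 });

# One record per table cell. Table and column names are hypothetical.
$dbh->do(q{
    CREATE TABLE cell (
        city      TEXT,
        tbl       TEXT,
        row_label TEXT,
        col_label TEXT,
        val       INTEGER
    )
});

my $pop = "Population by Age and Gender";
my $ins = $dbh->prepare("INSERT INTO cell VALUES (?, ?, ?, ?, ?)");
$ins->execute("CITY A", @$_) for (
    [ $pop, "0-9",   "Male",   220 ],
    [ $pop, "0-9",   "Female", 180 ],
    [ $pop, "10-19", "Male",   175 ],
    [ $pop, "10-19", "Female", 142 ],
    [ "Family Households", "Married", "Households", 145 ],
);

# Tables the user can choose from
my $tables = $dbh->selectcol_arrayref(
    "SELECT DISTINCT tbl FROM cell ORDER BY tbl");

# The same cell for every stored city, ready to compare
my $cells = $dbh->selectall_arrayref(
    "SELECT city, val FROM cell
      WHERE tbl = ? AND row_label = ? AND col_label = ?
      ORDER BY city",
    undef, $pop, "10-19", "Male");

print "Tables: ", join(", ", @$tables), "\n";
print "$_->[0]: $_->[1]\n" for @$cells;
```

    With one record per cell, listing the available rows or columns of a table is just another SELECT DISTINCT on row_label or col_label.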

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

Re: Building data structure from multi-row/column table
by LanX (Saint) on Apr 03, 2015 at 03:55 UTC
    I think you have to separate the headers and the data of a table.

    Your current structure is problematic because HoHs are unordered and you can't distinguish which keys are the row/column headers.

        {
            columns => { Male => 0, Female => 1 },
            rows    => { "0-9" => 0, "10-19" => 1, "20-29" => 2 },
            data    => [
                [220, 175, 260],
                [180, 142, 265],
            ],
        }

    (untested)

    You need to look-up indices to get the data now.
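    A small sketch of what that index lookup might look like, assuming the structure above is held in a hash named %table (the variable names are only for illustration). The data array is indexed by column first and then by row, and sorting the header names by their stored index recovers the original table order.

```perl
use strict;
use warnings;

my %table = (
    columns => { Male => 0, Female => 1 },
    rows    => { "0-9" => 0, "10-19" => 1, "20-29" => 2 },
    data    => [ [220, 175, 260], [180, 142, 265] ],
);

# Row names in table order: sort the keys by their stored index
my $rows      = $table{rows};
my @row_names = sort { $rows->{$a} <=> $rows->{$b} } keys %$rows;

# One cell: translate both header names to indices, column first
my $value = $table{data}[ $table{columns}{Male} ][ $table{rows}{"10-19"} ];

print "Rows: @row_names\n";
print "Male, 10-19: $value\n";
```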

    This could be encapsulated in a tied hash or an object class.¹

    "Age Group" makes it a bit more complicated, not sure what it is, how would you call it?

    Maybe a "CATEGORY" field of the rows?

    I hope you get the point and this helps. :)

    Cheers Rolf
    (addicted to the Perl Programming Language and ☆☆☆☆ :)

    PS: Je suis Charlie!

    PS: pretty sure someone will come up with a CPAN² module now... :)

    update

    ¹) for instance you could bless the hash above into a class My::Table . Then $obj->rows returns a list of names, and $obj->entry("Male","0-9") returns the entry, and so on.

    ²) like Data::Table
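    One possible shape for the class mentioned in footnote ¹, as a sketch only. The method names rows and entry come from the footnote; the constructor, the columns method and the _ordered helper are assumptions added for the example.

```perl
use strict;
use warnings;

package My::Table;

# Bless the plain hash (columns, rows, data) into the class
sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

# Names sorted by stored index, so the original table order is kept
sub _ordered {
    my ($index_of) = @_;
    return sort { $index_of->{$a} <=> $index_of->{$b} } keys %$index_of;
}

sub rows    { my ($self) = @_; return _ordered($self->{rows}) }
sub columns { my ($self) = @_; return _ordered($self->{columns}) }

# Column name first, then row name, matching the layout of {data}
sub entry {
    my ($self, $col, $row) = @_;
    my $c = $self->{columns}{$col};
    my $r = $self->{rows}{$row};
    return unless defined $c && defined $r;    # unknown header
    return $self->{data}[$c][$r];
}

package main;

my $obj = My::Table->new(
    columns => { Male => 0, Female => 1 },
    rows    => { "0-9" => 0, "10-19" => 1, "20-29" => 2 },
    data    => [ [220, 175, 260], [180, 142, 265] ],
);

print "Rows: ",    join(", ", $obj->rows),    "\n";
print "Columns: ", join(", ", $obj->columns), "\n";
print "Male, 0-9: ", $obj->entry("Male", "0-9"), "\n";
```

    Hiding the index lookup behind methods means the calling code only ever deals in header names, which is what the user will be picking from.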

Re: Building data structure from multi-row/column table
by roboticus (Chancellor) on Apr 03, 2015 at 11:28 UTC

    rwx---:

    You could build your data structure in several different ways, but they'll all have tradeoffs. If you don't know how you're planning on accessing the data, you won't know which tradeoffs make sense. For example, you could structure it like:

        my %data = (
            'CityA' => {
                'PopByAge/Gender' => {
                    'Ages 0-9' => { 'Male' => 220, 'Female' => 180 },
                    ...
                },
                'FamilyHouseholds' => { ... },
            },
            ...
        );

    This structure would hold your data just fine. It would be a convenient structure if you know that you'll always have the city name before the table name, and table name before the row headers, etc., like so:

        print "What city? Choices are:\n", join(", ", sort keys %data), "\n>";
        chomp(my $city = <>);
        my $hr = $data{$city};        # $hr now points at this city's tables

        print "What table? Choices are:\n", join(", ", sort keys %$hr), "\n>";
        chomp(my $table = <>);
        $hr = $hr->{$table};          # ... and now at the chosen table's rows

        print "Which group? Choices are:\n", join(", ", sort keys %$hr), "\n>";
        chomp(my $grp = <>);
        $hr = $hr->{$grp};
        ...

    But if you wanted to access by table name and *then* city, it's less convenient, as you have to trawl through your data to build the list:

        # Each city may have different tables, so trawl through all the
        # cities to find available table names
        my %tables;
        for my $city (keys %data) {
            $tables{$_}{$city} = $data{$city}{$_} for keys %{ $data{$city} };
        }

        print "Which table? Choices are:\n", join(", ", sort keys %tables), "\n>";
        chomp(my $table = <>);
        print "Which city? Choices are:\n", join(", ", sort keys %{ $tables{$table} }), "\n>";
        chomp(my $city = <>);
        ...

    You'd have similar compromises if you organized it like:

        my %data = (
            'PopByAge/Gender' => {
                'Ages 0-9' => {
                    'Male' => { 'CityA' => 220, 'CityB' => ... },
                },
            },
            ...
        );
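    Since the original question is about comparing one cell across every city, a city-first hash can be turned into this table-first layout in a single pass. A minimal sketch, assuming a city-first structure like the earlier one; the names %by_city and %by_table and the 'Ages 10-19' key are chosen for the example, and only CityA's numbers from the question are filled in.

```perl
use strict;
use warnings;

my %by_city = (
    'CityA' => {
        'PopByAge/Gender' => {
            'Ages 0-9'   => { 'Male' => 220, 'Female' => 180 },
            'Ages 10-19' => { 'Male' => 175, 'Female' => 142 },
        },
    },
);

# Move the city from the outermost level to the innermost one
my %by_table;
for my $city (keys %by_city) {
    my $tables = $by_city{$city};
    for my $table (keys %$tables) {
        for my $row (keys %{ $tables->{$table} }) {
            my $cols = $tables->{$table}{$row};
            $by_table{$table}{$row}{$_}{$city} = $cols->{$_} for keys %$cols;
        }
    }
}

# Every city's value for one table/row/column choice
my $cell = $by_table{'PopByAge/Gender'}{'Ages 10-19'}{'Male'};
print "$_: $cell->{$_}\n" for sort keys %$cell;
```

    After the inversion, the innermost hash is keyed by city, so a city that lacks the chosen table or row simply has no key there.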

    At the cost of space, you could build it with multiple hierarchies:

        my %data = (
            'Cities' => {
                'CityA' => {
                    'PopByAge/Gender' => {
                        'Ages 0-9' => { 'Male' => 220, 'Female' => 180 },
                        ...
                    },
                    ...
                },
                ...
            },
            'Tables' => {
                'PopByAge/Gender' => {
                    'Ages 0-9' => {
                        'CityA' => { 'Male' => 220, 'Female' => 180 },
                        ...
                    },
                    ...
                },
                ...
            },
        );

    But this gets unwieldy as you add different orders of questioning.

    Organizing your data in a hierarchy makes one set of questions really easy to ask, and others a bit less so. If you need to answer questions with no set hierarchy, you could flatten the data something like:

        my @data = (
            { city=>'CityA', table=>'PopByAge/Gender', grpA=>'Age Group', grpB=>'Gender',
              A=>'0-9', B=>'Male', val=>220 },
            ...
        );

    This is not quite as friendly to build, and not as easy to query, but for some situations it's a reasonable compromise. You could let the user select whether they want to search by city, table, etc.:

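        my @selected = @data;
        while (1) {
            print "Search by city, table or group? (or D for done)";
            chomp(my $key = <>);
            last if $key eq 'D';
            # $key must be a field name of the records, e.g. city or table.
            # List each distinct value of that field once.
            my %seen;
            my @choices = grep { !$seen{$_}++ } map { $_->{$key} } @selected;
            print "Choices are: ", join(", ", sort @choices), "\n>";
            chomp(my $choice = <>);
            # Keep only the records that match the user's choice
            @selected = grep { $_->{$key} eq $choice } @selected;
        }
        ... print report on selected data ...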
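    When the table, row and column are already known, the flattened list can also be queried directly with grep. A minimal sketch, assuming records shaped like the ones above; the matching helper is hypothetical and only CityA's values from the question are loaded.

```perl
use strict;
use warnings;

my @data = (
    { city => 'CityA', table => 'PopByAge/Gender', grpA => 'Age Group',
      grpB => 'Gender', A => '0-9',   B => 'Male',   val => 220 },
    { city => 'CityA', table => 'PopByAge/Gender', grpA => 'Age Group',
      grpB => 'Gender', A => '0-9',   B => 'Female', val => 180 },
    { city => 'CityA', table => 'PopByAge/Gender', grpA => 'Age Group',
      grpB => 'Gender', A => '10-19', B => 'Male',   val => 175 },
    { city => 'CityA', table => 'PopByAge/Gender', grpA => 'Age Group',
      grpB => 'Gender', A => '10-19', B => 'Female', val => 142 },
);

# Hypothetical helper: records whose fields equal every given criterion
sub matching {
    my ($records, %want) = @_;
    return grep {
        my $rec = $_;
        !grep { $rec->{$_} ne $want{$_} } keys %want;
    } @$records;
}

# One cell for every city that has it
my @hits = matching(\@data,
    table => 'PopByAge/Gender', A => '10-19', B => 'Male');
print "$_->{city}: $_->{val}\n" for @hits;
```

    The same helper answers any combination of fields, which is the point of the flat layout: no field is privileged over the others.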

    So rather than choose a data structure right now, find out how you want to access your data, and see how you can arrange your data to simplify things for yourself. If the user always gave me the city and table name at the same time, I'd choose a hierarchy because I personally find it easy. If the data were complex, and I wanted to keep it around, I'd use a database.

    Eh, I've run on too long again. I guess I'll stop here... (Note: no code in this node is syntax checked or tested. No animals were harmed during filming. Do not use unless prescribed by your doctor. Talk to your dentist.)

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

      Your second-to-last paragraph should have been your first! Then it could have been the only paragraph .. which is not to say that I (and the OP) don't appreciate the time and effort you put into your examples.

      Dum Spiro Spero

        GotToBTru:

        You're right--I probably should've just stuck with that. Sometimes I start posting before I'm fully awake.

        Yeah, that's the ticket!

        ...roboticus

        That's my story, and I'm sticking to it.