Re^3: Struggling with complex data structures and doing useful operations on their elements and populating from arrays

Okay, it looks like nobody else will answer your question, and you won't answer mine.

In the interest of helping a future Hash/Array Perl newcomer who might want to see the problem from the same angle you are seeing it, here's the solution I would have worked you toward.

You can ask about anything here, but I'm not likely to check this thread anymore.

Hope this helps.

# small snippet to explore dealing with complex data structures and du
+plicates/uniques and consolidation
my %pets;
my @info;
#() below only signify that multiple elements possible in 3rd and 4th 
+elements of $info[i]
$info[0]="Mary,Owens,cat,white";
$info[1]="Bill,Thompson,(cat,dog),(white,black)";
$info[2]="Bill,Thompson,(hamster,cat),(black,brown)";
$info[3]="Bill,Smith,(goldfish,dog,turtle),(yellow,spotted,green)";

#how to organize this data and loop thru to populate %pets from @info 
+and extract output as below

# Loop through each line in the input array, split out the data elemen
+ts, and store them in the hash
foreach my $info_line (@info)
{
    # Peel off First and Last name.
    # WARNING:  We are dangerously assuming they will not be encapsula
+ted in parentheses.
    my ($firstName, $lastName, $afterName) = split /\,/, $info_line, 3
+;

    # I've seem some fancy-schmancy regex work which could do the (x,y
+,z) thing in one pass.
    # I've never learned the technique, and I'm too lazy to look it up
+.
    my ($petTypes, $afterPets)    = &parseNextElement($afterName);
    my ($petColors, $afterColors) = &parseNextElement($afterPets);

    # Now split up the pet types
    if ($petTypes =~ /\(([^\)]+)\)/)
    {
        my $innerPetTypes = $1;
        $petTypes = $innerPetTypes;
    }
    my @petTypeList = split /\,/, $petTypes;

    # Now split up the pet colors
    if ($petColors =~ /\(([^\)]+)\)/)
    {
        my $innerPetColors = $1;
        $petColors = $innerPetColors;
    }
    my @petColorList = split /\,/, $petColors;

    # Hash trick #14:  Using a hash to quietly remove duplicate entrie
+s
    # WARNING:  This is case-sensitive.
    #   Force the keys to upper (or lower) case to render it case-inse
+nsitive
    #     HINT:  Instead of storing $TRUE, store the first mixed-case 
+name as the value
    foreach my $petType (@petTypeList)
    {
        $pets{$firstName}{$lastName}{$PET_TYPES}{$petType} = $TRUE;
    }
    foreach my $petColor (@petColorList)
    {
        $pets{$firstName}{$lastName}{$PET_COLORS}{$petColor} = $TRUE;
    }
}

# Okay, the %pets hash is loaded.  Report on it any way you like.
# NOTE:  To make the code look cleaner at the bottom, we will be stori
+ng all data
#        pre-encapsulated in quotation marks.
foreach my $firstName (keys %pets)
{
    my $encapsulatedFirstName = "\"$firstName\"";

    # Tricky(ish):  The Last name is a sub-hash key.
    # NOTE:  This is often called a Hash of Hashes (HoH).
    foreach my $lastName (keys %{$pets{$firstName}})
    {
        my $encapsulatedLastName = "\"$lastName\"";

        # One more layer deep in our Hash of Hashes:  PET_TYPES is a s
+ubkey to the Last Name
        # And then another layer deep:  Each pet type is a subkey to t
+he PET_TYPES hash
        # NOTE:  I am doing this the hard way to slightly de-Perlize i
+t a bit as a mercy for any Hash noob trying to decipher this
        #        This *could* have been done with a single, complex-lo
+oking join statement
        my @petTypes = ();
        foreach my $petType (keys %{$pets{$firstName}{$lastName}{$PET_
+TYPES}})
        {
            push @petTypes, "\"$petType\"";
        }
        # PET_COLORS is a peer to PET_TYPES under Last Name, so handle
+ the same way
        my @petColors = ();
        foreach my $petColor (keys %{$pets{$firstName}{$lastName}{$PET
+_COLORS}})
        {
            push @petColors, "\"$petColor\"";
        }

        # And now we've normalized the data.  Pre-join the pet data.
        my $finalPetTypes  = join ',', @petTypes;
        my $finalPetColors = join ',', @petColors;

        # Drum roll, please:
        my $output_line = join ":", ($encapsulatedFirstName, $encapsul
+atedLastName, $finalPetTypes, $finalPetColors);
        print "$output_line\n";
    }
}
[download]

And here's the subroutine I used to try and make the operation a little more clear:

sub parseNextElement
{
     my ($input_line, @extra_stuff) = @_;
     if (!defined $input_line)          { $input_line = ''; }
     if (!defined $extra_stuff[0]) { @extra_stuff = (); }

     # Initialize return elements
     my $next_element = '';
     my $remaining_line = '';

     # Check if first element is parentheses-encapsulated
     if ($input_line =~ /^\s*\(/)
     {
          # First element looks like it has parenthetically-encapsulat
+ed sub-elements
          ($next_element, $remaining_line) = split /\)/, $input_line, 
+2;

          # This chops off the closing parenthesis from that first ele
+ment, so put it back
          $next_element .= ')';

          # It also leaves the comma on the remaining line, so remove 
+it
          # WARNING:  Making lots of assumptions here about proper for
+matting
          $remaining_line =~ s/^\s*\,//;
     }
     else
     {
          # First element is a solitary value
          ($next_element, $remaining_line) = split /\,/, $input_line, 
+2;
     }

     return ($next_element, $remaining_line);
}
[download]

I would also point out that in my copy, I wrote a series of unit tests into this routine, because it was just complex enough to warrant that level of care -- and those tests caught a bug I had introduced into the code. Root cause? Sheer arrogance. Good engineering practices protect us from many things, but most of all from ourselves.

Comment on Re^3: Struggling with complex data structures and doing useful operations on their elements and populating from arrays Select or Download Code