in reply to Struggling with complex data structures and doing useful operations on their elements and populating from arrays

Don't you think the requirement to remove duplicate animals and colors results in a loss of connection between some individual animals and their colors? In fact, the brown cat is completely gone from Bill Thompson's results, so you not only break the connections in the initial data, you also lose some of that data. I'd start by producing requirements that make sense, then figure out the implementation.

Re^2: Struggling with complex data structures and doing useful operations on their elements and populating from arrays
by hiyall (Acolyte) on Mar 12, 2015 at 17:14 UTC

    Your point about the relationship between color and pet is valid in the real world. For the purposes of this exercise, I intentionally chose to ignore that relationship. For the parallel problem that I am working on, the relationship is not a concern and would interfere with a simple inventory of the elements of each array for the unique key of first/last name. I appreciate your insight and comment.

      Ok, I can imagine that losing the connection between an animal and its color is not significant for inventory purposes, as long as every animal is accounted for. But how can losing a whole animal be insignificant? I am just trying to understand the logic connecting your input and output data in this exercise. Let me make some assumptions that disambiguate, for me at least, the stated problem, and suggest a data structure that should make the task of populating it very easy without any information loss. Assuming that
    • owner's first and last name can somehow be considered a unique combination
    • and we do not want to use an OO approach
    • and assuming we are forced to use the current structure of the @info array's elements
    • and the @info array in your example means, e.g. as far as cats are concerned, that M.O. owns one white cat and B.T. owns one white cat and one brown cat
    • and we do not want to lose any data
    • and we want to allow any owner to have more than one animal of the same kind and of any color
    • and we want to be able to add or remove different kinds of data very easily (e.g. add day of birth or have multicolored animals)
    • and we want to be able to change output format very easily
      then a reasonable structure for %pets will be, in my opinion, something like this:
      %pets = (
          "Thompson" => {
              "Bill" => {
                  dogs     => [ { colors => ["black"] } ],
                  cats     => [ { colors => ["brown"] }, { colors => ["white"] } ],
                  hamsters => [ { colors => ["black"] } ],
              },
          },
          "Owens" => {
              "Mary" => {
                  cats => [ { colors => ["white"] } ],
              },
          },
      );
      Loading @info into such %pets will not be difficult and %pets will allow you to produce a report in any format you want, so this is fairly flexible. I will not give code examples for these tasks due to time constraints and I should also say this hash structure is one of many approaches, but this one should work easily. It will allow you to extend the information in the future, e.g. add name, date of birth to each pet, height, weight, additional nicknames, additional colors, whatever you need. Hope this helps.
        I definitely agree that this data structure suggested by hotpelmen is most probably more appropriate to the problem description and I think that the OP should go for something along these lines. The problem, though, is that the input data is rather badly structured:
        $a="Mary":"Owens":"cat":"white";
        $b="Bill":"Thompson":"cat,dog":"white,black";
        $c="Bill":"Thompson":"hamster,cat":"black,brown";
        $d="Bill":"Smith":"goldfish,dog,turtle":"yellow,spotted,green";
        Mary Owens, no problem.

        Bill Smith: we can guess that the goldfish is yellow, the dog spotted and the turtle green. But this is not a very robust data structure, and it would be much better if the input data expressed a clear relationship between the pets and the colors. For example, something like this:

        $d="Bill":"Smith":"goldfish,yellow":"dog,spotted":"turtle,green";
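        To illustrate why the paired format is easier to handle, here is a minimal sketch of a parser for it. It assumes each record is a plain colon-separated string (without the quotes shown above); the sub name parse_paired_record and the keys first, last, kind and pets are made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: parse one record in the paired format
#   first:last:pet,color:pet,color:...
# Assumption: the record is a plain string with no embedded quotes.
sub parse_paired_record {
    my ($line) = @_;
    my ($first, $last, @pairs) = split /:/, $line;

    my @pets;
    for my $pair (@pairs) {
        # The first field is the kind of pet; anything after it is a color.
        my ($kind, @colors) = split /,/, $pair;
        die "No color given for '$kind' in '$line'\n" unless @colors;
        push @pets, { kind => $kind, colors => [@colors] };
    }
    return { first => $first, last => $last, pets => \@pets };
}

my $rec = parse_paired_record('Bill:Smith:goldfish,yellow:dog,spotted:turtle,green');

# Print one line per pet, with its colors separated by "/".
for my $pet (@{ $rec->{pets} }) {
    printf "%s %s owns a %s (%s)\n",
        $rec->{first}, $rec->{last}, $pet->{kind}, join('/', @{ $pet->{colors} });
}
```

        Each pet carries its own color list, so there is nothing to guess about which color goes with which animal.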
        For Bill Thompson, we have the added difficulty that he has quite a lot of pets and that they are described on two distinct lines, but that should not be too difficult to handle.

        My question to hiyall: can you have a better structure in your input data, such as the one outlined just above or something similar, or is your input data in a format that is outside your control? If you can change your input data, make it more robust. If it is not possible to change the input format, then it is not terribly complicated to still arrive at the final data structure suggested by hotpelmen, just a bit more tedious and error-prone.
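        If the format cannot be changed, the more tedious route could look roughly like the sketch below. It rests on several assumptions: each @info element is a plain colon-separated string, pets and colors are matched by position, and the hash keys are the pet kinds exactly as they appear in the input ("cat" rather than "cats"). The sub name load_pets is made up for this example.

```perl
use strict;
use warnings;

# Assumed input: first:last:pet,pet,...:color,color,...
# Pets and colors are paired by position (an assumption, not stated by the OP).
my @info = (
    'Mary:Owens:cat:white',
    'Bill:Thompson:cat,dog:white,black',
    'Bill:Thompson:hamster,cat:black,brown',
    'Bill:Smith:goldfish,dog,turtle:yellow,spotted,green',
);

# Hypothetical loader: returns a hash reference shaped as
#   $pets->{last}{first}{kind} = [ { colors => [...] }, ... ]
sub load_pets {
    my @records = @_;
    my %pets;
    for my $line (@records) {
        my ($first, $last, $kinds, $colors) = split /:/, $line;
        my @kinds  = split /,/, ($kinds  // '');
        my @colors = split /,/, ($colors // '');

        # Refuse to guess when the two lists do not line up.
        if (@kinds != @colors) {
            warn "Skipping '$line': ", scalar(@kinds), " pets but ",
                scalar(@colors), " colors\n";
            next;
        }

        # One entry per animal, so two cats stay two cats.
        for my $i (0 .. $#kinds) {
            push @{ $pets{$last}{$first}{ $kinds[$i] } },
                { colors => [ $colors[$i] ] };
        }
    }
    return \%pets;
}

my $pets = load_pets(@info);
printf "Bill Thompson has %d cats\n", scalar @{ $pets->{Thompson}{Bill}{cat} };
```

        Autovivification takes care of merging the two Bill Thompson lines; the positional pairing is the fragile part, which is why the sketch skips any record whose counts differ instead of guessing.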

        But my main point is that the input format is not very robust: what should your code do if you have three pets and four colors, or the other way around? The input data should presumably come from a source that is better organized, in which there is a direct relationship between the pet and the color. So it would be much better IMHO to have clear pet/color pairs in the input data.

        Also, if you later want to change your application to allow two colors for a pet (say a black and white dog), the change would be much easier to handle: just replace the color with a list of colors, and the surrounding data structure does not need to change.
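        As a small illustration of that last point, here is a sketch using the nested layout suggested by hotpelmen. The data and the report_lines helper are made up for this example; the point is only that adding a second color touches the innermost list and nothing else.

```perl
use strict;
use warnings;

# Illustrative data in the nested layout discussed in this thread.
my %pets = (
    Thompson => {
        Bill => {
            dogs => [ { colors => ['black'] } ],
            cats => [ { colors => ['brown'] }, { colors => ['white'] } ],
        },
    },
);

# The dog turns out to be black and white: only the inner list grows.
push @{ $pets{Thompson}{Bill}{dogs}[0]{colors} }, 'white';

# Hypothetical report helper: one line per pet, colors joined with "/".
sub report_lines {
    my ($pets) = @_;
    my @lines;
    for my $last (sort keys %$pets) {
        for my $first (sort keys %{ $pets->{$last} }) {
            my $owned = $pets->{$last}{$first};
            for my $kind (sort keys %$owned) {
                for my $pet (@{ $owned->{$kind} }) {
                    push @lines, sprintf('%s %s: %s (%s)',
                        $first, $last, $kind, join('/', @{ $pet->{colors} }));
                }
            }
        }
    }
    return @lines;
}

print "$_\n" for report_lines(\%pets);
```

        The reporting code joins whatever is in the color list, so it works unchanged for one color or several.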

        Je suis Charlie.