Re: Struggling with complex data structures and doing useful operations on their elements and populating from arrays
by hdb (Monsignor) on Mar 12, 2015 at 14:18 UTC
The issue is to split into fields but keep those grouped by parentheses together. For comma-separated data it is always advisable to use a module like Text::CSV rather than splitting yourself. Text::CSV cannot treat parentheses directly, but if you replace them with double quotes it should work fine.
UPDATE: here is some sample code:
use strict;
use warnings;
use Text::CSV;
my %pets;
my @info;
$info[0]="Mary,Owens,cat,white";
$info[1]="Bill,Thompson,(cat,dog),(white,black)";
$info[2]="Bill,Thompson,(hamster,cat),(black,brown)";
$info[3]="Bill,Smith,(goldfish,dog,turtle),(yellow,spotted,green)";
s/[()]/"/g for @info;
my $csv = Text::CSV->new();
for (@info) {
    $csv->parse( $_ );
    print join ":", map { "\"$_\"" } $csv->fields();
    print "\n";
}
If you're considering the module approach, I would also check out Text::CSV_XS. I have heard it has more flexibility in a lot of places like delimiters and such, is possibly more reliable, and allegedly much faster.
Thank you for your reply, I am hoping for a non-module assist as I am trying to figure out the subtleties of multidimensional hashes of multiple scalars and arrays. In the literature that I have the examples are hashes of arrays or hashes ... I have not grasped this fully yet.
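For what it's worth, a structure of that kind is just hashes nested inside hashes, with array references at the bottom level. A minimal sketch, assuming the bottom-level key names `species` and `color` and using one owner from the sample data:

```perl
use strict;
use warnings;

my %pets;

# Autovivification creates the intermediate hashes and the arrays on first use.
push @{ $pets{Bill}{Thompson}{species} }, 'cat',   'dog';
push @{ $pets{Bill}{Thompson}{color} },   'white', 'black';

# The first two levels are plain hash keys; the last level holds an array reference.
my $species = $pets{Bill}{Thompson}{species};    # array reference
my @species = @{$species};                       # copy of the list

print "first species: $species->[0]\n";          # a single element
print "all species:   @species\n";               # the whole list
print "count:         ", scalar(@species), "\n";

# Walk the whole structure, one line per owner.
for my $first ( sort keys %pets ) {
    for my $last ( sort keys %{ $pets{$first} } ) {
        my $rec = $pets{$first}{$last};
        print "$first $last: ",
            join( ',', @{ $rec->{species} } ), ' / ',
            join( ',', @{ $rec->{color} } ), "\n";
    }
}
```

The point to notice is that `@{ ... }` turns the stored reference back into a list, whether for `push`, `join` or a loop.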
Re: Struggling with complex data structures and doing useful operations on their elements and populating from arrays
by GotToBTru (Prior) on Mar 12, 2015 at 14:34 UTC
You define a hash %pets but never use it, nor does your desired output suggest a printed hash. Also, Bill Thompson has two cats, one white and one brown, but your output has only one.
It will help us help you if you can be more consistent in your problem definition.
Apologies
The %pets hash should be built by looping through @info. The desired structure of %pets is {scalar}{scalar}{array}{array}, with no duplicated elements in the {array} fields. This serves to summarize the data with no duplicates.
The desired output would then be to list the contents of each $pets{first}{last}{species}{color} on its own line; the combination {first}{last} is unique within the hash. The two arrays should be populated such that duplicate elements are discarded. This summarization is reflected in the desired output example. I have struggled with this little task for a couple of days now - my expertise is not quite sufficient yet ... but soon
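A module-free sketch of that loop might look like the following. It assumes the parenthesised form of @info quoted elsewhere in this thread, groups that are never nested, and two helper names of my own (`%seen` for duplicate tracking, `@owners` to remember first-seen order); treat it as one possible approach rather than the answer.

```perl
use strict;
use warnings;

my @info = (
    'Mary,Owens,cat,white',
    'Bill,Thompson,(cat,dog),(white,black)',
    'Bill,Thompson,(hamster,cat),(black,brown)',
    'Bill,Smith,(goldfish,dog,turtle),(yellow,spotted,green)',
);

my ( %pets, %seen, @owners );

for my $line (@info) {
    # A field is either a parenthesised group or a run of non-comma characters.
    my ( $first, $last, $species, $color ) = $line =~ /(\([^)]*\)|[^,]+)/g;

    # Remember each owner once, in the order first seen.
    push @owners, [ $first, $last ] unless $pets{$first}{$last};

    my %group = ( species => $species, color => $color );
    for my $kind ( sort keys %group ) {
        ( my $list = $group{$kind} ) =~ tr/()//d;    # drop the parentheses
        for my $item ( split /,/, $list ) {
            # Keep only the first occurrence of each item per owner and kind.
            next if $seen{$first}{$last}{$kind}{$item}++;
            push @{ $pets{$first}{$last}{$kind} }, $item;
        }
    }
}

# Build one output line per owner, every value quoted, fields joined by ":".
my @out;
for my $owner (@owners) {
    my ( $first, $last ) = @$owner;
    my $rec = $pets{$first}{$last};
    push @out, join( ':',
        qq{"$first"}, qq{"$last"},
        join( ',', map { qq{"$_"} } @{ $rec->{species} } ),
        join( ',', map { qq{"$_"} } @{ $rec->{color} } ),
    );
}
print "$_\n" for @out;
```

The separate `%seen` hash is a design choice: it keeps the arrays in first-seen order while still rejecting duplicates in constant time, which a plain `grep` over the array on every insert would not.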
FYI, about the uniqueness in the array and the sorting algorithm, you can use a Prolog-like system to build your %pets. The trick is to negate clauses which are true so that your system infers faster. It's a bit over the top, but here's the stuff on CPAN: https://metacpan.org/search?q=prolog
Re: Struggling with complex data structures and doing useful operations on their elements and populating from arrays
by marinersk (Priest) on Mar 12, 2015 at 14:11 UTC
# How to loop through the second layer: What is in each element of @info?
print "\$info:\n";
foreach my $info_item (@info)
{
    #print " []='$info_item'\n";
    # Let's try a simple split command:
    my @info_fields = split /\,/, $info_item;
    print "\$info_field:\n";
    foreach my $info_field (@info_fields)
    {
        print " []='$info_field'\n";
    }
}
Results show us that this simple approach is not sufficient to match the complexity of the data structure:
D:\PerlMonks>lists3.pl
$info:
$info_field:
[]='Mary'
[]='Owens'
[]='cat'
[]='white'
$info_field:
[]='Bill'
[]='Thompson'
[]='(cat'
[]='dog)'
[]='(white'
[]='black)'
$info_field:
[]='Bill'
[]='Thompson'
[]='(hamster'
[]='cat)'
[]='(black'
[]='brown)'
$info_field:
[]='Bill'
[]='Smith'
[]='(goldfish'
[]='dog'
[]='turtle)'
[]='(yellow'
[]='spotted'
[]='green)'
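One module-free way past this is to match the fields instead of splitting on the separator. The sketch below assumes groups are never nested and parentheses are always balanced; `split_fields` is a hypothetical helper name, and empty fields (two commas in a row) would simply be skipped.

```perl
use strict;
use warnings;

# Hypothetical helper: return the top-level fields of one @info line.
sub split_fields {
    my ($line) = @_;

    # Try a whole "(...)" group first, otherwise take everything up to the next comma.
    return $line =~ /(\([^)]*\)|[^,]+)/g;
}

for my $line ( 'Mary,Owens,cat,white', 'Bill,Thompson,(cat,dog),(white,black)' ) {
    print "\$info_field:\n";
    print " []='$_'\n" for split_fields($line);
}
```

Each line now yields four fields, with the parenthesised groups intact and ready for a second, plain `split /,/` once the parentheses are stripped.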
#!/usr/bin/perl -w
use strict;
# small snippet to explore dealing with complex data structures and duplicates/uniques and consolidation
my %pets;
my @info;
# () below only signify that multiple elements are possible in the 3rd and 4th elements of $info[i]
push @info,
    "Mary,Owens,cat,white",
    "Bill,Thompson,(cat,dog),(white,black)",
    "Bill,Thompson,(hamster,cat),(black,brown)",
    "Bill,Smith,(goldfish,dog,turtle),(yellow,spotted,green)";
# how to organize this data and loop thru to populate %pets from @info and extract output as below
# desire to loop through @info and populate %pets with structure for pets hash
# $pets{first}{last}{species}{color}
# with {first} and {last} containing scalars
# with {species} and {color} containing arrays
exit;
__END__
Desired output from printing %pets: single line with unique elements for the species and color arrays and fields separated by ":"
"Mary":"Owens":"cat":"white"
"Bill":"Thompson":"cat","dog","hamster":"white","black","brown"
"Bill":"Smith":"goldfish","dog","turtle":"yellow","spotted","green"
Re: Struggling with complex data structures and doing useful operations on their elements and populating from arrays
by marinersk (Priest) on Mar 12, 2015 at 14:07 UTC
# How to loop through the first layer: What is in @info?
print "\$info:\n";
foreach my $info_item (@info)
{
    print " []='$info_item'\n";
}
Results:
D:\PerlMonks>lists2.pl
$info:
[]='Mary,Owens,cat,white'
[]='Bill,Thompson,(cat,dog),(white,black)'
[]='Bill,Thompson,(hamster,cat),(black,brown)'
[]='Bill,Smith,(goldfish,dog,turtle),(yellow,spotted,green)'
Okay, it looks like nobody else will answer your question, and you won't answer mine.
In the interest of helping a future Hash/Array Perl newcomer who might want to see the problem from the same angle you are seeing it, here's the solution I would have worked you toward.
You can ask about anything here, but I'm not likely to check this thread anymore.
Hope this helps.
And here's the subroutine I used to try and make the operation a little more clear:
I would also point out that in my copy, I wrote a series of unit tests into this routine, because it was just complex enough to warrant that level of care -- and those tests caught a bug I had introduced into the code. Root cause? Sheer arrogance. Good engineering practices protect us from many things, but most of all from ourselves.
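The routine and its tests are not reproduced in this thread, so purely as an illustration of the practice, here is how a few Test::More checks might look around a hypothetical `merge_unique` helper. Both the name and the implementation are assumptions made for this sketch, not the code referred to above.

```perl
use strict;
use warnings;
use Test::More tests => 3;

# Hypothetical helper: append new items to an array ref, skipping duplicates.
sub merge_unique {
    my ( $list, @new ) = @_;
    my %seen = map { $_ => 1 } @$list;          # items already in the list
    push @$list, grep { !$seen{$_}++ } @new;    # add each new item only once
    return $list;
}

is_deeply( merge_unique( [], qw(cat dog) ), [qw(cat dog)], 'fills an empty list' );
is_deeply( merge_unique( [qw(cat dog)], qw(hamster cat) ),
    [qw(cat dog hamster)], 'skips items already present' );
is_deeply( merge_unique( [], qw(cat cat) ), ['cat'], 'skips repeats within one call' );
```

Test::More ships with Perl, so checks like these cost little to keep next to a routine that is "just complex enough" to get wrong.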
Re: Struggling with complex data structures and doing useful operations on their elements and populating from arrays
by hotpelmen (Scribe) on Mar 12, 2015 at 16:22 UTC
Don't you think the requirement to remove duplicate animals and colors results in a loss of connection between some individual animals and their colors? In fact, the brown cat is completely gone from Bill Thompson's results, so you not only mess up the connections in the initial data, you also lose some of that data. I'd start by producing requirements that make sense, then figure out the implementation.
|
|
Your point about the relationship between color and pet is valid in our real world. For the purposes of this exercise, I chose to intentionally ignore the relationship. For the parallel problem that I am working on, the relationship is not a concern and would interfere with a simple inventory of the elements of each array for the unique key of first/last name.
I appreciate your insight and comment.
Ok, I can imagine losing connection between an animal and its color as not significant for inventory purposes as long as every animal is accounted for. But how can losing a whole animal be insignificant? Just trying to understand the logic between your input and output data in this exercise.
Let me make some assumptions that disambiguate, for me at least, the stated problem, and suggest a data structure which could hopefully make the task of populating it very easy without any information loss.
Assuming that
- owner's first and last name can somehow be considered a unique combination
- and we do not want to use OO approach
- and assuming we are forced to use current structure of @info array's elements
- and @info array in your example means, e.g. as far as cats are concerned, that M.O. owns one white cat and B.T. owns one white cat and one brown cat
- and we do not want to lose any data
- and we want to allow any owner to have more than one animal of the same kind and of any color
- and we want to be able to add or remove different kinds of data very easily (e.g. add day of birth or have multicolored animals)
- and we want to be able to change output format very easily
then a reasonable structure for %pets will be, in my opinion, something like this:
%pets = (
    "Thompson" => {
        "Bill" => {
            dogs => [
                {
                    colors => ["black"],
                },
            ],
            cats => [
                {
                    colors => ["brown"],
                },
                {
                    colors => ["white"],
                },
            ],
            hamsters => [
                {
                    colors => ["black"]
                },
            ],
        },
    },
    "Owens" => {
        "Mary" => {
            cats => [
                {
                    colors => ["white"],
                },
            ],
        }
    },
);
Loading @info into such %pets will not be difficult and %pets will allow you to produce a report in any format you want, so this is fairly flexible. I will not give code examples for these tasks due to time constraints and I should also say this hash structure is one of many approaches, but this one should work easily. It will allow you to extend the information in the future, e.g. add name, date of birth to each pet, height, weight, additional nicknames, additional colors, whatever you need. Hope this helps.
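For readers who want a starting point, a rough sketch of loading @info into a structure of this shape follows. It assumes the Nth color on a line belongs to the Nth animal on that line, that groups are never nested, and it uses the species name from the data as the key (`cat` rather than `cats`) to avoid guessing plurals.

```perl
use strict;
use warnings;

my @info = (
    'Mary,Owens,cat,white',
    'Bill,Thompson,(cat,dog),(white,black)',
    'Bill,Thompson,(hamster,cat),(black,brown)',
    'Bill,Smith,(goldfish,dog,turtle),(yellow,spotted,green)',
);

my %pets;
for my $line (@info) {
    # A field is either a parenthesised group or a run of non-comma characters.
    my ( $first, $last, $species, $colors ) = $line =~ /(\([^)]*\)|[^,]+)/g;
    tr/()//d for $species, $colors;    # strip the grouping parentheses
    my @species = split /,/, $species;
    my @colors  = split /,/, $colors;

    # Assumption: the Nth color belongs to the Nth animal on the same line.
    for my $i ( 0 .. $#species ) {
        push @{ $pets{$last}{$first}{ $species[$i] } },
            { colors => [ $colors[$i] ] };
    }
}

# One report line per animal, so no animal is dropped.
for my $last ( sort keys %pets ) {
    for my $first ( sort keys %{ $pets{$last} } ) {
        for my $kind ( sort keys %{ $pets{$last}{$first} } ) {
            for my $animal ( @{ $pets{$last}{$first}{$kind} } ) {
                print "$first $last: $kind (@{ $animal->{colors} })\n";
            }
        }
    }
}
```

Because each animal is its own hash, adding further attributes later (a name, a date of birth) only means adding keys next to `colors`.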