normalizing results from a db query

punkish has asked for the wisdom of the Perl Monks concerning the following question:

I asked this on HTML::Template list, but am asking here as well because (well, no reply from H::T list yet, and I am in a hurry, ;-) , but also because of an additional general question). I am drawing a blank on how to accomplish the following (simplified example provided below) --

from a database query involving joins, I have the following array (actually an array of hashes)

1,gm,saturn
1,gm,chevy
1,gm,caddy
2,ford,taurus
2,ford,escort
3,vw,bug
3,vw,jetta
3,vw,phaeton
..

In order to display

1. gm
	- saturn
	- chevy
	- caddy
2. ford
	- taurus
	- escort
..

I can do the following code

#!/usr/bin/perl -w

use strict;

my $seen = 0;

while (<DATA>) {
  chop;
  my ($id, $maker, $make) = split(",", $_);
  if ($id != $seen) {
    print "$id. $maker\n";
    print "   - $make\n";
    $seen = $id;
  } else {
    print "   - $make\n";
  }
}

__DATA__
1,gm,saturn
1,gm,chevy
1,gm,caddy
2,ford,taurus
2,ford,escort
3,vw,bug
3,vw,jetta
3,vw,phaeton
[download]

But I need to use this with H::T.

So, I really need an array of hashrefs. I really need something like --

my @carmakers = (
  \%thismaker,
  \%thismaker,
  \%thismaker
);
[download]

where

my %thismaker = ('thismaker' => 'gm', 'makes' => \@makes);
[download]

where

my @makes = (\%thismake, \%thismake, \%thismake);
[download]

where

my %thismake = ('thismake' => 'saturn');
my %thismake = ('thismake' => 'chevy');
my %thismake = ('thismake' => 'caddy');
[download]

in turn for each maker and within each maker for each make.

This will allow me to display the following --

<tmpl_loop carmakers>
    <tmpl_var thismaker>
    <tmpl_loop makes>
        <tmpl_var thismake>
    </tmpl_loop>
</tmpl_loop>
[download]

Applying my logic as above doesn't work

my $seen = 0;

my @carmakers;
while (<DATA>) {
  chop;
  my ($id, $maker, $make) = split(",", $_);
  if ($id != $seen) {
    my %maker;
    $maker{'thismaker'} = $maker;
      my @makes;
      my %make = ('thismake' => $make);
      push(@makes, \%make);
    $maker{'makes'} = \@makes;
    push(@carmakers, \%maker);  # problem1
    $seen = $id;
  } else {   
      my %make = ('thismake' => $make);
      push(@makes, \%make);  # problem2
    $maker{'makes'} = \@makes;
  }
}
[download]

But, this is where I run into my logic problem. The line marked "# problem1" should not occur unless a new car maker is encountered. And the line marked "# problem2" fails because it does not see the @makes declared in the same block but in the first conditional.

Which brings me to a more general problem. Under "use strict" how do I access a variable declared in a block that is at the same level as the current block? In other words, if I have

while (looping over something) {
  if (cond1) {
    create @foo if a certain condition is met
    do a bunch of things
  } else {
    do something to foo already declared
    do a bunch of other things
  }
}
[download]

_{janitored by ybiC: Increased scope of existing <readmore> tags}

Comment on normalizing results from a db query Select or Download Code

Replies are listed 'Best First'.
Re: normalizing results from a db query by jdtoronto (Prior) on Jan 25, 2004 at 22:16 UTC
I have to admit that I found a lazy way to do just this - Data::Grouper. After four hours trying to do what you are doing, I decided that a nice general purpose module would be very useful. jdtoronto	[reply]
Re: Re: normalizing results from a db query by punkish (Priest) on Jan 25, 2004 at 23:48 UTC
Great! this is exactly what I wanted to do. Now, I am going to read Data::Grouper's code to figure out what it is doing behind the scene. Thanks for mucking around with this for 4 hours. I just wasted my entire weekend on this. I would still like to know how to access vars named in horizontally same level blocks under "use strict."	[reply]
Re: normalizing results from a db query by biosysadmin (Deacon) on Jan 25, 2004 at 21:48 UTC
I don't think I understand your problem. In your else block you won't have access to the foo declared in the if (cond1) statement because the code in the if block has never been executed. Have you declared foo outside the while loop? Or are you looking to do something with an empty array foo? If the latter, try this: `while (looping over something) { my @foo; if (cond1) { create @foo if a certain condition is met do a bunch of things } else { do something to foo already declared do a bunch of other things } }` [download]	[reply] [d/l]
Re: Re: normalizing results from a db query by biosysadmin (Deacon) on Jan 26, 2004 at 06:45 UTC
I have a better understanding of your problem now, here's the code for my solution to problem #2. It's a bit hackish, but I thought I'd throw it out as one way to do it: #!/usr/bin/perl -w use strict; use Data::Dumper; my $seen = 0; my @carmakers; while (<DATA>) { chop; my ($id, $maker, $make) = split(",", $_); if ($id != $seen) { package block1; my %maker; $maker{'thismaker'} = $maker; my @makes; my %make = ('thismake' => $make); push(@makes, \%make); $maker{'makes'} = \@makes; push(@carmakers, \%maker); # problem1 $seen = $id; } else { package block2; my %make = ('thismake' => $make); push(@block1::makes, \%make); # problem2 $block1::maker{'makes'} = \@block1::makes; } } print Dumper( @carmakers ); __DATA__ 1,gm,saturn 1,gm,chevy 1,gm,caddy 2,ford,taurus 2,ford,escort 3,vw,bug 3,vw,jetta 3,vw,phaeton [download] I'll think some more about problem 1, this is fun. :) Have you thought about a different way of structuring your database quer(y\|ies)? For example, maybe you could use the database to get a unique set of carmakers.	[reply] [d/l]
Re: Re: normalizing results from a db query by punkish (Priest) on Jan 25, 2004 at 21:56 UTC
well, that is precisely the problem. Here are more annotations -- `while (looping over something) { if (cond1) { create @foo if cond1 is met do a bunch of things, and go to the next iteration in the loop } else { since cond1 is not met, don't create a new @foo but do something to @foo already declared in the prev iteration do other things, and go to the next iteration in the loop } }` [download] This makes sense if you look at the context of my question. I am looping over a result set that has duplicates (carmakers with their makes). I want to eliminate the display of duplicate carmakers, but remember them in successive iterations so I can associate all their makes correctly to them.	[reply] [d/l]
Re: normalizing results from a db query by Ctrl-z (Friar) on Jan 25, 2004 at 23:46 UTC
cant say i fully get where your going with this, but it looks a lot like a similar problem i had; turning a flat table into a nested loop, grouped by a particular field in the table?. Heres the gist of what I did my @tabular = $dbi->array_of_hashrefs_please(); # move into a hash (keyed by category) of array of hashrefs (rows) my %categories; foreach( @tabular ) { my $entry = $_->{ FIELD_I_WANT_TO_SORT_BY }; if ( $categories{$entry} == undef ) { $categories{$entry} = [$_] ; } else { push @{$categories{$entry}}, $_; } } # now create nested loop based on category... my @nested_loop; foreach(sort keys %categories) { push @nested_loop, { category => $_, entries => $categories{$_} +}; } # add our nested loop to the template $template->param( my_loop => \@nested_loop); [download] ...with a tmpl something like... `<tmpl_loop name="my_loop"> <tmpl_var name="category"> <tmpl_loop name="entries"> <!-- inner loop data here --> </tmpl_loop> </tmpl_loop>` [download] hope its of some use time was, I could move my arms like a bird and...	[reply] [d/l] [select]
Re: normalizing results from a db query by exussum0 (Vicar) on Jan 26, 2004 at 02:48 UTC
Normalizing is the procedure of removing repeated values and storing them in a seperate table. For data result sets, you can't easily worry about this for one reason. Your result set is a snapshot of your data (if it is constantly changing). Doing it in two queries/two result tables, getting one smaller table to and then again for the rest of your data would be logically nicer. You could generate a dictionary from your first query, and during the result phase, reassociate it to your second query. But you know why you don't usually ever do this? It's faster most of the time. to do it once and let the database weed out what wouldn't match in the first place. It's O(f()) * 2 if your queries are equaly complex. That's bad. Dividing in two is always good, thus do it only once. If you were joining on something like, a reference table, it may be faster to keep a constant structure in perl to re-associate back to, in dictionary/hash form. Other than that, it's nicer to get back one result set and proecss it in a neat algorithm, like you tried to do. --- On to part 2. While your exact code may or may not work, it is the right way of doing things. If your first field changes, just create a new indentation with a new title. This is similar to how SAX works. When you see a close tag (or an open one) change the state of the process to behave a little differently. The particular problem with your code, is that you create variables scoped for blocks and expect them to cross boundaries. There are tricks, such as creating the reference and then saving it somewhere else, but in this case, i wouldn't advise it. My advice would be to create a funciton that at least takes the data in, processes it, and then returns back the structure you like. In this case, in pseudo code, I would try something like this. `my @track = (); my %demo = (); while(my @data = $sth->fetchrow_array()) { for( my $x = 0; $x < $#data ; $x++ ) { if( $data[$x] ne $track[$x] ) { print "\t" x $x; print "$data[$x]\n"; } @track = @data; } push( @{$demo{ $data[0] }{ $data[1] }{ $data[2] } .... = $data[widt +h]; }` [download] The code won't compile and prolly has a logic error or two, but it should get you started. An exercise best left to the user if you don't like your code. Play that funky music white boy..	[reply] [d/l]
Re: normalizing results from a db query by punkish (Priest) on Jan 26, 2004 at 06:05 UTC
Thanks to everyone who responded. I learned a lot from the suggested code fragments, as well as from Data::Grouper. However, none of these were meeting my needs. Specifically, I wanted to specify more than one fields in both outer and inner loops. Anyway, I finally cracked it, and since I spent two days staring at the screen trying to figure it out, I offer the following with the hope that some other lost soul might find it timesaving and of use. Assuming I have @res as below (returned from a $sth->fetchall_arrayref({})) id,maker,make =============== 1,gm,saturn 1,gm,chevy 1,gm,caddy 2,ford,taurus 2,ford,escort 3,vw,bug 3,vw,jetta 3,vw,phaeton as an array of hashrefs (each row is a hash). And I want a normalized array of hashrefs with nested loops (suitable for H::T). I wrote the following that works (please do suggest any improvements I might be able to make to this) -- my $groupby = 'id'; # field to group by my @outerfields = ('maker'); # outer loop fields my @innerfields = ('make'); # inner loop fields my $cars = &normalize(\@res, $groupby, \@outerfields, \@innerfields); $template->param(cars => $cars); # print it all print "Content-Type: text/html\n\n", $template->output; sub normalize($aref, $groupby, $outerfields, $innerfields) { my ($aref, $groupby, $outerfields, $innerfields) = @_; my @res = @$aref; my @outerfields = @$outerfields; my @innerfields = @$innerfields; my $seen = 0; # a flag to remember seen records my @temp; # a temp array to store outer loop fields my %makes; # a hash to store all the makes keyed by the grou +pby field # loop over the denormalized results foreach (@res) { # using the groupby field, test if the record has been seen... if ($_->{$groupby} != $seen) { # create a hash to store the outer loop fields, the car maker my %q; $q{$groupby} = $_->{$groupby}; foreach my $of (@outerfields) { $q{$of} = $_->{$of}; } # stuff the outer loop fields hash into the temp array push(@temp, \%q); # create a hash to store the inner loop fields, the car makes my %make; foreach my $if (@innerfields) { $make{$if} = $_->{$if}; } # add the inner loop fields hash to the makes hash keyed by the +groupby field push(@{$makes{$_->{$groupby}}}, \%make); # update the flag $seen = $_->{$groupby}; # since the record has already been seen... } else { # create a hash to store the inner loop fields, the car makes my %make; foreach my $if (@innerfields) { $make{$if} = $_->{$if}; } # add the inner loop fields hash to the makes hash keyed by the +groupby field push(@{$makes{$_->{$groupby}}}, \%make); } } my @cars; # an array to hold it all in its final form # loop over the temp array created above foreach (@temp) { # grab the makes for this maker using the groupby field and # add it to this row as an array of hashrefs $_->{'makes'} = $makes{$_->{$groupby}}; # add the entire row, a hashref, to the final array push(@cars, $_); } return \@cars; # return back to the code } [download] Meanwhile, in my html template `<tmpl_loop cars> <tmpl_var id>. <tmpl_var maker> <tmpl_loop makes> <tmpl_var make> </tmpl_loop> </tmpl_loop>` [download]	[reply] [d/l] [select]