edan has asked for the wisdom of the Perl Monks concerning the following question:

Monks,

I have an algorithm question here. Here's the scoop:

I am looping to build a list of records. I need to group certain records together, creating a "parent" record that comes before the group of sub-records. The grouping is done by a certain "ID" in each record that indicates the "group" that it is part of. The problem is boundary conditions and the awful cut-and-paste of a bunch of code in order to catch the last set:

Here's some psuedo-code that illustrates what I'm doing:

my @complete_list = (); my @sub_list = (); my $cur_id = 0; for my $rec ( @records ) { my $this_id = $rec->{ID}; # lots of other attributes here... if ( $this_id != $cur_id ) { # we crossed an ID boundary: # push a "heading" record on to the list if there # is more than one record in the sub list if ( @sub_list > 1 ) { # maybe lots of code here to build the record, # using all those attributes... push @complete_list, { # ... some record }; } # push the sub-list onto the main list and emtpy it out push @complete_list, @sub_list; @sub_list = (); } # accumulate records in the sub_list push @sub_list, { # ... some record }; # keep track of the "current" (last) ID to see when it changes $cur_id = $this_id; } #################### # here is the boundary-condition where we need to # take care of the last sub_list # cut-and-paste copy of the code inside the # ID-changed test: "if ( $this_id != $cur_id )" in the loop # YUCK! if ( @sub_list > 1 ) { # maybe lots of code here to build the record, # using all those attributes... push @complete_list, { # ... some record }; } push @complete_list, @sub_list; # all done return \@complete_list;

Hopefully my comments in the code make it clear what I don't like: I have to cut-and-paste the whole chunk of code that does the "grouping" after the loop in order to catch the last "group".

Of course the most obvious solution to avoid cut-and-paste is to put that code into a subroutine. That is probably what I'll do in the end. I'll have to pass it quite a few arguments, since it needs the record, a reference to the whole big list "\@complete_list", a reference to the sub_list, the "previous id" ($cur_id), and maybe some other stuff.

I'm hoping there's a nicer way to code this so that I can handle the boundary condition more elegantly, and not have to factor out the grouping code into a subroutine. For some reason, that way is eluding my mind.

Can someone please hit me over the head with my now-dusty "Introduction to Algorithms" textbook and enlighten me? Please?

--
edan

Replies are listed 'Best First'.
Re: Looping and Grouping: there must be a better way!
by blokhead (Monsignor) on Jul 15, 2004 at 07:50 UTC
    I would put the records into groups, and then loop through the groups, each as a single entity to decide if it needs a header record. For you, the trick is preserving order so we can build the header record using data about the previous group. Luckily there's Tie::IxHash:
    use Tie::IxHash; tie my %group, Tie::IxHash; ## group them by ID, preserving the order seen for (@records) { my $id = $_->{ID}; push @{ $group{$id} }, $_; } ## put special header at beginning of groups larger than 1 my $last_id; while (my ($curr_id, $records) = each %group) { unshift @$records, { ... } if @$records > 1; $last_id = $curr_id; } ## flatten everything out my @complete_list = map @$_, values %group;
    Notice when you put the header record onto the beginning of the group, you can use $last_id, $curr_id, $records, and even $group{$last_id} to generate it.

    blokhead

Re: Looping and Grouping: there must be a better way!
by hawtin (Prior) on Jul 15, 2004 at 07:34 UTC

    Is there some reason for not putting the "grouping" code in a subroutine?

    My personal bias would be to scope the array over a pair of subroutines that playe with it

    ... if ( @sub_list > 1 ) { # maybe lots of code here to build the record, # using all those attributes... add_record(# some record ); } ... if ( @sub_list > 1 ) { # maybe lots of code here to build the record, # using all those attributes... add_record(# some record ); } .... { # Private scope for various things my @complete_list = (); sub add_record { my($record) = @_; ... push @complete_list, $record; } sub find_record { my($index) = @_; # Do some checking for validity return $complete_list[$index]; } # End the private scope }
Re: Looping and Grouping: there must be a better way!
by graff (Chancellor) on Jul 16, 2004 at 03:37 UTC
    In addition to putting the complicated code blocks into subroutines, as suggested by hawtin (and as you said you would probably do -- yes you should do that), you might also consider using a more appropriate data structure -- perhaps a hash of arrays, where the "grouping ID" is the hash key, and each element of the hash is an array of records in that group.

    That way, you don't need to keep track of when the grouping ID changes from one record to the next -- just determine what that ID is, use it as a hash key, and push the record (after doing all the complicated manipulations) onto the hash element:

    my %complete_list; my @groupID_order; for my $rec ( @records ) { # do prep-work on $rec as needed to create returnable rec, then: my $groupID = $rec->{ID} push @groupID_order, $groupID unless exists( $complete_list{$group +ID} ); push @{$complete_list{$groupID}}, $usable_rec; } my @return_array; for my $groupID ( @groupID_order ) { # maybe you want to work on creating header stuff here for each gr +oup, then: push @return_array, $header_rec, @{$complete_list{$groupID}}; } return \@return_array;
    That's just untested pseudo-code, of course, but in theory it should produce the same result as your original approach.