Efficient Grouping

meetraz has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Efficient Grouping by davorg (Chancellor) on Oct 29, 2002 at 16:35 UTC
I'd use more complex data structures, which would make it more extensible.

    #!/usr/bin/perl -w
    use strict;
    use Data::Dumper;

    my @Def = (
        { name => 'Group1',
          code => [qw( H0 K0 PA PB PC PD PE PF PG PH )] },
        { name => 'Group2',
          code => [qw( PX PY PZ P1 P2 P3 P4 P5 P6 P7 )] },
    );

    my %Codes;
    foreach (@Def) {
        @Codes{@{$_->{code}}} = ($_->{name}) x @{$_->{code}};
    }

    my %Groups;
    while (<STDIN>) {
        chomp;
        push @{$Groups{$Codes{substr($_, 0, 2)}}}, $_;
    }

    print Dumper \%Groups;

You end up with the partitioned data in `%Groups`. I tested it with this input file:

    K0blah
    PZfoo
    P7bar
    PEbaz

which gave the following output:

    $VAR1 = {
              'Group1' => [
                            'K0blah',
                            'PEbaz'
                          ],
              'Group2' => [
                            'PZfoo',
                            'P7bar'
                          ]
            };

--
<http://www.dave.org.uk>

"The first rule of Perl club is you do not talk about Perl club." -- Chip Salzenberg
Re: Efficient Grouping by nothingmuch (Priest) on Oct 29, 2002 at 16:24 UTC
You could use `@G1_Hash{ qw/H0 K0.../ } = ();` to make the assignment a single step, but that won't be a significant gain compared to making the loop itself more efficient. You could do that by ordering your tests by probability and skipping to the next line as soon as one matches, so the line isn't tested again: `push (@G1_out, $input), next if exists $G1_Hash{$prefix}`

The `exists` is only there to make up for my earlier laziness: assigning an empty list to the hash slice leaves the values undefined. I don't know, but perhaps not testing the value for truth saves a bit more... Because the loop will probably run many times over, even a small saving is multiplied by the number of iterations.

-nuffin zz zZ Z Z #!perl
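For illustration, here is a minimal sketch of the hash-slice setup and the `exists` ... `next` pattern described above. The group codes are the ones quoted elsewhere in this thread; the sample input lines and the `@unmatched` bucket are assumptions added only to make the sketch runnable.

```perl
use strict;
use warnings;

# Group codes are taken from the thread; everything else is illustrative.
my (%G1_Hash, %G2_Hash);
@G1_Hash{ qw(H0 K0 PA PB PC PD PE PF PG PH) } = ();   # keys exist, values are undef
@G2_Hash{ qw(PX PY PZ P1 P2 P3 P4 P5 P6 P7) } = ();

my (@G1_out, @G2_out, @unmatched);                     # @unmatched is hypothetical

for my $input (qw(K0blah PZfoo P7bar PEbaz ZZnope)) {  # made-up input lines
    my $prefix = substr($input, 0, 2);

    # Put the likelier group first; "next" skips the remaining tests on a hit.
    push(@G1_out, $input), next if exists $G1_Hash{$prefix};
    push(@G2_out, $input), next if exists $G2_Hash{$prefix};

    # Reached only when no group claimed the prefix.
    push @unmatched, $input;
}

print "G1: @G1_out\n";
print "G2: @G2_out\n";
print "unmatched: @unmatched\n";
```

Because the values in the slice are undef, a plain truth test such as `if $G1_Hash{$prefix}` would never match; `exists` checks for the key itself.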
Re: Efficient Grouping by tommyw (Hermit) on Oct 29, 2002 at 16:41 UTC
You've got six `if`'s in the body of the loop, which will probably be more overhead than the setup, if you've got any substantial amount of data. Certainly, you're not going to get much benefit out of the various ways of initialising a 1200 element structure (although some ways may be more readable than others).

    my (%hash, @G1_out, @G2_out, ...);
    $hash{$_}=\@G1_out for (qw (H0 ...));
    $hash{$_}=\@G2_out for (qw (PX ...));

    while (my $input = <STDIN>) {
        chomp $input;
        my $prefix=substr($input, 0, 2);
        push @{$hash{$prefix}}, $input;
    }

--
Tommy
Too stupid to live.
Too stubborn to die.
Re: Efficient Grouping by BrowserUk (Patriarch) on Oct 29, 2002 at 17:28 UTC
Whenever I see variables with names delineated by sequential numbers, I tend to think "Data structure". Sometimes a hash, usually an array. In this case, you not only had the `Groupn` arrays, but also the `Gn_Hash` hashes and the `Gn_out` arrays. These can all be grouped into a single data structure, which makes for easy looping. The result is an AoH+A.

The code (2 kB) and its output (999 bytes) were collapsed behind "Read more..." links in the original post and are not reproduced here.

You'll need to add code to handle prefixes that aren't in any group if that is a possibility.

Nah! Your thinking of Simon Templar, originally played by Roger Moore and later by Ian Ogilvy
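Since the code in that post is not available, the following is only a rough sketch of what folding the numbered variables into one structure could look like. It is not BrowserUk's code; the field names `name`, `codes`, and `out`, the `%by_prefix` lookup, and the sample lines are all assumptions.

```perl
use strict;
use warnings;

# One record per group: the codes that select it and the lines collected for it.
# (Field names are hypothetical.)
my @groups = (
    { name => 'Group1', codes => [qw(H0 K0 PA PB PC PD PE PF PG PH)], out => [] },
    { name => 'Group2', codes => [qw(PX PY PZ P1 P2 P3 P4 P5 P6 P7)], out => [] },
);

# Build a single prefix-to-record lookup by looping over the structure.
my %by_prefix;
for my $group (@groups) {
    $by_prefix{$_} = $group for @{ $group->{codes} };
}

for my $line (qw(K0blah PZfoo P7bar PEbaz)) {              # made-up input lines
    my $group = $by_prefix{ substr($line, 0, 2) } or next; # unknown prefixes are skipped here
    push @{ $group->{out} }, $line;
}

print "$_->{name}: @{ $_->{out} }\n" for @groups;
```

Adding a third group then means adding one record to `@groups` rather than three new variables.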
Re: Efficient Grouping by ides (Deacon) on Oct 29, 2002 at 16:17 UTC
You could populate it first like so:

    my %G1_Hash = ( 'H0' => 1, 'K0' => 1, ... );

-----------------------------------
Frank Wiles <frank@wiles.org>
http://frank.wiles.org
Re: Efficient Grouping by LTjake (Prior) on Oct 29, 2002 at 16:40 UTC
Just to add my $0.02. I didn't use a hash at all, I used grep.

    use strict;

    my @Group1 = qw( H0 K0 PA PB PC PD PE PF PG PH );
    my @Group2 = qw( PX PY PZ P1 P2 P3 P4 P5 P6 P7 );
    my @G1_out;
    my @G2_out;

    while (my $input = <DATA>) {
        chomp ($input);
        my $prefix = substr($input,0,2);
        # NB: grep is slow in this case. evil. beware.
        push (@G1_out, $input) if grep($_ eq $prefix, @Group1);
        push (@G2_out, $input) if grep($_ eq $prefix, @Group2);
    }

    print "G1\n";
    print "$_\n" foreach @G1_out;
    print "\nG2\n";
    print "$_\n" foreach @G2_out;

    __DATA__
    A1 # invalid
    K0 # valid G1
    B4 # invalid
    PY # valid G2

Gives:

    G1
    K0 # valid G1

    G2
    PY # valid G2

Update: I guess I should've mentioned that I knew it was slower. The bonus I saw is that it doesn't require a hash per group, which I think is a good thing. I guess my priorities lie elsewhere =) My bad.

--
Rock is dead. Long live paper and scissors!
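To see how the two approaches compare on your own machine, something like the following sketch using the core `Benchmark` module could be used. The sub names and the list of test prefixes are made up for illustration, and no timings are claimed here; the numbers will depend on group size and input.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @Group1 = qw(H0 K0 PA PB PC PD PE PF PG PH);
my %G1_Hash;
@G1_Hash{@Group1} = ();

# Made-up prefixes: some are in the group, some are not.
my @prefixes = qw(K0 PH ZZ PA A1 B4);

sub count_with_grep {
    my $hits = 0;
    for my $prefix (@prefixes) {
        $hits++ if grep { $_ eq $prefix } @Group1;   # scans the whole list every time
    }
    return $hits;
}

sub count_with_hash {
    my $hits = 0;
    for my $prefix (@prefixes) {
        $hits++ if exists $G1_Hash{$prefix};         # a single hash lookup
    }
    return $hits;
}

# Run each variant for about one CPU second and print a comparison table.
cmpthese(-1, {
    grep => \&count_with_grep,
    hash => \&count_with_hash,
});
```

Both subs return the same count, so the comparison measures only the membership test.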
Re: Re: Efficient Grouping by Thelonius (Priest) on Oct 29, 2002 at 18:33 UTC
Just to clarify, this

    push (@G1_out, $input) if grep($_ eq $prefix, @Group1);

is a bad idea. It's much slower than the original. The hash solutions offered by davorg or tommyw are good unless there is a possibility that the groups are not disjoint. The example given did not overlap, but he did not explicitly say that overlap was impossible.
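The disjointness caveat matters because a lookup of the form `$hash{$_} = \@G1_out` stores one target per prefix, so a prefix listed in two groups would end up only in whichever group was assigned last. If overlap were possible, one way around it (a sketch only, with invented overlapping groups and a hypothetical `%targets` hash) is to keep a list of output arrays per prefix:

```perl
use strict;
use warnings;

# Hypothetical overlapping groups: 'PA' belongs to both.
my (@G1_out, @G2_out);
my %targets;   # prefix => list of references to output arrays
push @{ $targets{$_} }, \@G1_out for qw(H0 K0 PA);
push @{ $targets{$_} }, \@G2_out for qw(PA PX PY);

for my $input (qw(K0blah PAfoo PYbar ZZnone)) {   # made-up input lines
    my $prefix = substr($input, 0, 2);

    # Push onto every array registered for this prefix; none for unknown prefixes.
    push @$_, $input for @{ $targets{$prefix} || [] };
}

print "G1: @G1_out\n";
print "G2: @G2_out\n";
```

Here `PAfoo` lands in both output arrays, while `ZZnone` lands in neither.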
Re: Efficient Grouping by meetraz (Hermit) on Oct 29, 2002 at 18:59 UTC
Thanks for everybody's great suggestions. Just to clear things up, there is no overlap between the groups. Each 2-character combination can only appear in one group, although not every 2-character combination will be represented. It would be helpful to handle these rogue entries. The input file is currently around 5-6k lines.
Re: Re: Efficient Grouping by Thelonius (Priest) on Oct 29, 2002 at 19:56 UTC
"It would be helpful to handle these rogue entries."

Well, then, to modify tommyw's code:

    use strict;
    my (%hash, @G1_out, @G2_out);
    my @rogues;
    $hash{$_}=\@G1_out for qw(H0 H1); # etc
    $hash{$_}=\@G2_out for qw(PX P2);

    while (<>) {
        chomp;
        push @{$hash{substr($_, 0, 2)} || \@rogues}, $_;
    }
Re: Efficient Grouping by RMGir (Prior) on Oct 29, 2002 at 16:39 UTC
You round up a cleric, an enchanter, a heavy tank-type, and maybe a bard.... Ooops, this isn't EQ Monks :)

--
Mike