Re: Efficient Grouping
by davorg (Chancellor) on Oct 29, 2002 at 16:35 UTC
#!/usr/bin/perl -w
use strict;
use Data::Dumper;
my @Def = ( { name => 'Group1',
              code => [qw( H0 K0 PA PB PC PD PE PF PG PH )] },
            { name => 'Group2',
              code => [qw( PX PY PZ P1 P2 P3 P4 P5 P6 P7 )] });
my %Codes;
foreach (@Def) {
  @Codes{@{$_->{code}}} = ($_->{name}) x @{$_->{code}};
}
my %Groups;
while (<STDIN>) {
  chomp;
  push @{$Groups{$Codes{substr($_, 0, 2)}}}, $_;
}
print Dumper \%Groups;
You end up with the partitioned data in %Groups. I tested it with this input file:
K0blah
PZfoo
P7bar
PEbaz
which gave the following output:
$VAR1 = {
          'Group1' => [
                        'K0blah',
                        'PEbaz'
                      ],
          'Group2' => [
                        'PZfoo',
                        'P7bar'
                      ]
        };
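For anyone not used to the hash-slice idiom in the foreach loop above, here is a minimal sketch of what it builds. It assumes the same two groups as the post; the names %by_slice and %by_loop are made up for the comparison and are not part of davorg's code.

```perl
use strict;
use warnings;

# Same two groups as in the post above.
my @Def = (
    { name => 'Group1', code => [qw( H0 K0 PA PB PC PD PE PF PG PH )] },
    { name => 'Group2', code => [qw( PX PY PZ P1 P2 P3 P4 P5 P6 P7 )] },
);

# Slice form: assign the group name to every code in one statement.
# The right-hand side repeats the name once per code.
my %by_slice;
for my $def (@Def) {
    my @codes = @{ $def->{code} };
    @by_slice{@codes} = ( $def->{name} ) x @codes;
}

# Equivalent explicit loop, one key at a time.
my %by_loop;
for my $def (@Def) {
    $by_loop{$_} = $def->{name} for @{ $def->{code} };
}

# Print the prefix-to-group lookup in sorted order.
printf "%s => %s\n", $_, $by_slice{$_} for sort keys %by_slice;
```

Both forms end up with the same prefix-to-group lookup, so the choice between them is mostly a matter of readability.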
--
<http://www.dave.org.uk>
"The first rule of Perl club is you do not talk about
Perl club." -- Chip Salzenberg
Re: Efficient Grouping
by nothingmuch (Priest) on Oct 29, 2002 at 16:24 UTC
@G1_Hash{ qw/H0 K0 .../ } = ();
This makes the assignment a single step, but the gain is insignificant compared with making the loop itself more efficient. You could speed up the loop by ordering your tests by probability and skipping to the next line as soon as one matches, so the line is not tested again: push (@G1_out, $input), next if exists $G1_Hash{$prefix}
The exists function is only there to cover for my laziness above: assigning an empty list to the hash slice leaves every value undefined, so a plain truth test on the value would fail. I don't know for sure, but not testing the value for truth may save a bit more...
Because the loop body will probably run many times over, even a small saving is multiplied by the number of iterations.
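To make the exists-plus-next idea concrete, here is a minimal sketch. It assumes the two groups listed elsewhere in this thread and uses a made-up @lines array in place of STDIN; it is an illustration of the suggestion, not nothingmuch's actual code.

```perl
use strict;
use warnings;

my ( %G1_Hash, %G2_Hash );

# Assigning an empty list to a hash slice creates the keys with undef values.
@G1_Hash{qw( H0 K0 PA PB PC PD PE PF PG PH )} = ();
@G2_Hash{qw( PX PY PZ P1 P2 P3 P4 P5 P6 P7 )} = ();

my ( @G1_out, @G2_out );

# Hypothetical sample input standing in for the lines read from STDIN.
my @lines = qw( K0blah PZfoo P7bar PEbaz );

for my $input (@lines) {
    my $prefix = substr $input, 0, 2;

    # Put the most likely group first; 'next' skips the remaining tests.
    push( @G1_out, $input ), next if exists $G1_Hash{$prefix};
    push( @G2_out, $input ), next if exists $G2_Hash{$prefix};
}

print "G1: @G1_out\n";
print "G2: @G2_out\n";
```

Note that a test like "if $G1_Hash{$prefix}" would never match here, because the values are undef. That is why exists is needed with this style of initialisation.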
-nuffin zz zZ Z Z #!perl
Re: Efficient Grouping
by tommyw (Hermit) on Oct 29, 2002 at 16:41 UTC
You've got six if's in the body of the loop, which will probably be more overhead than the setup, if you've got any substantial amount of data. Certainly, you're not going to get much benefit out of the various ways of initialising a 1200 element structure (although some ways may be more readable than others).
my (%hash, @G1_out, @G2_out, ...);
$hash{$_}=\@G1_out for (qw (H0 ...));
$hash{$_}=\@G2_out for (qw (PX ...));
while (my $input = <STDIN>) {
  chomp $input;
  my $prefix=substr($input, 0, 2);
  push @{$hash{$prefix}}, $input;
}
--
Tommy
Too stupid to live.
Too stubborn to die.
Re: Efficient Grouping
by BrowserUk (Patriarch) on Oct 29, 2002 at 17:28 UTC
Whenever I see variables with names delineated by sequential numbers, I tend to think "data structure". Sometimes a hash, usually an array. In this case, you not only had the GroupN arrays, but also the Gn_Hash hashes and the Gn_out arrays. These can all be grouped into a single data structure, which makes for easy looping.
The result is an AoH+A.
This code
Gives this output
You'll need to add code to handle prefixes that aren't in any group if that is a possibility.
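The code and output from this post did not survive in this copy of the thread, so what follows is only a sketch of one way such a combined structure could look, not BrowserUk's original. It assumes an array of hashes, each holding a group name, a lookup of its prefixes, and its output array; the key names (name, codes, out), the @unmatched array and the sample @lines are all invented for illustration.

```perl
use strict;
use warnings;

# One hash per group: its name, a lookup of its prefixes, and its output array.
my @groups;
for my $spec (
    [ Group1 => qw( H0 K0 PA PB PC PD PE PF PG PH ) ],
    [ Group2 => qw( PX PY PZ P1 P2 P3 P4 P5 P6 P7 ) ],
) {
    my ( $name, @codes ) = @$spec;
    my %is_member = map { $_ => 1 } @codes;
    push @groups, { name => $name, codes => \%is_member, out => [] };
}

# Catch-all for prefixes that belong to no group.
my @unmatched;

# Hypothetical sample input standing in for STDIN.
my @lines = qw( K0blah PZfoo P7bar PEbaz ZZtop );

LINE: for my $input (@lines) {
    my $prefix = substr $input, 0, 2;
    for my $group (@groups) {
        if ( $group->{codes}{$prefix} ) {
            push @{ $group->{out} }, $input;
            next LINE;    # first matching group wins
        }
    }
    push @unmatched, $input;
}

print "$_->{name}: @{ $_->{out} }\n" for @groups;
print "unmatched: @unmatched\n";
```

Adding a third group then means adding one more line to the list of specs rather than declaring three more variables.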
Nah! You're thinking of Simon Templar, originally played by Roger Moore and later by Ian Ogilvy
Re: Efficient Grouping
by ides (Deacon) on Oct 29, 2002 at 16:17 UTC
my %G1_Hash = (
  'H0' => 1,
  'K0' => 1,
  ...
);
-----------------------------------
Frank Wiles <frank@wiles.org>
http://frank.wiles.org
Re: Efficient Grouping
by LTjake (Prior) on Oct 29, 2002 at 16:40 UTC
Just to add my $0.02. I didn't use a hash at all, I used grep.
use strict;
my @Group1 = qw( H0 K0 PA PB PC PD PE PF PG PH );
my @Group2 = qw( PX PY PZ P1 P2 P3 P4 P5 P6 P7 );
my @G1_out;
my @G2_out;
while (my $input = <DATA>) {
  chomp ($input);
  my $prefix = substr($input,0,2);
  # NB: grep is slow in this case. evil. beware.
  push (@G1_out, $input) if grep($_ eq $prefix, @Group1);
  push (@G2_out, $input) if grep($_ eq $prefix, @Group2);
}
print "G1\n";
print "$_\n" foreach @G1_out;
print "\nG2\n";
print "$_\n" foreach @G2_out;
__DATA__
A1 # invalid
K0 # valid G1
B4 # invalid
PY # valid G2
Gives:
G1
K0 # valid G1
G2
PY # valid G2
Update: I guess I should've mentioned that I knew it was slower. The bonus I saw is that it doesn't require a hash per group, which I think is a good thing. I guess my priorities lie elsewhere =) My bad.
-- Rock is dead. Long live paper and scissors!
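If you want to measure the grep-versus-hash difference on your own machine, a rough sketch using the core Benchmark module might look like this. The sub names, the @prefixes list and the iteration count are assumptions chosen for illustration, and the timings will vary from system to system, so no expected figures are given.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @Group1  = qw( H0 K0 PA PB PC PD PE PF PG PH );
my %G1_Hash = map { $_ => 1 } @Group1;

# Hypothetical prefixes to look up: two hits and two misses.
my @prefixes = qw( K0 PH A1 B4 );

# Linear scan: grep walks the whole array for every prefix.
sub count_with_grep {
    my $hits = 0;
    for my $prefix (@prefixes) {
        $hits++ if grep { $_ eq $prefix } @Group1;
    }
    return $hits;
}

# Hash lookup: one probe per prefix, regardless of group size.
sub count_with_hash {
    my $hits = 0;
    for my $prefix (@prefixes) {
        $hits++ if exists $G1_Hash{$prefix};
    }
    return $hits;
}

# Run each sub 100_000 times and print a comparison table.
cmpthese( 100_000, {
    grep_scan   => \&count_with_grep,
    hash_lookup => \&count_with_hash,
} );
```

With ten codes per group the gap may be modest; grep's cost grows with the size of the group, whereas the hash lookup's does not.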
push (@G1_out, $input) if grep($_ eq $prefix, @Group1);
is a bad idea. It's much slower than the original.
The hash solutions offered by davorg and tommyw are good unless there is a possibility that the groups are not disjoint. The groups in the example did not overlap, but the original poster did not explicitly say that overlap was impossible.
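If overlapping groups do need to be supported, one possible adaptation of the prefix-to-array lookup is to map each prefix to a list of output arrays instead of a single one. This is only a sketch under that assumption: the %targets name, the shortened group lists and the decision to put PA in both groups are made up to demonstrate the overlap case.

```perl
use strict;
use warnings;

my ( @G1_out, @G2_out );
my %targets;    # prefix => list of references to output arrays

# 'PA' is deliberately placed in both groups to show the overlap case.
push @{ $targets{$_} }, \@G1_out for qw( H0 K0 PA );
push @{ $targets{$_} }, \@G2_out for qw( PA PX PY );

# Hypothetical sample input standing in for STDIN.
my @lines = qw( K0blah PAfoo PYbar );

for my $input (@lines) {
    my $prefix = substr $input, 0, 2;

    # Append the line to every group that claims this prefix;
    # an unknown prefix falls back to an empty list and is skipped.
    push @$_, $input for @{ $targets{$prefix} || [] };
}

print "G1: @G1_out\n";
print "G2: @G2_out\n";
```

The loop still does a single hash lookup per line, so the cost stays close to the disjoint version.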
Re: Efficient Grouping
by meetraz (Hermit) on Oct 29, 2002 at 18:59 UTC
It would be helpful to handle these rogue entries.
Well, then, to modify tommyw's code:
use strict;
my (%hash, @G1_out, @G2_out);
my @rogues;
$hash{$_}=\@G1_out for qw(H0 H1); # etc
$hash{$_}=\@G2_out for qw(PX P2);
while (<>) {
  chomp;
  push @{$hash{substr($_, 0, 2)} || \@rogues}, $_;
}
Re: Efficient Grouping
by RMGir (Prior) on Oct 29, 2002 at 16:39 UTC
You round up a cleric, an enchanter, a heavy tank-type, and maybe a bard....
Ooops, this isn't EQ Monks :)
--
Mike