So I was thinking how I would do that and came up with this (MySQL):
drop table if exists clusters;
create table clusters
(
factor char(1) not null,
sequence varchar(10) not null,
num integer unsigned not null,
index (sequence)
);
insert into clusters values
('C', 'seq1', 25),
('A', 'seq1', 20),
('E', 'seq2', 45),
('D', 'seq2', 30),
('B', 'seq2', 25),
('B', 'seq1', 40),
('B', 'seq2', 80)
;
select
sequence,
group_concat(factor order by factor) as factors,
group_concat(num order by factor) as nums
from
clusters
group by
sequence
having
factors like '%A%B%C%'
or
factors like '%B%D%E%'
Output:
+----------+---------+-------------+
| sequence | factors | nums |
+----------+---------+-------------+
| seq1 | A,B,C | 20,40,25 |
| seq2 | B,B,D,E | 80,25,30,45 |
+----------+---------+-------------+
I guess there is a better way but I prefer not to mess with SQL too much.
As a bonus, some grouping and printing routines. Features flexible search pattern (not limited to 3) and other nice things.
use strict;
use warnings;
use Carp;
use List::Util 'max';
my @factors = ('B,B,D,D,E,E' x 10) x 10_000;
my @numbers = ('80,25,30,1000,45,0.5' x 10) x 10_000;
my $search = 'BDE';
for my $i ( 0 .. $#factors ) {
print_groups(
group( $factors[$i], $numbers[$i], $search )
);
}
sub group {
my ( $factors, $numbers, $search ) = @_;
my @factors = split ',', $factors;
my @numbers = split ',', $numbers;
croak "Fatal error!"
unless @factors == @numbers;
my @groups;
for my $s (split '', $search) {
my @temp = ($s);
for my $i ( 0 .. $#factors ) {
push @temp, $numbers[$i] if $factors[$i] eq $s;
}
push @groups, \@temp;
}
return \@groups;
}
sub print_groups {
my $groups = shift;
my $current = shift || 0; # or defined-or
my $prev = shift;
if ( not defined $groups->[$current] ) {
print $prev , "\n";
return;
}
my ( $factor, @numbers ) = @{ $groups->[$current] };
my $max = max map length, @numbers;
for my $number (@numbers) {
print_groups(
$groups,
$current + 1,
( $prev ? "$prev " : '' )
. ( sprintf '%s = %-*s', $factor, $max, $number )
);
}
}
Output:
B = 80 D = 30 E = 45
B = 80 D = 30 E = 0.5
B = 80 D = 1000 E = 45
B = 80 D = 1000 E = 0.5
B = 25 D = 30 E = 45
B = 25 D = 30 E = 0.5
B = 25 D = 1000 E = 45
B = 25 D = 1000 E = 0.5
... (24_200_000 rows)
That takes about a minute on my pretty underpowered laptop, and I would expect the database to be reasonably fast too. |