So I was thinking how I would do that and came up with this (MySQL):
drop table if exists clusters; create table clusters ( factor char(1) not null, sequence varchar(10) not null, num integer unsigned not null, index (sequence) ); insert into clusters values ('C', 'seq1', 25), ('A', 'seq1', 20), ('E', 'seq2', 45), ('D', 'seq2', 30), ('B', 'seq2', 25), ('B', 'seq1', 40), ('B', 'seq2', 80) ; select sequence, group_concat(factor order by factor) as factors, group_concat(num order by factor) as nums from clusters group by sequence having factors like '%A%B%C%' or factors like '%B%D%E%'
Output:
+----------+---------+-------------+ | sequence | factors | nums | +----------+---------+-------------+ | seq1 | A,B,C | 20,40,25 | | seq2 | B,B,D,E | 80,25,30,45 | +----------+---------+-------------+
I guess there is a better way but I prefer not to mess with SQL too much.

As a bonus, some grouping and printing routines. Features flexible search pattern (not limited to 3) and other nice things.

use strict; use warnings; use Carp; use List::Util 'max'; my @factors = ('B,B,D,D,E,E' x 10) x 10_000; my @numbers = ('80,25,30,1000,45,0.5' x 10) x 10_000; my $search = 'BDE'; for my $i ( 0 .. $#factors ) { print_groups( group( $factors[$i], $numbers[$i], $search ) ); } sub group { my ( $factors, $numbers, $search ) = @_; my @factors = split ',', $factors; my @numbers = split ',', $numbers; croak "Fatal error!" unless @factors == @numbers; my @groups; for my $s (split '', $search) { my @temp = ($s); for my $i ( 0 .. $#factors ) { push @temp, $numbers[$i] if $factors[$i] eq $s; } push @groups, \@temp; } return \@groups; } sub print_groups { my $groups = shift; my $current = shift || 0; # or defined-or my $prev = shift; if ( not defined $groups->[$current] ) { print $prev , "\n"; return; } my ( $factor, @numbers ) = @{ $groups->[$current] }; my $max = max map length, @numbers; for my $number (@numbers) { print_groups( $groups, $current + 1, ( $prev ? "$prev " : '' ) . ( sprintf '%s = %-*s', $factor, $max, $number ) ); } }
Output:
B = 80 D = 30 E = 45 B = 80 D = 30 E = 0.5 B = 80 D = 1000 E = 45 B = 80 D = 1000 E = 0.5 B = 25 D = 30 E = 45 B = 25 D = 30 E = 0.5 B = 25 D = 1000 E = 45 B = 25 D = 1000 E = 0.5 ... (24_200_000 rows)
That takes about a minute on my pretty underpowered laptop, and I would expect the database to be reasonably fast too.

In reply to Re^2: how to speed up comparison between two files by Anonymous Monk
in thread how to speed up comparison between two files by greeknlatin

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.