Regular Expressions
Time limit 10 seconds
Memory limit 256Mb
Input standard input or input.txt
Output standard output or output.txt
Pavel doesn’t have a college education. He’s a natural programmer. He loves brief, laconic bits of code more than anything else in the world. One-liners are his favorite! Striving for shorter code he often makes use of regular expressions. There is nothing more creative and enjoyable than working on yet another regular expression in the middle of the night.
But bummer. Pavel’s colleague, Simon, is often unhappy with Pavel’s code. Simon never uses regular expressions in his code. He’s just jealous! If he knew a thing about regular expressions, he wouldn’t complain.
Once Pavel was careless to remark that Simon is not creative enough to build complex regular expressions. Simon replied, quite sacrilegiously, that there was nothing creative about regular expressions. Their argument evolved into a competition: they decided to find out who would be the fastest to write a regular expression matching a defined set of strings. Simon wants to win by creating a program which would solve this problem automatically, thus proving the lack of creative component in the process. Their friend Peter, a bioinformatician, will be the judge.
Simon glanced at the syntaxis of Perl regular expressions and was horrified: is this what programmers have done to a simple and elegant concept? Simon graduated from the mathematics department of the university and has always thought that a regular expression is the following:
0 (zero character) — a regular expression that matches no string.
a, g, t, c (letter) — a regular expression matching precisely one string: a single-letter string with the same letter as that in the expression.
If R and P are regular expressions, then (R|P) is also a regular expression. It matches all strings which match at least one of the expressions R and P.
If R and P are regular expressions, then (RP) is also a regular expression. It matches all strings which can be split in two in such a way that the first part matches the expression R, and the second part matches the expression P.
If R is a regular expression, then (R*) is also a regular expression. It matches all strings which can be split into k >= 0 parts so that each part matches the expression R.
For instance, the regular expression (a*) matches any string containing only letters a, including an empty string. The regular expression (0*) only matches an empty string (which can be split into zero parts matching the expression 0). The regular expression (a|(g(tc))) matches two strings: a and gtc. Note that it is forbidden to omit or add extra brackets to regular expressions: it must contain strictly those pairs of brackets which appear during its construction according to the rules described above.
Simon wants a flawless victory by building the shortest matching regular expression. Help Simon. Write a program which finds the shortest regular expression for a set of strings, such that all these strings match the expression.
####
The first line contains an integer T — the number of tests. It is followed by test descriptions.
The first line of a test description contains a single positive integer N — the number of strings in the set. Each of the following N lines begins with a nonnegative integer L — the length of the string from the set, followed by the string itself, separated by a space. The string only contains lowercase latin letters a, g, t, c. Note that a string in the set can be empty. In this case the line in the file will only contain the digit 0 and a space symbol.
The total number of strings in all sets is not greater than 2000. The sum of lengths of all strings in all sets is not greater than 2000.
####
Print precisely T regular expressions, one per line. Each regular expression must be an answer to a corresponding test. If for any test there is no regular expression matched by all strings from the set, print the word Impossible instead of the answer.
####
3
2
1 a
3 gtc
1
0
3
1 g
2 gg
3 ggg
####
(a|(g(tc)))
(0*)
(g*)
####
The sixth line of the input data in the example zero is followed by a space symbol.
####
#!/usr/bin/perl
use warnings;
use strict;
use re 'eval';
use Time::HiRes qw( gettimeofday tv_interval );
my $t0 = [ gettimeofday ];
$\ = $/;
my $debug = 0;
my $debug2 = 0;
my $time = 0;
my @acgt = qw( a c g t );
my @combinations = map { [ combinations( $_, $_ - 1, @acgt ) ] } 1 .. 5;
$debug and print ~~ @{ $_ } for @combinations;
my %regexes = ( '_' => 1 );
while( 1 ){
my @old_regexes = sort { length $a <=> length $b || $a cmp $b } keys %regexes;
$debug and print " " . @old_regexes;
for my $i ( 0 .. @old_regexes - 1 ){
my $copy_of_index_i = $i;
$i = $old_regexes[ $i ];
if( $i !~ /\*/ ){
$regexes{ "($i*)" } ++ if ( length $i ) + 3 < 16;
for my $j ( $copy_of_index_i + 0 .. @old_regexes - 1 ){
$j = $old_regexes[ $j ];
last if ( length $i ) + ( length $j ) + 3 > 15;
next if $j =~ /\*/;
$regexes{ "($i|$j)" } ++;
# $regexes{ "($j|$i)" } ++;
}
}
for my $j ( $copy_of_index_i + 0 .. @old_regexes - 1 ){
$j = $old_regexes[ $j ];
last if ( length $i ) + ( length $j ) + 2 > 15;
next if $i =~ /\*/ && $j =~ /\*/;
$regexes{ "($i$j)" } ++;
$regexes{ "($j$i)" } ++;
}
}
last if @old_regexes == keys %regexes;
}
$debug2 and print "regexes(30): ", join ' ', ( sort { length $a <=> length $b } keys %regexes )[ 0 .. 30 ];
$debug and print "number of abstract regexes: " . keys %regexes;
my %uniq;
map $uniq{ $_ } ++, keys %regexes;
%regexes = map { $_ => 1 } keys %uniq;
$debug and print ~~ keys %regexes;
for my $re ( sort keys %regexes ){
$re =~ / (\w) [(] (\w)(\w) [)] /x or next;
delete $regexes{ $re };
$re =~ s/ (\w) [(] (\w)(\w) [)] /($1$2)$3/x;
$regexes{ $re } ++;
}
$debug and print ~~ keys %regexes;
for my $re ( sort keys %regexes ){
$re =~ / (\w) [(] [(] (\w)(\w) [)] (\w) [)] /x or next;
delete $regexes{ $re };
$re =~ s/ (\w) [(] [(] (\w)(\w) [)] (\w) [)] /(($1$2)$3)$4/x;
$regexes{ $re } ++;
}
$debug and print ~~ keys %regexes;
for my $re ( sort keys %regexes ){
$re =~ / [(] (\w) (\w) [)] [(] (\w) (\w) [)] /x or next;
delete $regexes{ $re };
$re =~ s/ [(] (\w) (\w) [)] [(] (\w) (\w) [)] /(($1$2)$3)$4/x;
$regexes{ $re } ++;
}
$debug and print ~~ keys %regexes;
$regexes{ "((((a|c)|g)|t)*)" } ++;
$debug and print ~~ keys %regexes;
my @regexes = sort { length $a <=> length $b } keys %regexes;
y/_/./ for @regexes;
$debug2 and print "regexes(30): @regexes[ 0 .. 30 ]";
%regexes = map { $_ => 1 } @regexes;
my %HoR;
for my $re ( keys %regexes ){
my $cnt = () = $re =~ /\./g;
for my $comb ( @{ $combinations[ $cnt - 1 ] } ){
$_ = $re;
for my $letter ( split //, $comb ){
s/\./$letter/;
}
next if /\*/ and 4 == $cnt and /(\w).*\1/;
next if / [(] (\w) \| (\w) [)] /x and $1 ge $2;
next if / [(] [(] (\w) \| (\w) [)] \| (\w) [)] /x and $1 ge $2 || $1 ge $3 || $2 ge $3;
push @{ $HoR{ $re } }, qr/^$_$/;
}
}
if( $debug ){
my @c = ( 0 ) x 5;
my $c = 0;
for my $re ( keys %HoR ){
my $cnt = () = $re =~ /\./g;
$c[ $cnt - 1 ] += @{ $HoR{ $re } };
$c += @{ $HoR{ $re } };
}
print "all regexes: " . $c;
print " $_" for @c;
}
$time and print ' ', tv_interval( $t0 );
################ MAIN:
<>;
while( <> ){
@_ = map ~~<>, 1 .. $_;
chomp @_;
s/\S+ // for @_;
$debug2 and print "\@_:@_";
s/(.)\1{2,}/ $1 x 3 /ge for @_;
s/(.{2,4})\1{2,}/ $1 x 2 /ge for @_;
$time and print ' ', tv_interval( $t0 );
my %uniq;
map $uniq{ $_ } ++, @_;
@_ = sort keys %uniq;
my @regexes = sort { length $a <=> length $b } keys %HoR;
for my $str ( @_ ){
$debug and print " candidate abstract regexes: " . @regexes;
$debug2 and print " str:[$str]";
my @ok;
for my $re ( @regexes ){
$str =~ /^$re$/ and push @ok, $re;
}
@regexes = @ok;
}
$debug and print " abstract regexes which match: " . @regexes;
$debug2 and print " abstract regexes: @regexes";
my @real;
@regexes = map @{ $HoR{ $_ } }, @regexes;
for my $str ( @_ ){
$debug and print " candidate regexes: " . @regexes;
$debug2 and print " str:[$str]";
my @ok;
for my $re ( @regexes ){
$str =~ $re and push @ok, $re; # HOT SPOT
}
@regexes = @ok;
}
$debug and print " regexes which match: " . @regexes;
my $shortest = shift @regexes;
my $strip = join "(.*)", map quotemeta, qw[ (?^:^ $) ];
map { s/^ $strip $/$1/x } $shortest;
#^ map { s/^ \Q(?^:^\E //x, s/ \Q$)\E $//x } $shortest; # alternative: OK
print $shortest ? $shortest : "Impossible";
$time and print tv_interval( $t0 );
$debug and print '-' x 20;
}
########## END MAIN
sub combinations {
my( $length, $min_set, @letters ) = @_;
my $str = join '-', ( join '', @letters ) x $length;
$debug2 and print $str;
my $re = join join( '-', ( '.*?' ) x 2 ), ('(.)') x $length;
$debug2 and print $re;
my %combs;
$str =~ /$re (?{ $combs{ join '', grep defined, $1, $2, $3, $4, $5 } ++ }) (*FAIL)/x;
my @combs = keys %combs;
$debug2 and print "all combs: " . @combs;
# @combs = grep { !/(.)\1/ } @combs;
# $debug2 and print "all combs: " . @combs;
@combs = grep {
my %uniq;
map $uniq{ $_ } ++, split //;
$min_set <= keys %uniq;
} @combs;
$debug2 and print "unique combs: " . @combs;
$debug2 and print "@combs";
return @combs;
}
####
*
2
1 a
3 gtc
1
0
3
1 g
2 gg
3 ggg
2
- tcac
- gcac
2
- tac
- gtc
2
- gag
- tcag
2
- ga
- ctca
2
- a
- acaca
2
- aaa
- cccc
3
- aaa
- ccc
- ttt
4
- a
- c
- g
- t
4
- aaaaa
- c
- g
- t
3
- acac
- caca
- g
1
- g
1
- gcgc
1
- gcgcg
1
- acacacacacacacacacac
1
- aaaaaaaaaaaaaaaaaaaaaaaaaac
1
- aaaaaaaaaacg
1
- aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaac
1
- acacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacac
1
- acacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacg
1
- acacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacgt
9
- a
- aaaaaaaaaaaaaaaaaaaaaaa
- aa
- aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaacccc
- aac
- aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaac
- ac
- aaaaaac
- aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaac
32
- a
- c
- t
- g
- aa
- cc
- tt
- gg
- a
- c
- t
- g
- aa
- cc
- tt
- gg
- a
- c
- t
- g
- aa
- cc
- tt
- gg
- a
- c
- t
- g
- aa
- cc
- tt
- gg
2
- acgtgca
-
2
- acgt
-
1
- aaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccct
1
- acggcagcgagcgacgacgacgacgagcgacggacggacggacggagcggacggacggacgacgacgagcgacggagcgagcgacggacgagcgacgaacggcagcgagcgacgacgacgacgagcgacggacggacggacggagcggacggacggacgacgacgagcgacggagcgagcgacggacgagcgacgaacggcagcgagcgacgacgacgacgagcgacggacggacggacggagcggacggacggacgacgacgagcgacggagcgagcgacggacgagcgacgaacggcagcgagcgacgacgacgacgagcgacggacggacggacggagcggacggacggacgacgacgagcgacggagcgagcgacggacgagcgacgaacggcagcgagcgacgacgacgacgagcgacggacggacggacggagcggacggacggacgacgacgagcgacggagcgagcgacggacgagcgacgaacggcagcgagcgacgacgacgacgagcgacggacggacggacggagcggacggacggacgacgacgagcgacggagcgagcgacggacgagcgacgaacggcagcgagcgacgacgacgacgagcgacggacggacggacggagcggacggacggacgacgacgagcgacggagcgagcgacggacgagcgacgaacggcagcgagcgacgacgacgacgagcgacggacggacggacggagcggacggacggacgacgacgagcgacggagcgagcgacggacgagcgacgaacggcagcgagcgacgacgacgacgagcgacggacggacggacggagcggacggacggacgacgacgagcgacggagcgagcgacggacgagcgacgaacggcagcgagcgacgacgacgacgagcgacggacggacggacggagcggacggacggacgacgacgagcgacggagcgagcgacggacgagcgacgaacggcagcgagcgacgacgacgacgagcgacggacggacggacggagcggacggacggacgacgacgagcgacggagcgagcgacggacgagcgacgaacggcagcgagcgacgacgacgacgagcgacggacggacggacggagcggacggacggacgacgacgagcgacggagcgagcgacggacgagcgacgaacggcagcgagcgacgacgacgacgagcgacggacggacggacggagcggacggacggacgacgacgagcgacggagcgagcgacggacgagcgacgaacggcagcgagcgacgacgacgacgagcgacggacggacggacggagcggacggacggacgacgacgagcgacggagcgagcgacggacgagcgacgaacggcagcgagcgacgacgacgacgagcgacggacggacggacggagcggacggacggacgacgacgagcgacggagcgagcgacggacgagcgacgaacggcagcgagcgacgacgacgacgagcgacggacggacggacggagcggacggacggacgacgacgagcgacggagcgagcgacggacgagcgacgaacggcagcgagcgacgacgacgacgagcgacggacggacggacggagcggacggacggacgacgacgagcgacggagcgagcgacggacgagcgacgaacggcagcgagcgacgacgacgacgagcgacggacggacggacggagcggacggacggacgacgacgagcgacggagcgagcgacggacgagcgacgat
1
- acgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacg
1
- acgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgt
2
-
-
2
-
- c
3
-
- c
- t
3
-
- ccc
- t
2
- acgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgt
- acgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacga
2
- aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaacgcgcgcgcgcgcgcgaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
- aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaacgcgcgcgcgcgcgcgaaaaaaaaaaaaaaaaaaaaaaaaaaaaaat
8
- aaaacgcgaaaa
- aaaacgcgaaaat
- aaaacgcgaaaatttttt
- aaaacgcgaaaatttt
- aaaacgcgcgaaaatttttt
- aaaaaaacgcgaaaatttttt
- cgcgaaaatttttt
- aaaacgcgaaaattttttcgcg
1
- acg
####
(a|((gt)c))
(c*)
(g*)
((((g|t)c)a)c)
(((gt)|(ta))c)
((g|(tc))(ag))
((g|((ct)c))a)
((a|c)*)
((a|c)*)
((t|(a|c))*)
((a|t)|(g|c))
((((a|c)|g)|t)*)
((a|(c|g))*)
g
((gc)*)
((c|g)*)
((ac)*)
((a*)c)
((a*)(cg))
((a*)c)
((ac)*)
(((ac)*)g)
((((ac)*)g)t)
((a|c)*)
((((a|c)|g)|t)*)
((((a|c)|g)|t)*)
((((ac)g)t)*)
((t|(a|c))*)
(((c|(a|g))*)t)
(((ac)g)*)
((((ac)g)*)t)
(c*)
(c*)
((c|t)*)
((c|t)*)
(((cg)|(a|t))*)
(((cg)|(a|t))*)
(((cg)|(a|t))*)
((ac)g)
real 0m0.616s
user 0m0.584s
sys 0m0.028s
####
4
16
60
168
240
1
4
25
144
235
number of abstract regexes: 235
235
208
203
198
199
all regexes: 15264
8
76
960
3384
10836
candidate abstract regexes: 199
candidate abstract regexes: 60
abstract regexes which match: 45
candidate regexes: 2550
candidate regexes: 954
regexes which match: 292
(a|((gt)c))
--------------------
candidate abstract regexes: 199
abstract regexes which match: 27
candidate regexes: 746
regexes which match: 746
(c*)
--------------------
candidate abstract regexes: 199
candidate abstract regexes: 60
candidate abstract regexes: 35
abstract regexes which match: 28
candidate regexes: 1194
candidate regexes: 528
candidate regexes: 318
regexes which match: 318
(g*)
--------------------
candidate abstract regexes: 199
candidate abstract regexes: 114
abstract regexes which match: 114
candidate regexes: 6382
candidate regexes: 284
regexes which match: 249
((((g|t)c)a)c)
--------------------
candidate abstract regexes: 199
candidate abstract regexes: 136
abstract regexes which match: 136
candidate regexes: 9366
candidate regexes: 669
regexes which match: 249
(((gt)|(ta))c)
--------------------
candidate abstract regexes: 199
candidate abstract regexes: 136
abstract regexes which match: 76
candidate regexes: 3210
candidate regexes: 277
regexes which match: 245
((g|(tc))(ag))
--------------------
candidate abstract regexes: 199
candidate abstract regexes: 114
abstract regexes which match: 67
candidate regexes: 2086
candidate regexes: 258
regexes which match: 246
((g|((ct)c))a)
--------------------
candidate abstract regexes: 199
candidate abstract regexes: 60
abstract regexes which match: 34
candidate regexes: 906
candidate regexes: 462
regexes which match: 261
((a|c)*)
--------------------
candidate abstract regexes: 199
candidate abstract regexes: 136
abstract regexes which match: 136
candidate regexes: 9366
candidate regexes: 318
regexes which match: 255
((a|c)*)
--------------------
candidate abstract regexes: 199
candidate abstract regexes: 136
candidate abstract regexes: 136
abstract regexes which match: 136
candidate regexes: 9366
candidate regexes: 318
candidate regexes: 255
regexes which match: 243
((t|(a|c))*)
--------------------
candidate abstract regexes: 199
candidate abstract regexes: 60
candidate abstract regexes: 60
candidate abstract regexes: 60
abstract regexes which match: 60
candidate regexes: 3940
candidate regexes: 1444
candidate regexes: 466
candidate regexes: 306
regexes which match: 264
((a|t)|(g|c))
--------------------
candidate abstract regexes: 199
candidate abstract regexes: 136
candidate abstract regexes: 45
candidate abstract regexes: 45
abstract regexes which match: 45
candidate regexes: 2550
candidate regexes: 318
candidate regexes: 257
candidate regexes: 243
regexes which match: 240
((((a|c)|g)|t)*)
--------------------
candidate abstract regexes: 199
candidate abstract regexes: 114
candidate abstract regexes: 114
abstract regexes which match: 29
candidate regexes: 1002
candidate regexes: 268
candidate regexes: 255
regexes which match: 243
((a|(c|g))*)
--------------------
candidate abstract regexes: 199
abstract regexes which match: 60
candidate regexes: 3940
regexes which match: 1444
g
--------------------
candidate abstract regexes: 199
abstract regexes which match: 114
candidate regexes: 6382
regexes which match: 293
((gc)*)
--------------------
candidate abstract regexes: 199
abstract regexes which match: 82
candidate regexes: 2922
regexes which match: 261
((c|g)*)
--------------------
candidate abstract regexes: 199
abstract regexes which match: 84
candidate regexes: 2062
regexes which match: 293
((ac)*)
--------------------
candidate abstract regexes: 199
abstract regexes which match: 114
candidate regexes: 6382
regexes which match: 283
((a*)c)
--------------------
candidate abstract regexes: 199
abstract regexes which match: 82
candidate regexes: 2922
regexes which match: 264
((a*)(cg))
--------------------
candidate abstract regexes: 199
abstract regexes which match: 114
candidate regexes: 6382
regexes which match: 283
((a*)c)
--------------------
candidate abstract regexes: 199
abstract regexes which match: 85
candidate regexes: 2086
regexes which match: 293
((ac)*)
--------------------
candidate abstract regexes: 199
abstract regexes which match: 78
candidate regexes: 1962
regexes which match: 260
(((ac)*)g)
--------------------
candidate abstract regexes: 199
abstract regexes which match: 91
candidate regexes: 2194
regexes which match: 258
((((ac)*)g)t)
--------------------
candidate abstract regexes: 199
candidate abstract regexes: 60
candidate abstract regexes: 35
candidate abstract regexes: 28
candidate abstract regexes: 26
candidate abstract regexes: 26
candidate abstract regexes: 26
abstract regexes which match: 26
candidate regexes: 714
candidate regexes: 408
candidate regexes: 318
candidate regexes: 318
candidate regexes: 258
candidate regexes: 256
candidate regexes: 256
regexes which match: 256
((a|c)*)
--------------------
candidate abstract regexes: 199
candidate abstract regexes: 60
candidate abstract regexes: 35
candidate abstract regexes: 35
candidate abstract regexes: 35
candidate abstract regexes: 35
candidate abstract regexes: 35
candidate abstract regexes: 35
abstract regexes which match: 35
candidate regexes: 2118
candidate regexes: 813
candidate regexes: 318
candidate regexes: 257
candidate regexes: 255
candidate regexes: 243
candidate regexes: 243
candidate regexes: 240
regexes which match: 240
((((a|c)|g)|t)*)
--------------------
candidate abstract regexes: 199
candidate abstract regexes: 27
abstract regexes which match: 12
candidate regexes: 466
candidate regexes: 466
regexes which match: 240
((((a|c)|g)|t)*)
--------------------
candidate abstract regexes: 199
candidate abstract regexes: 27
abstract regexes which match: 20
candidate regexes: 614
candidate regexes: 614
regexes which match: 256
((((ac)g)t)*)
--------------------
candidate abstract regexes: 199
abstract regexes which match: 86
candidate regexes: 2110
regexes which match: 243
((t|(a|c))*)
--------------------
candidate abstract regexes: 199
abstract regexes which match: 80
candidate regexes: 2010
regexes which match: 243
(((c|(a|g))*)t)
--------------------
candidate abstract regexes: 199
abstract regexes which match: 90
candidate regexes: 2170
regexes which match: 265
(((ac)g)*)
--------------------
candidate abstract regexes: 199
abstract regexes which match: 80
candidate regexes: 2010
regexes which match: 253
((((ac)g)*)t)
--------------------
candidate abstract regexes: 199
abstract regexes which match: 27
candidate regexes: 746
regexes which match: 746
(c*)
--------------------
candidate abstract regexes: 199
candidate abstract regexes: 27
abstract regexes which match: 10
candidate regexes: 418
candidate regexes: 418
regexes which match: 304
(c*)
--------------------
candidate abstract regexes: 199
candidate abstract regexes: 27
candidate abstract regexes: 10
abstract regexes which match: 10
candidate regexes: 418
candidate regexes: 418
candidate regexes: 304
regexes which match: 255
((c|t)*)
--------------------
candidate abstract regexes: 199
candidate abstract regexes: 27
candidate abstract regexes: 19
abstract regexes which match: 10
candidate regexes: 418
candidate regexes: 418
candidate regexes: 304
regexes which match: 255
((c|t)*)
--------------------
candidate abstract regexes: 199
candidate abstract regexes: 80
abstract regexes which match: 80
candidate regexes: 2010
candidate regexes: 249
regexes which match: 243
(((cg)|(a|t))*)
--------------------
candidate abstract regexes: 199
candidate abstract regexes: 83
abstract regexes which match: 70
candidate regexes: 1770
candidate regexes: 249
regexes which match: 243
(((cg)|(a|t))*)
--------------------
candidate abstract regexes: 199
candidate abstract regexes: 85
candidate abstract regexes: 70
candidate abstract regexes: 70
candidate abstract regexes: 70
abstract regexes which match: 70
candidate regexes: 1770
candidate regexes: 249
candidate regexes: 243
candidate regexes: 243
candidate regexes: 243
regexes which match: 243
(((cg)|(a|t))*)
--------------------
candidate abstract regexes: 199
abstract regexes which match: 136
candidate regexes: 9366
regexes which match: 669
((ac)g)
--------------------
real 0m0.678s
user 0m0.668s
sys 0m0.004s