Regular Expressions Time limit 10 seconds Memory limit 256Mb Input standard input or input.txt Output standard output or output.txt Pavel doesn’t have a college education. He’s a natural programmer. He loves brief, laconic bits of code more than anything else in the world. One-liners are his favorite! Striving for shorter code he often makes use of regular expressions. There is nothing more creative and enjoyable than working on yet another regular expression in the middle of the night. But bummer. Pavel’s colleague, Simon, is often unhappy with Pavel’s code. Simon never uses regular expressions in his code. He’s just jealous! If he knew a thing about regular expressions, he wouldn’t complain. Once Pavel was careless to remark that Simon is not creative enough to build complex regular expressions. Simon replied, quite sacrilegiously, that there was nothing creative about regular expressions. Their argument evolved into a competition: they decided to find out who would be the fastest to write a regular expression matching a defined set of strings. Simon wants to win by creating a program which would solve this problem automatically, thus proving the lack of creative component in the process. Their friend Peter, a bioinformatician, will be the judge. Simon glanced at the syntaxis of Perl regular expressions and was horrified: is this what programmers have done to a simple and elegant concept? Simon graduated from the mathematics department of the university and has always thought that a regular expression is the following: 0 (zero character) — a regular expression that matches no string. a, g, t, c (letter) — a regular expression matching precisely one string: a single-letter string with the same letter as that in the expression. If R and P are regular expressions, then (R|P) is also a regular expression. It matches all strings which match at least one of the expressions R and P. If R and P are regular expressions, then (RP) is also a regular expression. It matches all strings which can be split in two in such a way that the first part matches the expression R, and the second part matches the expression P. If R is a regular expression, then (R*) is also a regular expression. It matches all strings which can be split into k >= 0 parts so that each part matches the expression R. For instance, the regular expression (a*) matches any string containing only letters a, including an empty string. The regular expression (0*) only matches an empty string (which can be split into zero parts matching the expression 0). The regular expression (a|(g(tc))) matches two strings: a and gtc. Note that it is forbidden to omit or add extra brackets to regular expressions: it must contain strictly those pairs of brackets which appear during its construction according to the rules described above. Simon wants a flawless victory by building the shortest matching regular expression. Help Simon. Write a program which finds the shortest regular expression for a set of strings, such that all these strings match the expression. #### The first line contains an integer T — the number of tests. It is followed by test descriptions. The first line of a test description contains a single positive integer N — the number of strings in the set. Each of the following N lines begins with a nonnegative integer L — the length of the string from the set, followed by the string itself, separated by a space. The string only contains lowercase latin letters a, g, t, c. Note that a string in the set can be empty. In this case the line in the file will only contain the digit 0 and a space symbol. The total number of strings in all sets is not greater than 2000. The sum of lengths of all strings in all sets is not greater than 2000. #### Print precisely T regular expressions, one per line. Each regular expression must be an answer to a corresponding test. If for any test there is no regular expression matched by all strings from the set, print the word Impossible instead of the answer. #### 3 2 1 a 3 gtc 1 0 3 1 g 2 gg 3 ggg #### (a|(g(tc))) (0*) (g*) #### The sixth line of the input data in the example zero is followed by a space symbol. #### #!/usr/bin/perl use warnings; use strict; use re 'eval'; use Time::HiRes qw( gettimeofday tv_interval ); my $t0 = [ gettimeofday ]; $\ = $/; my $debug = 0; my $debug2 = 0; my $time = 0; my @acgt = qw( a c g t ); my @combinations = map { [ combinations( $_, $_ - 1, @acgt ) ] } 1 .. 5; $debug and print ~~ @{ $_ } for @combinations; my %regexes = ( '_' => 1 ); while( 1 ){ my @old_regexes = sort { length $a <=> length $b || $a cmp $b } keys %regexes; $debug and print " " . @old_regexes; for my $i ( 0 .. @old_regexes - 1 ){ my $copy_of_index_i = $i; $i = $old_regexes[ $i ]; if( $i !~ /\*/ ){ $regexes{ "($i*)" } ++ if ( length $i ) + 3 < 16; for my $j ( $copy_of_index_i + 0 .. @old_regexes - 1 ){ $j = $old_regexes[ $j ]; last if ( length $i ) + ( length $j ) + 3 > 15; next if $j =~ /\*/; $regexes{ "($i|$j)" } ++; # $regexes{ "($j|$i)" } ++; } } for my $j ( $copy_of_index_i + 0 .. @old_regexes - 1 ){ $j = $old_regexes[ $j ]; last if ( length $i ) + ( length $j ) + 2 > 15; next if $i =~ /\*/ && $j =~ /\*/; $regexes{ "($i$j)" } ++; $regexes{ "($j$i)" } ++; } } last if @old_regexes == keys %regexes; } $debug2 and print "regexes(30): ", join ' ', ( sort { length $a <=> length $b } keys %regexes )[ 0 .. 30 ]; $debug and print "number of abstract regexes: " . keys %regexes; my %uniq; map $uniq{ $_ } ++, keys %regexes; %regexes = map { $_ => 1 } keys %uniq; $debug and print ~~ keys %regexes; for my $re ( sort keys %regexes ){ $re =~ / (\w) [(] (\w)(\w) [)] /x or next; delete $regexes{ $re }; $re =~ s/ (\w) [(] (\w)(\w) [)] /($1$2)$3/x; $regexes{ $re } ++; } $debug and print ~~ keys %regexes; for my $re ( sort keys %regexes ){ $re =~ / (\w) [(] [(] (\w)(\w) [)] (\w) [)] /x or next; delete $regexes{ $re }; $re =~ s/ (\w) [(] [(] (\w)(\w) [)] (\w) [)] /(($1$2)$3)$4/x; $regexes{ $re } ++; } $debug and print ~~ keys %regexes; for my $re ( sort keys %regexes ){ $re =~ / [(] (\w) (\w) [)] [(] (\w) (\w) [)] /x or next; delete $regexes{ $re }; $re =~ s/ [(] (\w) (\w) [)] [(] (\w) (\w) [)] /(($1$2)$3)$4/x; $regexes{ $re } ++; } $debug and print ~~ keys %regexes; $regexes{ "((((a|c)|g)|t)*)" } ++; $debug and print ~~ keys %regexes; my @regexes = sort { length $a <=> length $b } keys %regexes; y/_/./ for @regexes; $debug2 and print "regexes(30): @regexes[ 0 .. 30 ]"; %regexes = map { $_ => 1 } @regexes; my %HoR; for my $re ( keys %regexes ){ my $cnt = () = $re =~ /\./g; for my $comb ( @{ $combinations[ $cnt - 1 ] } ){ $_ = $re; for my $letter ( split //, $comb ){ s/\./$letter/; } next if /\*/ and 4 == $cnt and /(\w).*\1/; next if / [(] (\w) \| (\w) [)] /x and $1 ge $2; next if / [(] [(] (\w) \| (\w) [)] \| (\w) [)] /x and $1 ge $2 || $1 ge $3 || $2 ge $3; push @{ $HoR{ $re } }, qr/^$_$/; } } if( $debug ){ my @c = ( 0 ) x 5; my $c = 0; for my $re ( keys %HoR ){ my $cnt = () = $re =~ /\./g; $c[ $cnt - 1 ] += @{ $HoR{ $re } }; $c += @{ $HoR{ $re } }; } print "all regexes: " . $c; print " $_" for @c; } $time and print ' ', tv_interval( $t0 ); ################ MAIN: <>; while( <> ){ @_ = map ~~<>, 1 .. $_; chomp @_; s/\S+ // for @_; $debug2 and print "\@_:@_"; s/(.)\1{2,}/ $1 x 3 /ge for @_; s/(.{2,4})\1{2,}/ $1 x 2 /ge for @_; $time and print ' ', tv_interval( $t0 ); my %uniq; map $uniq{ $_ } ++, @_; @_ = sort keys %uniq; my @regexes = sort { length $a <=> length $b } keys %HoR; for my $str ( @_ ){ $debug and print " candidate abstract regexes: " . @regexes; $debug2 and print " str:[$str]"; my @ok; for my $re ( @regexes ){ $str =~ /^$re$/ and push @ok, $re; } @regexes = @ok; } $debug and print " abstract regexes which match: " . @regexes; $debug2 and print " abstract regexes: @regexes"; my @real; @regexes = map @{ $HoR{ $_ } }, @regexes; for my $str ( @_ ){ $debug and print " candidate regexes: " . @regexes; $debug2 and print " str:[$str]"; my @ok; for my $re ( @regexes ){ $str =~ $re and push @ok, $re; # HOT SPOT } @regexes = @ok; } $debug and print " regexes which match: " . @regexes; my $shortest = shift @regexes; my $strip = join "(.*)", map quotemeta, qw[ (?^:^ $) ]; map { s/^ $strip $/$1/x } $shortest; #^ map { s/^ \Q(?^:^\E //x, s/ \Q$)\E $//x } $shortest; # alternative: OK print $shortest ? $shortest : "Impossible"; $time and print tv_interval( $t0 ); $debug and print '-' x 20; } ########## END MAIN sub combinations { my( $length, $min_set, @letters ) = @_; my $str = join '-', ( join '', @letters ) x $length; $debug2 and print $str; my $re = join join( '-', ( '.*?' ) x 2 ), ('(.)') x $length; $debug2 and print $re; my %combs; $str =~ /$re (?{ $combs{ join '', grep defined, $1, $2, $3, $4, $5 } ++ }) (*FAIL)/x; my @combs = keys %combs; $debug2 and print "all combs: " . @combs; # @combs = grep { !/(.)\1/ } @combs; # $debug2 and print "all combs: " . @combs; @combs = grep { my %uniq; map $uniq{ $_ } ++, split //; $min_set <= keys %uniq; } @combs; $debug2 and print "unique combs: " . @combs; $debug2 and print "@combs"; return @combs; } #### * 2 1 a 3 gtc 1 0 3 1 g 2 gg 3 ggg 2 - tcac - gcac 2 - tac - gtc 2 - gag - tcag 2 - ga - ctca 2 - a - acaca 2 - aaa - cccc 3 - aaa - ccc - ttt 4 - a - c - g - t 4 - aaaaa - c - g - t 3 - acac - caca - g 1 - g 1 - gcgc 1 - gcgcg 1 - acacacacacacacacacac 1 - aaaaaaaaaaaaaaaaaaaaaaaaaac 1 - aaaaaaaaaacg 1 - aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaac 1 - acacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacac 1 - acacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacg 1 - acacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacgt 9 - a - aaaaaaaaaaaaaaaaaaaaaaa - aa - aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaacccc - aac - aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaac - ac - aaaaaac - aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaac 32 - a - c - t - g - aa - cc - tt - gg - a - c - t - g - aa - cc - tt - gg - a - c - t - g - aa - cc - tt - gg - a - c - t - g - aa - cc - tt - gg 2 - acgtgca - 2 - acgt - 1 - aaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccct 1 - acggcagcgagcgacgacgacgacgagcgacggacggacggacggagcggacggacggacgacgacgagcgacggagcgagcgacggacgagcgacgaacggcagcgagcgacgacgacgacgagcgacggacggacggacggagcggacggacggacgacgacgagcgacggagcgagcgacggacgagcgacgaacggcagcgagcgacgacgacgacgagcgacggacggacggacggagcggacggacggacgacgacgagcgacggagcgagcgacggacgagcgacgaacggcagcgagcgacgacgacgacgagcgacggacggacggacggagcggacggacggacgacgacgagcgacggagcgagcgacggacgagcgacgaacggcagcgagcgacgacgacgacgagcgacggacggacggacggagcggacggacggacgacgacgagcgacggagcgagcgacggacgagcgacgaacggcagcgagcgacgacgacgacgagcgacggacggacggacggagcggacggacggacgacgacgagcgacggagcgagcgacggacgagcgacgaacggcagcgagcgacgacgacgacgagcgacggacggacggacggagcggacggacggacgacgacgagcgacggagcgagcgacggacgagcgacgaacggcagcgagcgacgacgacgacgagcgacggacggacggacggagcggacggacggacgacgacgagcgacggagcgagcgacggacgagcgacgaacggcagcgagcgacgacgacgacgagcgacggacggacggacggagcggacggacggacgacgacgagcgacggagcgagcgacggacgagcgacgaacggcagcgagcgacgacgacgacgagcgacggacggacggacggagcggacggacggacgacgacgagcgacggagcgagcgacggacgagcgacgaacggcagcgagcgacgacgacgacgagcgacggacggacggacggagcggacggacggacgacgacgagcgacggagcgagcgacggacgagcgacgaacggcagcgagcgacgacgacgacgagcgacggacggacggacggagcggacggacggacgacgacgagcgacggagcgagcgacggacgagcgacgaacggcagcgagcgacgacgacgacgagcgacggacggacggacggagcggacggacggacgacgacgagcgacggagcgagcgacggacgagcgacgaacggcagcgagcgacgacgacgacgagcgacggacggacggacggagcggacggacggacgacgacgagcgacggagcgagcgacggacgagcgacgaacggcagcgagcgacgacgacgacgagcgacggacggacggacggagcggacggacggacgacgacgagcgacggagcgagcgacggacgagcgacgaacggcagcgagcgacgacgacgacgagcgacggacggacggacggagcggacggacggacgacgacgagcgacggagcgagcgacggacgagcgacgaacggcagcgagcgacgacgacgacgagcgacggacggacggacggagcggacggacggacgacgacgagcgacggagcgagcgacggacgagcgacgaacggcagcgagcgacgacgacgacgagcgacggacggacggacggagcggacggacggacgacgacgagcgacggagcgagcgacggacgagcgacgat 1 - acgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacg 1 - acgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgt 2 - - 2 - - c 3 - - c - t 3 - - ccc - t 2 - acgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgt - acgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacga 2 - aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaacgcgcgcgcgcgcgcgaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa - aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaacgcgcgcgcgcgcgcgaaaaaaaaaaaaaaaaaaaaaaaaaaaaaat 8 - aaaacgcgaaaa - aaaacgcgaaaat - aaaacgcgaaaatttttt - aaaacgcgaaaatttt - aaaacgcgcgaaaatttttt - aaaaaaacgcgaaaatttttt - cgcgaaaatttttt - aaaacgcgaaaattttttcgcg 1 - acg #### (a|((gt)c)) (c*) (g*) ((((g|t)c)a)c) (((gt)|(ta))c) ((g|(tc))(ag)) ((g|((ct)c))a) ((a|c)*) ((a|c)*) ((t|(a|c))*) ((a|t)|(g|c)) ((((a|c)|g)|t)*) ((a|(c|g))*) g ((gc)*) ((c|g)*) ((ac)*) ((a*)c) ((a*)(cg)) ((a*)c) ((ac)*) (((ac)*)g) ((((ac)*)g)t) ((a|c)*) ((((a|c)|g)|t)*) ((((a|c)|g)|t)*) ((((ac)g)t)*) ((t|(a|c))*) (((c|(a|g))*)t) (((ac)g)*) ((((ac)g)*)t) (c*) (c*) ((c|t)*) ((c|t)*) (((cg)|(a|t))*) (((cg)|(a|t))*) (((cg)|(a|t))*) ((ac)g) real 0m0.616s user 0m0.584s sys 0m0.028s #### 4 16 60 168 240 1 4 25 144 235 number of abstract regexes: 235 235 208 203 198 199 all regexes: 15264 8 76 960 3384 10836 candidate abstract regexes: 199 candidate abstract regexes: 60 abstract regexes which match: 45 candidate regexes: 2550 candidate regexes: 954 regexes which match: 292 (a|((gt)c)) -------------------- candidate abstract regexes: 199 abstract regexes which match: 27 candidate regexes: 746 regexes which match: 746 (c*) -------------------- candidate abstract regexes: 199 candidate abstract regexes: 60 candidate abstract regexes: 35 abstract regexes which match: 28 candidate regexes: 1194 candidate regexes: 528 candidate regexes: 318 regexes which match: 318 (g*) -------------------- candidate abstract regexes: 199 candidate abstract regexes: 114 abstract regexes which match: 114 candidate regexes: 6382 candidate regexes: 284 regexes which match: 249 ((((g|t)c)a)c) -------------------- candidate abstract regexes: 199 candidate abstract regexes: 136 abstract regexes which match: 136 candidate regexes: 9366 candidate regexes: 669 regexes which match: 249 (((gt)|(ta))c) -------------------- candidate abstract regexes: 199 candidate abstract regexes: 136 abstract regexes which match: 76 candidate regexes: 3210 candidate regexes: 277 regexes which match: 245 ((g|(tc))(ag)) -------------------- candidate abstract regexes: 199 candidate abstract regexes: 114 abstract regexes which match: 67 candidate regexes: 2086 candidate regexes: 258 regexes which match: 246 ((g|((ct)c))a) -------------------- candidate abstract regexes: 199 candidate abstract regexes: 60 abstract regexes which match: 34 candidate regexes: 906 candidate regexes: 462 regexes which match: 261 ((a|c)*) -------------------- candidate abstract regexes: 199 candidate abstract regexes: 136 abstract regexes which match: 136 candidate regexes: 9366 candidate regexes: 318 regexes which match: 255 ((a|c)*) -------------------- candidate abstract regexes: 199 candidate abstract regexes: 136 candidate abstract regexes: 136 abstract regexes which match: 136 candidate regexes: 9366 candidate regexes: 318 candidate regexes: 255 regexes which match: 243 ((t|(a|c))*) -------------------- candidate abstract regexes: 199 candidate abstract regexes: 60 candidate abstract regexes: 60 candidate abstract regexes: 60 abstract regexes which match: 60 candidate regexes: 3940 candidate regexes: 1444 candidate regexes: 466 candidate regexes: 306 regexes which match: 264 ((a|t)|(g|c)) -------------------- candidate abstract regexes: 199 candidate abstract regexes: 136 candidate abstract regexes: 45 candidate abstract regexes: 45 abstract regexes which match: 45 candidate regexes: 2550 candidate regexes: 318 candidate regexes: 257 candidate regexes: 243 regexes which match: 240 ((((a|c)|g)|t)*) -------------------- candidate abstract regexes: 199 candidate abstract regexes: 114 candidate abstract regexes: 114 abstract regexes which match: 29 candidate regexes: 1002 candidate regexes: 268 candidate regexes: 255 regexes which match: 243 ((a|(c|g))*) -------------------- candidate abstract regexes: 199 abstract regexes which match: 60 candidate regexes: 3940 regexes which match: 1444 g -------------------- candidate abstract regexes: 199 abstract regexes which match: 114 candidate regexes: 6382 regexes which match: 293 ((gc)*) -------------------- candidate abstract regexes: 199 abstract regexes which match: 82 candidate regexes: 2922 regexes which match: 261 ((c|g)*) -------------------- candidate abstract regexes: 199 abstract regexes which match: 84 candidate regexes: 2062 regexes which match: 293 ((ac)*) -------------------- candidate abstract regexes: 199 abstract regexes which match: 114 candidate regexes: 6382 regexes which match: 283 ((a*)c) -------------------- candidate abstract regexes: 199 abstract regexes which match: 82 candidate regexes: 2922 regexes which match: 264 ((a*)(cg)) -------------------- candidate abstract regexes: 199 abstract regexes which match: 114 candidate regexes: 6382 regexes which match: 283 ((a*)c) -------------------- candidate abstract regexes: 199 abstract regexes which match: 85 candidate regexes: 2086 regexes which match: 293 ((ac)*) -------------------- candidate abstract regexes: 199 abstract regexes which match: 78 candidate regexes: 1962 regexes which match: 260 (((ac)*)g) -------------------- candidate abstract regexes: 199 abstract regexes which match: 91 candidate regexes: 2194 regexes which match: 258 ((((ac)*)g)t) -------------------- candidate abstract regexes: 199 candidate abstract regexes: 60 candidate abstract regexes: 35 candidate abstract regexes: 28 candidate abstract regexes: 26 candidate abstract regexes: 26 candidate abstract regexes: 26 abstract regexes which match: 26 candidate regexes: 714 candidate regexes: 408 candidate regexes: 318 candidate regexes: 318 candidate regexes: 258 candidate regexes: 256 candidate regexes: 256 regexes which match: 256 ((a|c)*) -------------------- candidate abstract regexes: 199 candidate abstract regexes: 60 candidate abstract regexes: 35 candidate abstract regexes: 35 candidate abstract regexes: 35 candidate abstract regexes: 35 candidate abstract regexes: 35 candidate abstract regexes: 35 abstract regexes which match: 35 candidate regexes: 2118 candidate regexes: 813 candidate regexes: 318 candidate regexes: 257 candidate regexes: 255 candidate regexes: 243 candidate regexes: 243 candidate regexes: 240 regexes which match: 240 ((((a|c)|g)|t)*) -------------------- candidate abstract regexes: 199 candidate abstract regexes: 27 abstract regexes which match: 12 candidate regexes: 466 candidate regexes: 466 regexes which match: 240 ((((a|c)|g)|t)*) -------------------- candidate abstract regexes: 199 candidate abstract regexes: 27 abstract regexes which match: 20 candidate regexes: 614 candidate regexes: 614 regexes which match: 256 ((((ac)g)t)*) -------------------- candidate abstract regexes: 199 abstract regexes which match: 86 candidate regexes: 2110 regexes which match: 243 ((t|(a|c))*) -------------------- candidate abstract regexes: 199 abstract regexes which match: 80 candidate regexes: 2010 regexes which match: 243 (((c|(a|g))*)t) -------------------- candidate abstract regexes: 199 abstract regexes which match: 90 candidate regexes: 2170 regexes which match: 265 (((ac)g)*) -------------------- candidate abstract regexes: 199 abstract regexes which match: 80 candidate regexes: 2010 regexes which match: 253 ((((ac)g)*)t) -------------------- candidate abstract regexes: 199 abstract regexes which match: 27 candidate regexes: 746 regexes which match: 746 (c*) -------------------- candidate abstract regexes: 199 candidate abstract regexes: 27 abstract regexes which match: 10 candidate regexes: 418 candidate regexes: 418 regexes which match: 304 (c*) -------------------- candidate abstract regexes: 199 candidate abstract regexes: 27 candidate abstract regexes: 10 abstract regexes which match: 10 candidate regexes: 418 candidate regexes: 418 candidate regexes: 304 regexes which match: 255 ((c|t)*) -------------------- candidate abstract regexes: 199 candidate abstract regexes: 27 candidate abstract regexes: 19 abstract regexes which match: 10 candidate regexes: 418 candidate regexes: 418 candidate regexes: 304 regexes which match: 255 ((c|t)*) -------------------- candidate abstract regexes: 199 candidate abstract regexes: 80 abstract regexes which match: 80 candidate regexes: 2010 candidate regexes: 249 regexes which match: 243 (((cg)|(a|t))*) -------------------- candidate abstract regexes: 199 candidate abstract regexes: 83 abstract regexes which match: 70 candidate regexes: 1770 candidate regexes: 249 regexes which match: 243 (((cg)|(a|t))*) -------------------- candidate abstract regexes: 199 candidate abstract regexes: 85 candidate abstract regexes: 70 candidate abstract regexes: 70 candidate abstract regexes: 70 abstract regexes which match: 70 candidate regexes: 1770 candidate regexes: 249 candidate regexes: 243 candidate regexes: 243 candidate regexes: 243 regexes which match: 243 (((cg)|(a|t))*) -------------------- candidate abstract regexes: 199 abstract regexes which match: 136 candidate regexes: 9366 regexes which match: 669 ((ac)g) -------------------- real 0m0.678s user 0m0.668s sys 0m0.004s