monkfan has asked for the wisdom of the Perl Monks concerning the following question:

Is there a quick way to break a bracketed string into array, like this:
# From $str1 = "ATC[TG]CC"; # into $ans_of_str1 = [ # 'ATCTCC', 'ATCGCC' ];
Similarly..
Example 2:
# From $str2 = "ATCGC[ATCG]"; # into $ans_of_str2 = [ # 'ATCGCA', 'ATCGCT', 'ATCGCC', 'ATCGCG', ];
Example 3:
When there is no bracket, it returns the string itself.
# From $str3 = "ATCGCT"; # into $ans_of_str3 = [ 'ATCGCT' ];
The position of the bracket can be in any positions, and number of strings enclosed by the bracket can consist up to 4 bases, i.e. [ATCG].

Regards,
Edward

Replies are listed 'Best First'.
Re: Breaking Up a Bracketed String
by Aristotle (Chancellor) on Nov 22, 2005 at 12:56 UTC
    use Set::CrossProduct; my $str = "ATC[TG]CC"; my $strings = []; my @wildcard; $str =~ s{ \[ ([ATCG]+) \] }{ push @wildcard, $1; "%s"; }xgem; # now your string looks like "ATC%sCC" if( @wildcard ) { my @set = map [ split //, $_ ], @wildcard; # now @set contains ( [ 'T', 'G' ] ) # and we weave each possible combination into the %s placeholders +in $str my $xp = Set::CrossProduct->new( \@set ); while( my @tuple = $xp->get ) { push @$strings, sprintf $str, @tuple; } } else { push @$strings, $str; }

    The Set::CrossProduct gymnastics are there so that it works properly for strings like "ATC[TG]CCGC[ACTG]" as well. But it's better if, instead of pushing the expanded strings onto @$strings, you process them immediately within the loop. That way, you won’t run into combinatorial memory consumption explosion even if you have heaps and heaps of variable substrings.

    Makeshifts last the longest.

Re: Breaking Up a Bracketed String
by Roy Johnson (Monsignor) on Nov 22, 2005 at 14:40 UTC
    glob will do it, if you transmogrify the pattern to its liking.
    my $str1 = "ATC[TG]C[CA]"; (my $glob_pat = $str1) =~ s!\[(.*?)\]! '{' . join(',',split//,$1) . '} +'!ge; print "Pattern is $glob_pat\n"; use Data::Dumper; print Dumper([glob($glob_pat)]);
    Actually, character classes as you give are used by glob, but it looks for files with those names. When you use curly-brace alternation, it doesn't require that there be files with those names.

    Caution: Contents may have been coded under pressure.
Re: Breaking Up a Bracketed String
by Happy-the-monk (Canon) on Nov 22, 2005 at 13:05 UTC

    Feeling a bit ashamed of having come up with a cruder and less flexible solution than Aristotle's fine example above, I dare show it anyway. Most notable drawback of this is its expectation of the ocurrence of only one single pair of brackets.

    $str1 = q/ATC[TG]CC/; $ans_of_str2 = all_ans( $str1 ); sub all_ans { my @results; my ( $pre, $multi, $post ) = split /\[(.*)\]/, $_[0]; if ( $multi ) { my @chars = split //, $multi; for ( @chars ) { push @results, "${pre}${_}$post" } return [ @results ]; } return [ $_[0] ]; }

    Cheers, Sören

Re: Breaking Up a Bracketed String
by artist (Parson) on Nov 22, 2005 at 13:28 UTC
    Not so elegant as one from artistotle , but still
    $str1 = "ATC[GT]CC"; $str1 =~ s/\[(.*?)\]//; if($1){ foreach (split //,$1){ push @$ans, "$`$_$'"; } } else{ push @$ans, $str1; } print join "\n", @$ans;
    --Artist
Re: Breaking Up a Bracketed String
by reasonablekeith (Deacon) on Nov 22, 2005 at 15:06 UTC
    And another solution, using map...
    my $string = "ATCGC[ATCG]AAA[GA]"; my @out = ($string =~ m/([^\[]*)/)[0]; while ($string =~ m/\[(\w+)\]/g) { @out = map {my $char=$_;map {$_.$char} @out} split//,$1; } print join("\n", @out) . "\n";

    Hmmm, that missed the AAA's on the output (thanks Roy Johnson)

    Here's another one just for fun :)

    my $string = "ATCGC[ATCG]AAA[GA]"; my @out = ""; $string=~s!([^\[]*)(\[([^\[]*)\])?!@out=map{my$c=$_;map{$_.$1.$c}@out} +split'',$3if$1or$3!ge; print join("\n", @out) . "\n";
    ---
    my name's not Keith, and I'm not reasonable.
Re: Breaking Up a Bracketed String
by Not_a_Number (Prior) on Nov 22, 2005 at 22:13 UTC

    Hi.

    You asked for 'a quick way' to do what you want. If, by any chance, by 'quick' you mean 'fast_to_execute' rather than 'fast_to_program', you might consider the far from elegant code below.

    If you have a significant volume of data to process (and, judging by your questions here on PM, you CatGat guys generally do :), the code below _should_ be faster than any of the replies already posted because:

    1) It doesn't use the perl regex engine

    2) It works directly at string level, without introducing the overhead of converting strings into arrays and back again.

    use strict; use warnings; my $str = 'A[ACGT]G[TG]G[ACT]C[AT]'; # or whatever my $strings = [ $str ]; while ( index ( $$strings[0], '[' ) > -1 ) { $strings = expand ( $strings ) ; } print "$_\n" for @$strings; sub expand { my $arg = shift; my @to_expand = @$arg; my $idx1 = index $to_expand[0], '['; my $idx2 = index $to_expand[0], ']'; my $post = substr $to_expand[0], $idx2 + 1; my @expanded; foreach my $item ( @to_expand ) { my $pre = substr $item, 0, $idx1; for ( $idx1 + 1 .. $idx2 - 1 ) { push @expanded, $pre . substr ( $item, $_, 1 ) . $post; } } return \@expanded; }

    Disclaimer: only partially tested...

Re: Breaking Up a Bracketed String
by monarch (Priest) on Nov 22, 2005 at 13:24 UTC
    Just because I also had a crack at it (and got the solutions requested) I thought I'd publish my long-winded solution.