in reply to Re: How to expand a string
in thread How to expand a string

really it's not - I'm a postdoc trying to write a quick program to analyse DNA sequences! There's a heck of a lot of them to do and this was just a simple example - in some cases in DNA, 3 bases can be represented by 1 letter so it's not as simple as it first looks (at least to me!)

Replies are listed 'Best First'.
Re^3: How to expand a string
by lima1 (Curate) on Nov 29, 2007 at 15:42 UTC
    use strict; use warnings; use Data::Dumper; my %replace = ( R => '{A,G}', S => '{C,G}', K => '{G,T}', ' ' => '_', ); my $s = 'KAG GTR CAG CTG AAG SAG TCA GG'; my @results; for my $base (keys %replace) { $s =~ s/$base/$replace{$base}/g; } push @results, $_ while glob $s; @results = map { s/_/ /g; $_ } @results;
    Update: BrowserUK's solution. Credits to him.
      Genius!! Anything I'd have written would have been considerably longer! Thanks very much.
Re^3: How to expand a string
by jrsimmon (Hermit) on Nov 29, 2007 at 16:01 UTC
    This isn't a complete solution, but I think it will get you started on the right path. If you need more detailed help, I'll need a more detailed description of the problem (namely, expected inputs and outputs). What you want to do is iterate over the string and replace the current character with possible replacement characters. You could use recursion as mentioned above, but that's probably overkill for what you're looking for.
    use strict; #the substitution possibilities my @a = ('a'); my @b = ('d','e'); my @c = ('f','g','h'); my $string = pop(@_); #this is an argument passed to the script on th +e command line #my $string = "abc"; #if you want it hardcoded my $expanded_string = ""; my @strings = (); foreach my $first (@a){ foreach my $second (@b){ foreach my $third (@c){ $string = "$first$second$third"; push(@strings, $string); #to store them all print "$string\n"; #to print this particular one } } }
      Thanks, all the replies are helping me get there but it's still a bit of a nightmare to code. Here's some example inputs etc.

      R = A or G
      S = C or G
      K = G or T
      All the other letters stay constant.

      my $seq1 = "CAG GTR CAG CTG AAG SAG TCA GG";
      my $seq2 = "GAK GTG CAG CTT CAG CAG TCR GG";

      The gaps between sets of 3 letters aren't important - it just signifies DNA codons.

      So, both seq 1 and seq 2 have 4 possible resulting sequences. I need to store the 4 seqs associated with seq 1 separately e.g. in a different array to those of seq 2.

      If you have any ideas about the best way to do this I'd be very grateful! Speed isn't a big consideration, as long as it works!

      Thanks for your help.

        #! perl -slw use strict; my @seqs = ( "CAG GTR CAG CTG AAG SAG TCA GG", "GAK GTG CAG CTT CAG CAG TCR GG", ); for my $seq ( @seqs ) { print $seq; $seq =~ tr[ ][_]; $seq =~ s[R][{A,G}]g; $seq =~ s[S][{C,G}]g; $seq =~ s[K][{G,T}]g; tr[_][ ] and print "\t$_" while glob $seq; } __END__ CAG GTR CAG CTG AAG SAG TCA GG CAG GTA CAG CTG AAG CAG TCA GG CAG GTA CAG CTG AAG GAG TCA GG CAG GTG CAG CTG AAG CAG TCA GG CAG GTG CAG CTG AAG GAG TCA GG GAK GTG CAG CTT CAG CAG TCR GG GAG GTG CAG CTT CAG CAG TCA GG GAG GTG CAG CTT CAG CAG TCG GG GAT GTG CAG CTT CAG CAG TCA GG GAT GTG CAG CTT CAG CAG TCG GG

        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.

        A somewhat general solution could look like:

        use strict; use warnings; use Data::Dump::Streamer; my %replace = ( R => '{A,G}', S => '{C,G}', K => '{G,T}', ' ' => '_', ); my %seqs = ( seq1 => {org => "CAG GTR CAG CTG AAG SAG TCA GG"}, seq2 => {org => "GAK GTG CAG CTT CAG CAG TCR GG"} ); for my $seqKey (keys %seqs) { $seqs{$seqKey}{glob} = $seqs{$seqKey}{org}; $seqs{$seqKey}{glob} =~ s/$_/$replace{$_}/g for keys %replace; push @{$seqs{$seqKey}{expanded}}, map {y/_/ /; $_} glob $seqs{$seq +Key}{glob}; } Dump (\%seqs);

        Prints:

        $RO1 = 'seq1'; make_ro($RO1); $HASH1 = { expanded => [ 'CAG GTA CAG CTG AAG CAG TCA GG', 'CAG GTA CAG CTG AAG GAG TCA GG', 'CAG GTG CAG CTG AAG CAG TCA GG', 'CAG GTG CAG CTG AAG GAG TCA GG' ], glob => 'CAG_GT{A,G}_CAG_CTG_AAG_{C,G}AG_TCA_GG', org => 'CAG GTR CAG CTG AAG SAG TCA GG' }; $RO2 = 'seq2'; make_ro($RO2); $HASH2 = { expanded => [ 'GAG GTG CAG CTT CAG CAG TCA GG', 'GAG GTG CAG CTT CAG CAG TCG GG', 'GAT GTG CAG CTT CAG CAG TCA GG', 'GAT GTG CAG CTT CAG CAG TCG GG' ], glob => 'GA{G,T}_GTG_CAG_CTT_CAG_CAG_TC{A,G}_GG', org => 'GAK GTG CAG CTT CAG CAG TCR GG' };

        Note that you well need Perl 5.6.0 or later for the glob to work correctly.


        Perl is environmentally friendly - it saves trees