sarkar has asked for the wisdom of the Perl Monks concerning the following question:

Hello Perl Monks,

I am very new to Perl and I have a question. I have two input files.

In the first file, each line has an ID followed by a sequence.

File1:

string1 (C)C(T)A
string2 T(A)GG(A)GGG(G)

In File2, the second column gives the position of a bracketed character in File1, counting only the sequence letters and not the brackets. The third column repeats the character found at that position in File1, and the fourth column gives the character it can be replaced with.

File2:

string1 1 C A
string1 3 T C
string2 2 A C
string2 5 A T
string2 9 G A
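To make the position convention concrete: the File2 positions only line up with File1 if the brackets are removed first and counting starts at 1 (in string2, position 9 is the last G only once the six bracket characters are dropped). A minimal Perl sketch of that lookup; the helper name `char_at` is hypothetical and not part of the question:

```perl
use strict;
use warnings;

# Hypothetical helper: return the character at a 1-based position,
# counting only the sequence letters (brackets are skipped).
sub char_at {
    my ($seq, $pos) = @_;
    (my $bare = $seq) =~ tr/()//d;    # drop every '(' and ')'
    return substr $bare, $pos - 1, 1;
}

# Check the File2 rows from the question against the File1 sequences.
my %seq = (string1 => '(C)C(T)A', string2 => 'T(A)GG(A)GGG(G)');
my @rows = (
    [ 'string1', 1, 'C' ], [ 'string1', 3, 'T' ],
    [ 'string2', 2, 'A' ], [ 'string2', 5, 'A' ], [ 'string2', 9, 'G' ],
);
for my $row (@rows) {
    my ($id, $pos, $orig) = @$row;
    my $found = char_at($seq{$id}, $pos);
    print "$id position $pos: File2 says $orig, File1 has $found\n";
}
```

Every row should report the same character twice, which is a cheap sanity check to run on the real files before generating any combinations.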

I am looking for output that contains all the possible combinations (as given in File2).

For example, for string1 position 1, the two characters that this position can have are C and A. Likewise, for string1 position 3, the two characters that this position can have are T and C. Therefore, I am looking for all the possible combinations of C/A and T/C. Similarly, for string2, I am looking for all the possible combinations of A/C, A/T and G/A.
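In other words, the output is the Cartesian product of the per-position choices: two positions with two choices each give 2 x 2 = 4 variants of string1, and three positions give 2 x 2 x 2 = 8 variants of string2. A minimal iterative (non-recursive) Perl sketch of that product; the sub name `cartesian` is made up for this illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: Cartesian product of a list of array refs.
# Each returned element is an array ref holding one choice per set.
sub cartesian {
    my @sets   = @_;
    my @result = ([]);    # start with a single empty combination
    for my $set (@sets) {
        # Extend every partial combination with each choice in this set.
        @result = map {
            my $partial = $_;
            map { [ @$partial, $_ ] } @$set
        } @result;
    }
    return @result;
}

# string1: position 1 is C or A, position 3 is T or C.
for my $combo (cartesian([ 'C', 'A' ], [ 'T', 'C' ])) {
    print "string1 (", $combo->[0], ")C(", $combo->[1], ")A\n";
}
```

This prints the four string1 lines in the same order as the expected output below; the last set varies fastest, which is the order the question lists them in.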

Expected Output:

string1 (C)C(T)A
string1 (C)C(C)A
string1 (A)C(T)A
string1 (A)C(C)A
string2 T(A)GG(A)GGG(G)
string2 T(A)GG(A)GGG(A)
string2 T(A)GG(T)GGG(A)
string2 T(A)GG(T)GGG(G)
string2 T(C)GG(A)GGG(G)
string2 T(C)GG(A)GGG(A)
string2 T(C)GG(T)GGG(A)
string2 T(C)GG(T)GGG(G)

I am very new to Perl and would highly appreciate it if somebody could help me with this.

Thank you.

Re: Looking for Printing all possible combinations
by choroba (Cardinal) on Feb 13, 2015 at 09:56 UTC
    Here's a simple recursive solution (read the comments for explanation):
    #!/usr/bin/perl
    use warnings;
    use strict;

    sub alternate {
        my ($id, $string, @changes) = @_;
        unless (@changes) {
            print "$id $string\n";
            return
        }
        # The variant with the original character.
        my ($bracket, $char) = @{ shift @changes };
        alternate($id, $string, @changes);

        # The variant with the alternative character.
        $string =~ /\(/g for 1 .. $bracket;    # Find the bracket.
        substr $string, pos $string, 1, $char;
        alternate($id, $string, @changes);
    }

    open my $STRINGS, '<', 'File1' or die $!;
    open my $ALTS,    '<', 'File2' or die $!;

    while (my $string_line = <$STRINGS>) {
        my ($string_id, $string) = split ' ', $string_line;

        # Count the bracketed parts.
        my $count = ((my $bare_string = $string) =~ s/[()]//g) / 2;

        my @changes;
        for my $bracket (1 .. $count) {
            my $alt_line = <$ALTS>;
            my ($alt_id, $pos, $orig, $alt) = split ' ', $alt_line;
            die "Out of sync: $string_id != $alt_id\n"
                if $string_id ne $alt_id;

            my $char = substr $bare_string, $pos - 1, 1;
            die "Invalid char at pos $pos in $string_id: $orig != $char\n"
                if $orig ne $char;

            push @changes, [ $bracket, $alt ];
        }
        alternate($string_id, $string, @changes);
    }

    It might be a bit too complex for a newbie, but the task is not really simple, either.
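The line that locates the bracket inside `alternate` relies on `m//g` in scalar context remembering where it stopped: each successful match moves `pos` just past the next `(`, so after N matches `pos` points at the character inside the Nth bracket, and the four-argument `substr` overwrites exactly that character. A small standalone illustration of the same idea; the sub name `replace_nth_bracket` is hypothetical:

```perl
use strict;
use warnings;

# Illustrative helper: replace the character inside the $n-th bracket.
sub replace_nth_bracket {
    my ($string, $n, $char) = @_;    # $string is a fresh copy, so pos() starts unset
    for (1 .. $n) {
        # Each scalar-context //g match advances pos() past the next '('.
        $string =~ /\(/g or die "fewer than $n brackets in $string\n";
    }
    # pos() now sits on the bracketed character; overwrite that one character.
    substr $string, pos($string), 1, $char;
    return $string;
}

print replace_nth_bracket('T(A)GG(A)GGG(G)', 3, 'A'), "\n";    # prints T(A)GG(A)GGG(A)
```

Because `$string` is a lexical copy in every call, the recursion in `alternate` never sees a half-modified string from a sibling branch.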

    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
      Hello choroba, thank you very much, I highly appreciate it. I am trying to understand the code, but I have one question. I have multiple positions with brackets (in the example I have given 3). What do I need to modify so that it works for multiple brackets in File1 and many such strings in File2? I have around 1 million entries in File2. Many thanks.
        The code given here generates exactly the output you requested. Do you understand recursion?
        For example, in File1 I have given string2 T(A)GG(A)GGG(G), but other strings can have more brackets, such as TAA(A)G(T)G(A)GGAG(G)CCA(A). How does it work then? What should I modify in the code? Also, my File2 is really big, with a million entries.
Re: Looking for Printing all possible combinations
by Anonymous Monk on Feb 13, 2015 at 09:57 UTC
Re: Looking for Printing all possible combinations
by Laurent_R (Canon) on Feb 13, 2015 at 18:04 UTC
      Hello Laurent_R, thank you for letting me know. I was unaware of it. Once I had worked it out completely, I was going to post the solution to the other forums so that anyone looking for something similar could benefit from it.
        Yes, by all means, publish there the solutions you got here, and vice versa; that's great.

        But it is still better to let people know upfront. Even if you don't yet have a complete solution here, or don't know whether it is complete, and therefore haven't yet told the others about the progress here, I might be working on the other site on an even less complete solution. When I first saw your post (on the PG forum), I was on the train commuting from work to home, reading it on a mobile device, and I was not going to try to provide a solution to this relatively complex problem in those conditions. But had I been sitting in front of a computer, I might have tried, not knowing that choroba had possibly already offered a solution here that is perhaps better than what I would have provided.

        Je suis Charlie.
Re: Looking for Printing all possible combinations
by Anonymous Monk on Feb 14, 2015 at 00:09 UTC

    Sorry, this isn't newbie-friendly:

    open my $f1, '<', \<<'';
    string1 (C)C(T)A
    string2 T(A)GG(A)GGG(G)

    open my $f2, '<', \<<'';
    string1 1 C A
    string1 3 T C
    string2 5 A T
    string2 9 G A
    string2 2 A C

    my %h = map split, <$f1>;
    tr/()//d, $_ = [split //] for values %h;

    while (<$f2>) {
        local $" = ',';
        my ($k, $i, @combo) = split;
        $h{$k}[$i-1] = lc "{@combo}";
    }

    for my $k (sort keys %h) {
        local $" = '';
        while (<@{$h{$k}}>) {
            s/([a-z])/(\u$1)/g;
            print "$k $_\n";
        }
    }

    Outputs:

    string1 (C)C(T)A
    string1 (C)C(C)A
    string1 (A)C(T)A
    string1 (A)C(C)A
    string2 T(A)GG(A)GGG(G)
    string2 T(A)GG(A)GGG(A)
    string2 T(A)GG(T)GGG(G)
    string2 T(A)GG(T)GGG(A)
    string2 T(C)GG(A)GGG(G)
    string2 T(C)GG(A)GGG(A)
    string2 T(C)GG(T)GGG(G)
    string2 T(C)GG(T)GGG(A)
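The trick in this solution is that Perl's built-in `glob` (File::Glob by default) expands `{x,y}` brace alternatives, and a pattern containing no wildcard characters (`*`, `?`, `[`) is returned even when no such file exists. So a pattern such as `T{a,c}GG{a,t}GGG{g,a}` yields every combination, in left-to-right order. The lowercase letters only mark which positions were bracketed, so that the substitution can wrap them back in parentheses. A minimal illustration, assuming the default `glob` behaviour described above:

```perl
use strict;
use warnings;

# Brace alternatives expand left to right, with the last group varying fastest.
my @variants = glob 'T{a,c}GG{a,t}GGG{g,a}';

for my $v (@variants) {
    # Wrap each lowercase (formerly bracketed) letter back in parentheses.
    (my $shown = $v) =~ s/([a-z])/(\u$1)/g;
    print "string2 $shown\n";
}
```

This should print the eight string2 lines in the same order as the output above.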
      Hello Anonymous Monk,

      Thank you very much. I completely agree with you: it is not at all newbie-friendly. But I have one more question.

      For example, in File1 I have given string2 T(A)GG(A)GGG(G), but my original file has strings with more brackets, such as TAA(A)G(T)G(A)GGAG(G)CCA(A). An example is provided below. What should I modify in the code? Also, my File2 is really big, with a million entries.

      If my File1 is:

      string1 (C)C(T)A
      string2 T(A)GG(A)GGG(G)
      string3 T(A)GG(A)GGG(G)AAAAAAA(C)ACT(G)
      string4 TAA(A)G(T)G(A)GGAG(G)CCA(A)

      What would you suggest?

        Both this solution and my solution support any number of brackets. Why haven't you tried it?
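Both solutions handle any number of brackets, but the output grows quickly. Assuming, as in the example File2, that every bracketed position has exactly one alternative, a sequence with n bracketed positions yields 2**n output lines. A short Perl sketch that counts this for the File1 sequences posted above before generating anything; the sub name `variant_count` is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: number of output lines for one File1 sequence,
# assuming exactly one alternative per bracketed position (2 ** n).
sub variant_count {
    my ($seq) = @_;
    my $brackets = () = $seq =~ /\(/g;    # count the '(' characters
    return 2 ** $brackets;
}

for my $seq ('(C)C(T)A', 'T(A)GG(A)GGG(G)',
             'T(A)GG(A)GGG(G)AAAAAAA(C)ACT(G)', 'TAA(A)G(T)G(A)GGAG(G)CCA(A)') {
    printf "%-32s %d variants\n", $seq, variant_count($seq);
}
```

If some positions have more than one alternative in the real File2, the count becomes the product of (1 + alternatives) over all positions instead.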