Hi PerlMonks,

I have a string, e.g. $string="ATATGCGCAT", which is 10 letters long and made up of the four letters A, T, G and C. I want to get all possible combinations of the 10 letters without changing their positions in the string, using 2 (or a varying number of) levels for each of A, T, G and C. I have also used a sliding window of size 4 in the script try.pl, and I want to keep the window size as an option in the script. This is because when the string is longer than 40 letters, with varying levels for the basic letters, the number of possible combinations becomes very large and cmd does not give any results. Using the window size, I first want to divide the string into fragments. Each smaller fragment will be used to produce a set of combinations. Then the first combination of the first fragment will be concatenated with the first combination of the second fragment to produce a new combination, which will in turn be concatenated with the first combination of the third fragment, and so on until the entire length of the original string is covered. The other combinations will be produced in the same way.
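The fragmenting step described above could be sketched roughly as follows. This is only a minimal sketch under one assumption: that every position in a fragment independently takes a level from 1 to the number of levels. The sub names fragments and fragment_combinations are made up for illustration and do not come from try.pl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split a string into consecutive fragments of at most $window letters.
# The last fragment may be shorter (e.g. "AT" for a 10-letter string).
sub fragments {
    my ($string, $window) = @_;
    return $string =~ /(.{1,$window})/g;
}

# All labelled forms of one fragment. ASSUMPTION: every position
# independently takes a level from 1 to $levels.
sub fragment_combinations {
    my ($fragment, $levels) = @_;
    my @combos = ('');
    for my $letter (split //, $fragment) {
        my @next;
        for my $prefix (@combos) {
            push @next, "$prefix$letter$_" for 1 .. $levels;
        }
        @combos = @next;
    }
    return @combos;
}

for my $frag (fragments('ATATGCGCAT', 4)) {
    my @combos = fragment_combinations($frag, 2);
    printf "%-4s %2d combinations, first: %s\n",
        $frag, scalar @combos, $combos[0];
}
```

With a window of 4 the string splits into ATAT, GCGC and AT, so the last fragment is shorter than the window and yields fewer combinations than the other two.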

I have written a script, try.pl, which produces combinations of varying sizes (ranging from 1 to 8 letters only). I need only the combinations with the actual length of the original string (i.e. 10 in this case) in the output file, with each combination starting with the symbol "~" and ending with "~". I am at my wits' end trying to solve this problem.

Here goes the script try.pl:

    #!/usr/bin/perl
    use warnings;
    $string="ATATGCGCAT";
    ###########################################
    # Output to a TEXT File:
    ###########################################
    $output="Results .txt";
    open (my $fh,">",$output) or die"Cannot open file '$output'.\n";
    #####################################
    # To break into 4-letter fragments:
    #####################################
    while ($string=~ /(.{4}?)/ig) {
        $four=$&;
        @sw=$four=~ /[ATGC]{1}/igs;
        foreach my $single (@sw) {
            ####################################################
            # To extract single letter & append perd to single:
            ####################################################
            $perd="%d";
            $mod_four=$single.$perd; # concatenation
            push @new_four,$mod_four;
            $new_four = join ('',@new_four);
            # To produce all possible combinations without changing positions:
            for $a (1 .. 2) { # a has 2 levels:
                for $t (1 .. 2) { # t has 2 levels:
                    for $g (1 .. 2) { # g has 2 levels:
                        for $c (1 .. 2) { # c has 2 levels:
                            $combi=sprintf($new_four,$a,$t,$g,$c,3-$a,3-$t,3-$g,3-$c);
                            print"~$combi\n";
                            print $fh "~$combi\n";
                        }
                    }
                }
            }
        } # 2nd foreach closes:
    } # 1st while closes:
    print"~"; print"\n";
    print $fh "~"; print $fh "\n";
    close $fh;
    exit;

I got the following results in the output text file Results .txt. This is not what I want:

~A1 ~A1 ~A1 ~A1 ~A1 ~A1 ~A1 ~A1 ~A2 ~A2 ~A2 ~A2 ~A2 ~A2 ~A2 ~A2 ~A1T1 ~A1T1 ~A1T1 ~A1T1 ~A1T2 ~A1T2 ~A1T2 ~A1T2 ~A2T1 ~A2T1 ~A2T1 ~A2T1 ~A2T2 ~A2T2 ~A2T2 ~A2T2 ~A1T1A1 ~A1T1A1 ~A1T1A2 ~A1T1A2 ~A1T2A1 ~A1T2A1 ~A1T2A2 ~A1T2A2 ~A2T1A1 ~A2T1A1 ~A2T1A2 ~A2T1A2 ~A2T2A1 ~A2T2A1 ~A2T2A2 ~A2T2A2 ~A1T1A1T1 ~A1T1A1T2 ~A1T1A2T1 ~A1T1A2T2 ~A1T2A1T1 ~A1T2A1T2 ~A1T2A2T1 ~A1T2A2T2 ~A2T1A1T1 ~A2T1A1T2 ~A2T1A2T1 ~A2T1A2T2 ~A2T2A1T1 ~A2T2A1T2 ~A2T2A2T1 ~A2T2A2T2 ~A1T1A1T1G2 ~A1T1A1T2G2 ~A1T1A2T1G2 ~A1T1A2T2G2 ~A1T2A1T1G2 ~A1T2A1T2G2 ~A1T2A2T1G2 ~A1T2A2T2G2 ~A2T1A1T1G1 ~A2T1A1T2G1 ~A2T1A2T1G1 ~A2T1A2T2G1 ~A2T2A1T1G1 ~A2T2A1T2G1 ~A2T2A2T1G1 ~A2T2A2T2G1 ~A1T1A1T1G2C2 ~A1T1A1T2G2C2 ~A1T1A2T1G2C2 ~A1T1A2T2G2C2 ~A1T2A1T1G2C1 ~A1T2A1T2G2C1 ~A1T2A2T1G2C1 ~A1T2A2T2G2C1 ~A2T1A1T1G1C2 ~A2T1A1T2G1C2 ~A2T1A2T1G1C2 ~A2T1A2T2G1C2 ~A2T2A1T1G1C1 ~A2T2A1T2G1C1 ~A2T2A2T1G1C1 ~A2T2A2T2G1C1 ~A1T1A1T1G2C2G2 ~A1T1A1T2G2C2G2 ~A1T1A2T1G2C2G1 ~A1T1A2T2G2C2G1 ~A1T2A1T1G2C1G2 ~A1T2A1T2G2C1G2 ~A1T2A2T1G2C1G1 ~A1T2A2T2G2C1G1 ~A2T1A1T1G1C2G2 ~A2T1A1T2G1C2G2 ~A2T1A2T1G1C2G1 ~A2T1A2T2G1C2G1 ~A2T2A1T1G1C1G2 ~A2T2A1T2G1C1G2 ~A2T2A2T1G1C1G1 ~A2T2A2T2G1C1G1 ~A1T1A1T1G2C2G2C2 ~A1T1A1T2G2C2G2C1 ~A1T1A2T1G2C2G1C2 ~A1T1A2T2G2C2G1C1 ~A1T2A1T1G2C1G2C2 ~A1T2A1T2G2C1G2C1 ~A1T2A2T1G2C1G1C2 ~A1T2A2T2G2C1G1C1 ~A2T1A1T1G1C2G2C2 ~A2T1A1T2G1C2G2C1 ~A2T1A2T1G1C2G1C2 ~A2T1A2T2G1C2G1C1 ~A2T2A1T1G1C1G2C2 ~A2T2A1T2G1C1G2C1 ~A2T2A2T1G1C1G1C2 ~A2T2A2T2G1C1G1C1 ~

The correct results in the output file Results .txt should look like:

~A1T1A1T1G2C2G2C2A?T?
~A1T1A1T2G2C2G2C1A?T?
.....................
.....................
.....................
~

For the 9th and 10th positions of the desired results I have used a ? sign to indicate an unknown number.
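To make the target format concrete, here is a hedged sketch that prints only full-length combinations, one per line with a leading "~" and a closing "~" line at the end. It assumes that each of the 10 positions independently takes a level from 1 to 2, which may not be the intended meaning of "levels", so under that assumption the ? places simply take the values 1 and 2 as well. The sub name combination_iterator is hypothetical. Combinations are generated one at a time, so they are never all held in memory.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return an iterator over every full-length labelled form of $string.
# ASSUMPTION: each position independently takes a level 1 .. $levels.
sub combination_iterator {
    my ($string, $levels) = @_;
    my @letters = split //, $string;
    my @level   = (1) x @letters;    # current level of each position
    my $done    = !@letters;         # nothing to produce for an empty string

    return sub {
        return if $done;
        my $combo = join '', map { $letters[$_] . $level[$_] } 0 .. $#letters;

        # Advance like an odometer: the rightmost position changes fastest.
        my $i = $#level;
        while ($i >= 0 && $level[$i] == $levels) {
            $level[$i--] = 1;
        }
        if ($i < 0) { $done = 1 } else { $level[$i]++ }

        return $combo;
    };
}

my $next  = combination_iterator('ATATGCGCAT', 2);
my $count = 0;
while (defined(my $combo = $next->())) {
    print "~$combo\n";
    $count++;
}
print "~\n";
print STDERR "$count combinations\n";
```

Under this assumption the count is levels to the power of the string length, i.e. 2**10 = 1024 here. For a 40-letter string with 2 levels that would be 2**40, roughly 1.1e12 lines, so streaming avoids running out of memory but the output itself would still be impractically large.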


In reply to How can one get all possible combinations of a string without changing positions & using window size? by supriyoch_2008
