
I have a short script which splits a DNA sequence (read in with BioPerl) on a user-provided pattern and then populates a hash with modified versions of the resulting fragments (padded with TT and AA), each under a unique ID. Each fragment produced by split appears in the hash twice: once in its original form and once as its reverse complement (F and R respectively).

#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;
my %sequences;
my $seqio = Bio::SeqIO->new(-file => $ARGV[0]);
my $enz = $ARGV[1];
while(my $seqobj = $seqio->next_seq) {
    my $id  = $seqobj->display_id;
    my $seq = $seqobj->seq;
    $sequences{$id} = $seq;
}

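my @fragments;
# Sort the IDs so fragment numbering is reproducible, and push rather than
# assign so fragments from every sequence are kept, not just the last one.
for my $id (sort keys %sequences) {
    push @fragments, split(/$enz/, $sequences{$id});
}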

my $Lfill = "TT";
my $Rfill = "AA";
my $ID = 0;
my %bins;
foreach my $RF (@fragments) {
    $ID++;
    $RF = $Lfill.$RF.$Rfill;
    $bins{$ID."F"} = $RF;
    (my $rev = $RF) =~ tr/ACGT/TGCA/;
    $bins{$ID."R"} = reverse($rev);
}

Example Input:

>example
AAGTAGCATCGATTTATAGCATCGACTAGTAAGCTTAGCTACGATCAGCTACGATCGAGCGACTACGTAGC

Fragments Generated:

1F => TTAAGTAGCATCGATTTATAGCATCGACTAGTAA
1R => TTACTAGTCGATGCTATAAATCGATGCTACTTAA
2F => TTAGCTACGATCAGCTACGATCGAGCGACTACGTAGCAA
2R => TTGCTACGTAGTCGCTCGATCGTAGCTGATCGTAGCTAA
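The fragment step can be checked without BioPerl. This is a minimal sketch, assuming the example sequence is hard-coded rather than read with Bio::SeqIO, and assuming the split pattern was `AAGCTT` (inferred from the fragments listed above; the post does not say which pattern was used). The helper name `revcomp` is illustrative.

```perl
use strict;
use warnings;

# Assumption: example sequence hard-coded, pattern inferred from the output above.
my $seq = 'AAGTAGCATCGATTTATAGCATCGACTAGTAAGCTTAGCTACGATCAGCTACGATCGAGCGACTACGTAGC';
my $enz = 'AAGCTT';

my ($Lfill, $Rfill) = ('TT', 'AA');

# Hypothetical helper: complement each base with tr///, then reverse the string.
sub revcomp {
    my ($dna) = @_;
    (my $comp = $dna) =~ tr/ACGT/TGCA/;
    return scalar reverse $comp;
}

# push (not plain assignment) so several sequences would accumulate.
my @fragments;
push @fragments, split /$enz/, $seq;

my %bins;
my $ID = 0;
for my $frag (@fragments) {
    $ID++;
    my $padded = $Lfill . $frag . $Rfill;
    $bins{"${ID}F"} = $padded;            # forward, padded
    $bins{"${ID}R"} = revcomp($padded);   # reverse complement of the padded fragment
}

print "$_ => $bins{$_}\n" for sort keys %bins;
```

Run as-is, it should print the same four entries as the table above.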

I now want to obtain all unique pairwise combinations of these fragments, whilst excluding any fragment from combining with its own reverse complement. Using the example above, it would produce:

1F2F => TTAAGTAGCATCGATTTATAGCATCGACTAGTAATTAGCTACGATCAGCTACGATCGAGCGACTACGTAGCAA
1F2R => TTAAGTAGCATCGATTTATAGCATCGACTAGTAATTGCTACGTAGTCGCTCGATCGTAGCTGATCGTAGCTAA
1R2F => TTACTAGTCGATGCTATAAATCGATGCTACTTAATTAGCTACGATCAGCTACGATCGAGCGACTACGTAGCAA
1R2R => TTACTAGTCGATGCTATAAATCGATGCTACTTAATTGCTACGTAGTCGCTCGATCGTAGCTGATCGTAGCTAA

It would not produce 1F1R or 2F2R. As shown above, the keys of the two fragments are concatenated, as are their values, and each result is stored in a new hash.
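One way to build that hash is sketched below, assuming `%bins` is keyed as in the script above (a fragment number followed by F or R). It loops over pairs of distinct fragment numbers with the lower number first, then combines all four F/R orientations of each pair. Because the two numbers always differ, a fragment is never paired with its own reverse complement. The `%combos` name and the hard-coded `%bins` are illustrative only.

```perl
use strict;
use warnings;

# Assumed input: the %bins produced for the example above.
my %bins = (
    '1F' => 'TTAAGTAGCATCGATTTATAGCATCGACTAGTAA',
    '1R' => 'TTACTAGTCGATGCTATAAATCGATGCTACTTAA',
    '2F' => 'TTAGCTACGATCAGCTACGATCGAGCGACTACGTAGCAA',
    '2R' => 'TTGCTACGTAGTCGCTCGATCGTAGCTGATCGTAGCTAA',
);

# Distinct fragment numbers, sorted numerically.
my %seen;
my @ids = sort { $a <=> $b }
          grep { !$seen{$_}++ }
          map  { $_ =~ /^(\d+)[FR]$/ } keys %bins;

my %combos;
for my $i (0 .. $#ids) {
    for my $j ($i + 1 .. $#ids) {    # j > i: each pair once, never a number with itself
        for my $s1 ('F', 'R') {
            for my $s2 ('F', 'R') {
                my $k1 = $ids[$i] . $s1;
                my $k2 = $ids[$j] . $s2;
                # Join both keys and both sequences, as in the desired output.
                $combos{ $k1 . $k2 } = $bins{$k1} . $bins{$k2};
            }
        }
    }
}

print "$_ => $combos{$_}\n" for sort keys %combos;
```

With n fragments this stores 4 entries per pair of fragment numbers, i.e. 2n(n-1) in total, so the hash grows quadratically with the number of fragments.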

I've tried the CPAN modules Algorithm::Combinatorics and Math::Combinatorics, but I can't work out how to adapt them to this task.
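The combinatorics modules generate k-combinations of a list; the same idea can be written in plain Perl. This sketch takes every unordered pair of hash keys (n choose 2) and then drops pairs whose keys share a fragment number, which removes exactly the 1F1R-style pairs. The helpers `pairs_of` and `frag_num` and the hard-coded `%bins` are hypothetical, introduced only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: all unordered pairs of a list, keeping input order in each pair.
sub pairs_of {
    my @items = @_;
    my @pairs;
    for my $i (0 .. $#items - 1) {
        for my $j ($i + 1 .. $#items) {
            push @pairs, [ $items[$i], $items[$j] ];
        }
    }
    return @pairs;
}

# Hypothetical helper: fragment number of a key such as "12R".
sub frag_num { my ($key) = @_; return ($key =~ /^(\d+)/)[0] }

# Assumed input: the %bins produced for the example above.
my %bins = (
    '1F' => 'TTAAGTAGCATCGATTTATAGCATCGACTAGTAA',
    '1R' => 'TTACTAGTCGATGCTATAAATCGATGCTACTTAA',
    '2F' => 'TTAGCTACGATCAGCTACGATCGAGCGACTACGTAGCAA',
    '2R' => 'TTGCTACGTAGTCGCTCGATCGTAGCTGATCGTAGCTAA',
);

# Order by fragment number, then F before R, so "2F" sorts before "10F".
my @keys = sort { frag_num($a) <=> frag_num($b) || $a cmp $b } keys %bins;

my %combos;
for my $pair (pairs_of(@keys)) {
    my ($k1, $k2) = @$pair;
    next if frag_num($k1) == frag_num($k2);    # skip a fragment with its own reverse complement
    $combos{ $k1 . $k2 } = $bins{$k1} . $bins{$k2};
}

print "$_ => $combos{$_}\n" for sort keys %combos;
```

If you prefer the module, `combinations(\@keys, 2)` from Algorithm::Combinatorics should return the same kind of pairs as array references, so the loop body could stay unchanged.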

Does anybody have any code snippets, examples or suggestions that could help achieve this? If it helps, I'm very new to Perl.

In reply to Obtaining combinations of hash keys and values by Anonymous Monk
