Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks, I have a number of motifs in a file which have start and end positions, corresponding to their positions in a string.
For example, where 30 is the start position and 43 the end position:

    > genome.ptt_30 43
    tggattgactgtg
    > genome.ptt_107 128
    ctgctgcatgtgatgactgtg
    > genome.ptt_209 254
    gcgccggactatgattgagctagcgtatgctgcatgctgatgtgt
However, I want to divide the motifs into groups based on their end positions, using a user-defined group size, e.g. 100 (a new group starts every 100 characters, and each motif goes into the group that its end position falls in). Can anyone please help?
Desired output, e.g.:

    group 1 (1-100):
    > genome.ptt_30 43
    tggattgactgtg
    group 2 (101-200):
    > genome.ptt_107 128
    ctgctgcatgtgatgactgtg
    group 3 (201-300):
    > genome.ptt_209 254
    gcgccggactatgattgagctagcgtatgctgcatgctgatgtgt
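A minimal sketch of the grouping asked for above, assuming the file looks like the example (a `> name_start end` header line followed by the sequence), that positions are 1-based so group 1 covers 1-100, and that a motif belongs to the group its end position falls in. The subroutine names `parse_motifs` and `group_by_end`, and the `$group_size` variable, are hypothetical names chosen for illustration.

```perl
use strict;
use warnings;

# Parse records of the form
#   > genome.ptt_30 43
#   tggattgactgtg
# into hashrefs with name, start, end and sequence.
sub parse_motifs {
    my ($text) = @_;
    my @motifs;
    my $current;
    for my $line (split /\n/, $text) {
        if ($line =~ /^>\s*(\S+)_(\d+)\s+(\d+)/) {
            $current = { name => $1, start => $2, end => $3, seq => '' };
            push @motifs, $current;
        }
        elsif (defined $current) {
            $line =~ s/\s+//g;          # drop stray whitespace in sequence lines
            $current->{seq} .= $line;   # allow a sequence split over several lines
        }
    }
    return @motifs;
}

# Bucket motifs by end position. With 1-based positions and a group size N,
# group k covers (k-1)*N+1 .. k*N, so the group number is ceil(end / N).
sub group_by_end {
    my ($group_size, @motifs) = @_;
    my %groups;
    for my $m (@motifs) {
        my $g = int(($m->{end} + $group_size - 1) / $group_size);
        push @{ $groups{$g} }, $m;
    }
    return %groups;
}

my $group_size = 100;    # the user-defined value
my $input = <<'END';
> genome.ptt_30 43
tggattgactgtg
> genome.ptt_107 128
ctgctgcatgtgatgactgtg
> genome.ptt_209 254
gcgccggactatgattgagctagcgtatgctgcatgctgatgtgt
END

my %groups = group_by_end($group_size, parse_motifs($input));
for my $g (sort { $a <=> $b } keys %groups) {
    printf "group %d (%d-%d):\n", $g, ($g - 1) * $group_size + 1, $g * $group_size;
    for my $m (@{ $groups{$g} }) {
        print "> $m->{name}_$m->{start} $m->{end}\n$m->{seq}\n";
    }
}
```

For the sample data this prints the three groups in the desired output; changing `$group_size` changes the bucket width without touching the rest of the code.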

Re: Numerical problem!
by Skeeve (Parson) on Jul 13, 2004 at 15:41 UTC
    Homework?

    Question 1) What about a motif like this one, which starts in one group and ends in the next:
    > genome.ptt_289 334
    gcgccggactatgattgagctagcgtatgctgcatgctgatgtgt
    Question 2) Can you show us the effort you have already made? This is not a "tell us your problem and we'll give you the solution" community.
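The boundary case in Question 1 can be made concrete. This sketch, using the hypothetical helper names `group_of` and `groups_touched`, assumes 1-based inclusive positions and a group size of 100, and lists every group a motif overlaps from its start to its end.

```perl
use strict;
use warnings;

# 1-based position -> group number for a given group size (group 1 = 1..$size).
sub group_of {
    my ($pos, $size) = @_;
    return int(($pos - 1) / $size) + 1;
}

# Every group a motif touches, from the group of its start to the group of its end.
sub groups_touched {
    my ($start, $end, $size) = @_;
    return (group_of($start, $size) .. group_of($end, $size));
}

for my $m ([30, 43], [289, 334]) {
    my @g = groups_touched(@$m, 100);
    printf "%d-%d: group(s) %s%s\n", @$m, join(',', @g),
        @g > 1 ? ' (spans a boundary)' : '';
}
```

For 289-334 this reports groups 3 and 4, so any solution has to decide whether such a motif goes in the first group, the last, or both.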
Re: Numerical problem!
by tachyon (Chancellor) on Jul 13, 2004 at 17:30 UTC
Re: Numerical problem!
by ysth (Canon) on Jul 13, 2004 at 18:25 UTC
    Without knowing what data structure you have your start and end positions in, this is a pure guess:
    my @motif = ( [30,43], [107,128], [209,254] );
    # assumes $string already holds the full sequence the positions refer to

    # make sure it is in order by beginning position
    @motif = sort { $a->[0] <=> $b->[0] } @motif;

    my $last_group = 0;
    for my $motif (@motif) {
        # 1-based: positions 1-100 -> group 1, 101-200 -> group 2, ...
        my $group = int(($motif->[0]+99)/100);
        if ($group != $last_group) {
            print "\n" if $last_group;  # blank between groups
            print "group $group (", $group*100-99, "-", $group*100, "):\n";
            $last_group = $group;
        }
        print "> genome.ptt_", $motif->[0], " ", $motif->[1], "\n",
              substr($string, $motif->[0]-1, $motif->[1] - $motif->[0]), "\n";
    }
    Doesn't show group headers for groups with no motifs. Assumes you want motifs that span groups to be reported only with the first group they are part of, not the last or all. Assumes you are using 1-based offsets (yuck), based on the 1-100, 101-200 ranges in your example.
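One possible variant of the loop above that lifts the first two caveats, offered as a sketch under stated assumptions: the group size is a parameter, every group from 1 up to the last one used gets a header (even if empty), and a motif that crosses a boundary is listed under every group it touches. The name `grouped_report` is hypothetical, motifs are `[start, end]` pairs as in the reply, and only header lines are printed because the thread never shows how `$string` is loaded. The group formula `int((p + size - 1) / size)` is the same ceiling trick as `int(($motif->[0]+99)/100)`.

```perl
use strict;
use warnings;

# Build a grouped listing of [start, end] motifs (1-based positions).
# Empty groups still get a header; boundary-spanning motifs appear in each group.
sub grouped_report {
    my ($size, @motifs) = @_;
    my %in_group;
    my $max_group = 0;
    for my $m (sort { $a->[0] <=> $b->[0] } @motifs) {
        my $first = int(($m->[0] + $size - 1) / $size);
        my $last  = int(($m->[1] + $size - 1) / $size);
        push @{ $in_group{$_} }, $m for $first .. $last;
        $max_group = $last if $last > $max_group;
    }
    my $out = '';
    for my $g (1 .. $max_group) {
        $out .= sprintf "group %d (%d-%d):\n", $g, ($g - 1) * $size + 1, $g * $size;
        $out .= "> genome.ptt_$_->[0] $_->[1]\n" for @{ $in_group{$g} || [] };
    }
    return $out;
}

print grouped_report(100, [30, 43], [209, 254], [289, 334]);
```

With these inputs, group 2 is printed with no motifs, and the 289-334 motif appears under both group 3 and group 4.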