Thanks, Limbic~Region and BrowserUk, for your good advice. I have had a quick go with BrowserUk's script and it seems to work OK. However, the merge is not so clean, and the results after the merge do not seem to be in equal chunks (I have posted this on my scratchpad). I suspect this will continue as I continue to add more blocks (i.e. thousands). I have adapted your script as follows:

#! perl -slw
use strict;

if (scalar(@ARGV) != 1) {
    print "\n";
    print "Usage: script.pl <alignment file>";
    print "\n";
    exit();
}

my ($FILENAME) = @ARGV;

#read in file
open(DATA, $FILENAME);

my( $id, %accu, @order );

## remove existing files
my $remove = "new_alignment_".$FILENAME; #remove any existing results file
if (unlink($remove) == 1) {
    print "Existing \"$remove\" file was removed\n";
}

## generate a temporary storage file
my $outputfile = "new_alignment_".$FILENAME; #make a big file in which the final results will be printed
unless ( open(POS, ">>$outputfile") ) {
    print "Cannot open file \"$outputfile\" to write to!!\n\n";
    exit;
}

while( <DATA> ) {
    chomp;
    if( m[^(\S+_\S+_\S+)\s+(.+)\s*$] ) {
        $id = $1;
        unless( exists $accu{ $id } ) {
            push @order, $id;
            $accu{ $id } = $2;
        }
        else {
            $accu{ $id } .= ' ' . $2;
        }
    }
    else {
        $accu{ $id } .= ' ' . $_;
    }
}

for my $key ( @order ) {
    printf POS "%-10s %s\n", $key, substr( $accu{ $key }, 0, 55, '' );
    print POS substr( $accu{ $key }, 0, 66, '' ) while length $accu{ $key };
    print POS '';
}
I must also confess I do not understand the last bit of code:
for my $key ( @order ) {
    printf POS "%-10s %s\n", $key, substr( $accu{ $key }, 0, 55, '' );
    print POS substr( $accu{ $key }, 0, 66, '' ) while length $accu{ $key };
    print POS '';
}
Could you kindly explain it to me?
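
To make the question concrete, here is a cut-down version of that loop as far as I can follow it. This is only a sketch of my current reading, not a claim about what BrowserUk intended: the sub name chunk_sequence, the id 'id_1_x' and the 150-character sequence are all made up for illustration. My guess is that the four-argument substr both returns the leading characters and removes them from the string, so the string shrinks until the while loop stops.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split one accumulated sequence the same way the
# loop in the script does. The first piece is 55 characters (it shares
# its row with the id); every later piece is up to 66 characters.
sub chunk_sequence {
    my ($seq) = @_;
    my @lines;

    # Four-argument substr returns the first 55 characters AND replaces
    # them with '' inside $seq, so $seq gets shorter on each call.
    push @lines, substr( $seq, 0, 55, '' );

    # Keep taking pieces of up to 66 characters until nothing is left.
    push @lines, substr( $seq, 0, 66, '' ) while length $seq;

    return @lines;
}

# Made-up data: 150 characters of sequence (37 x 'ACGT' plus 'AC').
my $seq   = 'ACGT' x 37 . 'AC';
my @lines = chunk_sequence($seq);

# Same layout as the script: id padded to 10 columns, then the pieces.
printf "%-10s %s\n", 'id_1_x', shift @lines;
print "$_\n" for @lines;
```

If this reading is right, the spaces that get added when blocks are joined (' ' . $2) are counted towards the 55 and 66 characters, which might be part of why the merged chunks look uneven.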

PS: Still thinking of how to implement what Limbic~Region has pointed out.

$new_guy


In reply to Re^2: concatenating identical sequences by $new_guy
in thread concatenating identical sequences by $new_guy
