in reply to concatenating identical sequences
I'd use a similar approach to Limbic~Region, but I'd accumulate the blocks as strings rather than arrays as they consume considerably less space, which will put off the time when you need to use a multi-pass solution. Building an array of newly read IDs as you go, allows for retaining the original ordering on output.
This:
#! perl -slw use strict; my( $id, %accu, @order ); while( <DATA> ) { chomp; if( m[^(\S+_\S+_\S+)\s+(.+)\s*$] ) { $id = $1; unless( exists $accu{ $id } ) { push @order, $id; $accu{ $id } = $2; } else { $accu{ $id } .= ' ' . $2; } } else { $accu{ $id } .= ' ' . $_; } } for my $key ( @order ) { printf "%-10s %s\n", $key, substr( $accu{ $key }, 0, 55, '' ); print substr( $accu{ $key }, 0, 66, '' ) while length $accu{ $key +}; print ''; } __DATA__ 5390_7_9 MEWYKKIGLL ATTGLALVGL GACSNYGKSA DGTVTIEYFN QKKEMTKTLE EITRDFEKEN PKIKVKVVNV PNAGEVLKTR VLAGDVPDVV NIYPQSIELQ EWAKAGVFED 5390_8_1 MKWYKKIGLL ATTGLALVGL GACSNYGKSA DGTVTIEYFN QKKEMTKTLE EITRDFEKEN PKIKVKVVNV PNAGEVLKTR VLAGDVPDVV NIYPQSIELQ EWAKAGVFED 5390_8_2 MEWYKKIGLL ATTALALFGL GACSNYGKSA DDTVTIEYFN QKKEMTKILE EITRDFEKEN SKIKVKVVNV PNAGEVLKTR VLAGDVPDVV NIYPQSIELQ EWAKAGVFED 5390_7_9 MEWYKKIGLL ATTGLALVGL GACSNYGKSA DGTVTIEYFN QKKEMTKTLE EITRDFEKEN PKIKVKVVNV PNAGEVLKTR VLAGDVPDVV NIYPQSIELQ EWAKAGVFED LSNKDYLKRV KNGYAEKYAV NEKVYNVPFT ANAYGIYYNK DKFEELGLKV PETWDEFEQL 5390_8_1 MKWYKKIGLL ATTGLALVGL GACSNYGKSA DGTVTIEYFN QKKEMTKTLE 5390_8_2 MEWYKKIGLL ATTALALFGL GACSNYGKSA DDTVTIEYFN QKKEMTKILE EITRDFEKEN SKIKVKVVNV PNAGEVLKTR VLAGDVPDVV NIYPQSIELQ EWAKAGVFED LSNKDYLKRV KNGYAEKYAV NEKVYNVPFT ANAYGIYYNK DKFEELGLKV PETWDEFEQL
Produces:
C:\test>junk35 5390_7_9 MEWYKKIGLL ATTGLALVGL GACSNYGKSA DGTVTIEYFN QKKEMTKTLE EITRDFEKEN PKIKVKVVNV PNAGEVLKTR VLAGDVPDVV NIYPQSIELQ EWAKAGVFED MEWYKKIGLL ATTGLALVGL GACSNYGKSA DGTVTIEYFN QKKEMTKTLE EITRDFEKEN PKIKVKVVNV PNAGEVLKTR VLAGDVPDVV NIYPQSIELQ EWAKAGVFED LSNKDYLKRV KNGYAEKYAV NEKVYNVPFT ANAYGIYYNK DKFEELGLKV PETWDEFEQL 5390_8_1 MKWYKKIGLL ATTGLALVGL GACSNYGKSA DGTVTIEYFN QKKEMTKTLE EITRDFEKEN PKIKVKVVNV PNAGEVLKTR VLAGDVPDVV NIYPQSIELQ EWAKAGVFED MKWYKKIGLL ATTGLALVGL GACSNYGKSA DGTVTIEYFN QKKEMTKTLE 5390_8_2 MEWYKKIGLL ATTALALFGL GACSNYGKSA DDTVTIEYFN QKKEMTKILE EITRDFEKEN SKIKVKVVNV PNAGEVLKTR VLAGDVPDVV NIYPQSIELQ EWAKAGVFED MEWYKKIGLL ATTALALFGL GACSNYGKSA DDTVTIEYFN QKKEMTKILE EITRDFEKEN SKIKVKVVNV PNAGEVLKTR VLAGDVPDVV NIYPQSIELQ EWAKAGVFED LSNKDYLKRV KNGYAEKYAV NEKVYNVPFT ANAYGIYYNK DKFEELGLKV PETWDEFEQL
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re^2: concatenating identical sequences
by $new_guy (Acolyte) on Oct 05, 2011 at 05:49 UTC | |
by BrowserUk (Patriarch) on Oct 05, 2011 at 10:56 UTC | |
by $new_guy (Acolyte) on Oct 10, 2011 at 12:14 UTC | |
by BrowserUk (Patriarch) on Oct 05, 2011 at 10:35 UTC |