
Dear monks, I am having a problem printing data from an array of arrays. I am almost there, but I am not able to print the header. Here is the problem:

Input file:

>1_1 geneid1 34 45 len=10 
AGTCGA 
GCAA 
>1_2 geneid1 54 75 len=21 
AGTCGAAGTCGA 
ACAAACAAT 
>2_1 geneid1 78 83 len=5 
CGTCG 
>1_1 geneid2 14 25 len=11 
AGTCGAA 
GCAA 
>2_1 geneid2 4 12 len=8 
AGTCGAAT 
>2_3 geneid2 19 27 len=8 
AGTC 
GCAA 
>2_2 geneid2 89 95 len=6 
AAAAAA 

--------------------------- 

Facts: 
1) This is just a sample. The real file will be about 2 GB in size, and the AGTC bases will number in the thousands to millions. 

Problem: 
1) I have to join the sequences that belong to the same number and also the same geneid. 
2) The output file will be a list of sequences. Each sequence will be a set of numbered parts joined together under one header (which is my problem now: I am not able to print it). 

--------------------------------- 
Sample output file: 

>1 gid1 
AGTCGA 
GCAA 
AGTCGAAGTCGA 
ACAAACAAT 
>2 gid1 
CGTCG 
>1 gid2 
AGTCGAA 
GCAA 
>2 gid2 
AGTCGAAT 
AAAAAA 
AGTC 
GCAA

I have removed the extra header information, so each header now holds only the number ID and the gene ID. Note that in the input file the 2_2 record belongs in the second position of the geneid2 set, so I have placed the 2_2 sequence after the 2_1 sequence and before the 2_3 sequence. Here is the script written so far:

use strict; 
use warnings; 

my @AoA = (); 
MAIN: while(<DATA>){ 
   if (/^>(\d+)_(\d+)\s+geneid(\d+)/o) { 
      my ($tops, $mids, $subs) = ($3, $1, $2); 
      $tops -= 1; 
      $mids -= 1; 
      $subs -= 1; 
      SUB: while(<DATA>){ 
         redo MAIN unless (/^[ACGT]/o); 
         chomp;  
         push @{$AoA[$tops][$mids][$subs]}, $_; 
      } 
   } 
} 
for my $i (@AoA) { 
   for my $j (@{$i}) { 
      for my $n (@{$j}) { 
         for my $r (@{$n}) { 
            print $r,"\n"; 
         } 
      }     
   } 
} 
__DATA__ 
>1_1 geneid1 34 45 len=10  
AGTCGA  
GCAA  
>1_2 geneid1 54 75 len=21  
AGTCGAAGTCGA  
ACAAACAAT  
>2_1 geneid1 78 83 len=5  
CGTCG  
>1_1 geneid2 14 25 len=11  
AGTCGAA  
GCAA  
>2_1 geneid2 4 12 len=8  
AGTCGAAT  
>2_3 geneid2 19 27 len=8  
AGTC  
GCAA  
>2_2 geneid2 89 95 len=6  
AAAAAA

Please guide.
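One possible way to get the header printed, offered only as a minimal sketch: keep the gene ID and number as hash keys rather than array indices, so they are still available when the output is written. The sub name `group_records` and the hash-of-hashes layout are assumptions for illustration, not part of the original script. The sketch also holds every sequence in memory, which may not suit a 2 GB input.

```perl
use strict;
use warnings;

# Hypothetical helper (name is an assumption): reads records from a
# filehandle and returns the grouped output as a single string.
sub group_records {
    my ($fh) = @_;
    my %seqs;       # $seqs{$gene}{$num}{$part} = [ sequence lines ]
    my $current;    # array ref of the record currently being read

    while (my $line = <$fh>) {
        $line =~ s/\s+\z//;    # drop the newline and trailing spaces
        next unless length $line;
        if ($line =~ /^>(\d+)_(\d+)\s+geneid(\d+)/) {
            my ($num, $part, $gene) = ($1, $2, $3);
            $current = $seqs{$gene}{$num}{$part} ||= [];
        }
        elsif ($current && $line =~ /^[ACGT]+\z/) {
            push @$current, $line;
        }
    }

    my $out = '';
    for my $gene (sort { $a <=> $b } keys %seqs) {
        for my $num (sort { $a <=> $b } keys %{ $seqs{$gene} }) {
            $out .= ">$num gid$gene\n";    # one header per joined group
            my $parts = $seqs{$gene}{$num};
            # Numeric sort puts 2_2 between 2_1 and 2_3.
            for my $part (sort { $a <=> $b } keys %$parts) {
                $out .= "$_\n" for @{ $parts->{$part} };
            }
        }
    }
    return $out;
}
```

With the `__DATA__` section from the script above, calling `print group_records(\*DATA);` should write one `>number gidN` header before each joined group, with the parts in numeric order.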

In reply to array of arrays - printing data by sugar
