Right, the character limit is frustrating. As for how I would sort by snoRNA: since snoRNAs 25, 26 and 27 are identical sequence-wise, I named that particular field "snoRNA 25, 26 and 27", so it would sort under that single name. So consider:

6
GI:50845406
NM_031444.2  
snoRNA3 -9 Box D except for snoRNA4
Query  3     CTGGAGTCAAGGCT  16
             ||||||||||||||
Sbjct  1297  CTGGAGTCAAGGCT  1284
Homo sapiens chromosome 22 open reading frame 13 (C22orf13), mRNA.
http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=Retrieve&db=nucleotide&dopt=GenBank&RID=UEDK8KR6016&log%24=nuclalign&blast_rank=13&list_uids=50845406
5
GI:38327560
NM_006282.2  
snoRNA3 -9 Box D except for snoRNA4
Query  5     GGAGTCAAGGCTAC  18
             ||||||||||||||
Sbjct  5129  GGAGTCAAGGCTAC  5116
Homo sapiens serine/threonine kinase 4 (STK4), mRNA
http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=Retrieve&db=nucleotide&dopt=GenBank&RID=UEDK8KR6016&log%24=nuclalign&blast_rank=14&list_uids=38327560
3
GI:91982771
NM_001040105.1  
snoRNA 10
Query  4     TGGAGTCAAT  13
             ||||||||||
Sbjct  4854  TGGAGTCAAT  4845
Homo sapiens mucin 17, cell surface associated (MUC17), mRNA.
http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=Retrieve&db=nucleotide&dopt=GenBank&RID=UDU305DZ01N&log%24=nuclalign&blast_rank=97&list_uids=91982771
3
GI:154448895
NM_001100162.1  
snoRNA 25, 26 and 27
Query  2    CCTGGAGTCGAGTG  15
            ||||||||||||||
Sbjct  146  CCTGGAGTCGAGTG  133
Homo sapiens exportin 7 (XPO7), transcript variant 3, mRNA.
http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=Retrieve&db=nucleotide&dopt=GenBank&RID=UDW41RSS01S&log%24=nuclalign&blast_rank=2&list_uids=154448895
31
GI:153945877
NM_002458.1  
snoRNA 25, 26 and 27
Query  3     CTGGAGTCGAGTG  15
             |||||||||||||
Sbjct  6818  CTGGAGTCGAGTG  6806
Query  3     CTGGAGTCGAGTG  15
             |||||||||||||
Sbjct  8489  CTGGAGTCGAGTG  8477
Query  3      CTGGAGTCGAGTG  15
              |||||||||||||
Sbjct  10589  CTGGAGTCGAGTG  10577
Query  3      CTGGAGTCGAGTG  15
              |||||||||||||
Sbjct  12260  CTGGAGTCGAGTG  12248
Homo sapiens mucin 5B, oligomeric mucus/gel-forming (MUC5B), mRNA.
http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=Retrieve&db=nucleotide&dopt=GenBank&RID=UDW41RSS01S&log%24=nuclalign&blast_rank=9&list_uids=153945877

I would want to sort according to:

snoRNA 10
snoRNA3 -9 Box D except for snoRNA4
snoRNA 25, 26 and 27
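
Since this order is neither alphabetical nor numeric (10 comes before 3-9), one option is to sort against an explicit rank table. This is only a minimal sketch: sort_snorna_names is a hypothetical helper name, and it assumes the hash keys match the three names above exactly.

```perl
use strict;
use warnings;

# Desired output order, copied from the list above.
my @ORDER = (
    'snoRNA 10',
    'snoRNA3 -9 Box D except for snoRNA4',
    'snoRNA 25, 26 and 27',
);

# Map each name to its position in @ORDER.
my %RANK;
@RANK{@ORDER} = 0 .. $#ORDER;

# Hypothetical helper: names missing from @ORDER sort last, alphabetically.
sub sort_snorna_names {
    my @names = @_;
    return sort {
        ( $RANK{$a} // scalar @ORDER ) <=> ( $RANK{$b} // scalar @ORDER )
            or $a cmp $b
    } @names;
}

print "$_\n" for sort_snorna_names(
    'snoRNA 25, 26 and 27',
    'snoRNA3 -9 Box D except for snoRNA4',
    'snoRNA 10',
);
```

With the records stored in a hash keyed by snoRNA name, the same helper could be applied to keys %records.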

That way I would get the GI, NM, seqs (query and subject), geneName, exon number and weblink for each of the snoRNAs, in a structure like:

'snoRNA 25, 26 and 27' => [
    {
        GI       => 'GI:154448895',
        NM       => 'NM_001100162.1',
        exon     => '3',
        seq      => [
            'Query  2    CCTGGAGTCGAGTG  15',
            '            ||||||||||||||',
            'Sbjct  146  CCTGGAGTCGAGTG  133',
        ],
        geneName => 'Homo sapiens exportin 7 (XPO7), transcript variant 3, mRNA.',
        weblink  => 'http://......',
    },
    {
        GI   => 'GI:153945877',
        NM   => 'NM_002458.1',
        exon => '31',
        seq  => [
            # more than one seq
        ],
    },
],
'snoRNA3 -9 Box D except for snoRNA4' => [
    # more than one record once again
    # .......
],
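
Once the data is in that shape, pulling the fields back out per snoRNA is a matter of walking a hash of arrays of hashes. The following is only a sketch: summarize is a hypothetical helper, and the sample hash is a hand-trimmed subset of the records above with the seq, geneName and weblink fields left out.

```perl
use strict;
use warnings;

# Hand-written sample in the shape sketched above (fields trimmed for brevity).
my %sample = (
    'snoRNA 25, 26 and 27' => [
        { GI => 'GI:153945877', NM => 'NM_002458.1',    exon => '31' },
        { GI => 'GI:154448895', NM => 'NM_001100162.1', exon => '3'  },
    ],
);

# Hypothetical helper: one tab-separated line per hit,
# with hits ordered by exon number within each snoRNA.
sub summarize {
    my ($records) = @_;
    my @out;
    for my $name ( sort keys %$records ) {
        for my $hit ( sort { $a->{exon} <=> $b->{exon} } @{ $records->{$name} } ) {
            push @out, join "\t", $name, $hit->{exon}, $hit->{GI}, $hit->{NM};
        }
    }
    return @out;
}

print "$_\n" for summarize( \%sample );
```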

UPDATE: Here's my adaptation of your earlier code. I would just need to change the order in which a record is arranged, bringing the "snoRNA" line to the top, before the "exon" line, and make sure stray whitespace is avoided at all costs. It works the same way as the code you modified in Re: split a file into records and process it ...

use strict;
use warnings;
use Data::Dump qw[ pp ];

# Each record in DATA starts with the snoRNA name, which becomes the hash key.
my %records;
until ( eof(DATA) ) {
    chomp( my $snoRNA = <DATA> );
    push @{ $records{$snoRNA} }, {};
    my $rec = $records{$snoRNA}[-1];

    # An optional "N different hits" line may precede the exon number.
    my $seqs = 1;
    chomp( my $line = <DATA> );
    if ( $line =~ m[(\d+) different hits] ) {
        $seqs = $1;
        chomp( $rec->{exon} = <DATA> );
    }
    else {
        $rec->{exon} = $line;
    }
    chomp( $rec->{GeneID} = <DATA> );
    chomp( $rec->{NM_ID}  = <DATA> );

    # Each hit spans three lines: Query, the match bars (discarded), Sbjct.
    for ( 1 .. $seqs ) {
        chomp( my $query = <DATA> );
        scalar(<DATA>);
        chomp( my $sbjct = <DATA> );
        push @{ $rec->{seqs} }, { $query => $sbjct };
    }
    chomp( $rec->{gene_name} = <DATA> );
    chomp( $rec->{web_link}  = <DATA> );
}

pp \%records;
__DATA__
snoRNA 25, 26 and 27
2
GI:142387131
NM_006299.3
Query  2    CCTGGAGTCGAGT  14
            |||||||||||||
Sbjct  371  CCTGGAGTCGAGT  359
Homo sapiens zinc finger protein 193 (ZNF193), mRNA.
http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=Retrieve&db=nucleotide&dopt=GenBank&RID=UDW41RSS01S&log%24=nuclalign&blast_rank=11&list_uids=142387131
snoRNA 25, 26 and 27
1
GI:256773198
NM_001005236.3
Query  3    CTGGAGTCGAGTGTCT  18
            |||||| |||||||||
Sbjct  168  CTGGAGACGAGTGTCT  153
Homo sapiens olfactory receptor, family 1, subfamily L, member 1 (OR1L1), mRNA.
http://www.ncbi.nlm.nih.gov/
snoRNA 25, 26 and 27
4 different hits
31
GI:153945877
NM_002458.1
Query  3     CTGGAGTCGAGTG  15
             |||||||||||||
Sbjct  6818  CTGGAGTCGAGTG  6806
Query  3     CTGGAGTCGAGTG  15
             |||||||||||||
Sbjct  8489  CTGGAGTCGAGTG  8477
Query  3      CTGGAGTCGAGTG  15
              |||||||||||||
Sbjct  10589  CTGGAGTCGAGTG  10577
Query  3      CTGGAGTCGAGTG  15
              |||||||||||||
Sbjct  12260  CTGGAGTCGAGTG  12248
Homo sapiens mucin 5B, oligomeric mucus/gel-forming (MUC5B), mRNA.
http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=Retrieve&db=nucleotide&dopt=GenBank&RID=UDW41RSS01S&log%24=nuclalign&blast_rank=9&list_uids=153945877
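
As an alternative to rearranging each record, the original layout (exon, GI, NM, snoRNA name, alignment lines, gene name, weblink) could perhaps be parsed as it stands, recognising the alignment lines by their "Query" prefix instead of relying on a "different hits" count. This is a rough sketch under stated assumptions: parse_hits is a hypothetical helper, every field sits on its own line, and the weblink is on a single unwrapped line. Trailing whitespace is trimmed from every line.

```perl
use strict;
use warnings;

# Hypothetical helper: returns { snoRNA name => [ record, ... ] }.
sub parse_hits {
    my ($fh) = @_;
    my %records;

    # Trim trailing whitespace (including newlines) and drop empty lines.
    my @lines = grep { length }
                map  { my $l = $_; $l =~ s/\s+\z//; $l } <$fh>;

    while (@lines) {
        my %rec;
        @rec{qw(exon GI NM)} = splice @lines, 0, 3;
        my $snoRNA = shift @lines;

        # Consume Query/bars/Sbjct triples while the next line is a Query line.
        while ( @lines >= 3 && $lines[0] =~ /^Query\b/ ) {
            my ( $query, undef, $sbjct ) = splice @lines, 0, 3;
            push @{ $rec{seq} }, { query => $query, sbjct => $sbjct };
        }

        @rec{qw(geneName weblink)} = splice @lines, 0, 2;
        push @{ $records{$snoRNA} }, \%rec;
    }
    return \%records;
}

# Usage with one record from the listing above (weblink shortened).
my $sample = <<'END';
3
GI:91982771
NM_001040105.1
snoRNA 10
Query  4     TGGAGTCAAT  13
             ||||||||||
Sbjct  4854  TGGAGTCAAT  4845
Homo sapiens mucin 17, cell surface associated (MUC17), mRNA.
http://www.ncbi.nlm.nih.gov/
END
open my $fh, '<', \$sample or die "cannot open in-memory file: $!";
my $records = parse_hits($fh);
print "$_: ", scalar @{ $records->{$_} }, " record(s)\n" for sort keys %$records;
```

Detecting the Query lines by pattern means a record with several hits needs no special marker line, at the cost of assuming that no gene name ever starts with "Query".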

Excellence is an Endeavor of Persistence. Chance Favors a Prepared Mind.

In reply to Re^4: split a file into records and process it by biohisham
in thread split a file into records and process it by biohisham
