I would like to sort records laid out like this:

    6
    GI:50845406
    NM_031444.2
    snoRNA3 -9 Box D except for snoRNA4
    Query 3 CTGGAGTCAAGGCT 16
    ||||||||||||||
    Sbjct 1297 CTGGAGTCAAGGCT 1284
    Homo sapiens chromosome 22 open reading frame 13 (C22orf13), mRNA.
    http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=Retrieve&db=nucleotide&dopt=GenBank&RID=UEDK8KR6016&log%24=nuclalign&blast_rank=13&list_uids=50845406

    5
    GI:38327560
    NM_006282.2
    snoRNA3 -9 Box D except for snoRNA4
    Query 5 GGAGTCAAGGCTAC 18
    ||||||||||||||
    Sbjct 5129 GGAGTCAAGGCTAC 5116
    Homo sapiens serine/threonine kinase 4 (STK4), mRNA
    http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=Retrieve&db=nucleotide&dopt=GenBank&RID=UEDK8KR6016&log%24=nuclalign&blast_rank=14&list_uids=38327560

    3
    GI:91982771
    NM_001040105.1
    snoRNA 10
    Query 4 TGGAGTCAAT 13
    ||||||||||
    Sbjct 4854 TGGAGTCAAT 4845
    Homo sapiens mucin 17, cell surface associated (MUC17), mRNA.
    http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=Retrieve&db=nucleotide&dopt=GenBank&RID=UDU305DZ01N&log%24=nuclalign&blast_rank=97&list_uids=91982771

    3
    GI:154448895
    NM_001100162.1
    snoRNA 25, 26 and 27
    Query 2 CCTGGAGTCGAGTG 15
    ||||||||||||||
    Sbjct 146 CCTGGAGTCGAGTG 133
    Homo sapiens exportin 7 (XPO7), transcript variant 3, mRNA.
    http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=Retrieve&db=nucleotide&dopt=GenBank&RID=UDW41RSS01S&log%24=nuclalign&blast_rank=2&list_uids=154448895

    31
    GI:153945877
    NM_002458.1
    snoRNA 25, 26 and 27
    Query 3 CTGGAGTCGAGTG 15
    |||||||||||||
    Sbjct 6818 CTGGAGTCGAGTG 6806
    Query 3 CTGGAGTCGAGTG 15
    |||||||||||||
    Sbjct 8489 CTGGAGTCGAGTG 8477
    Query 3 CTGGAGTCGAGTG 15
    |||||||||||||
    Sbjct 10589 CTGGAGTCGAGTG 10577
    Query 3 CTGGAGTCGAGTG 15
    |||||||||||||
    Sbjct 12260 CTGGAGTCGAGTG 12248
    Homo sapiens mucin 5B, oligomeric mucus/gel-forming (MUC5B), mRNA.
    http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=Retrieve&db=nucleotide&dopt=GenBank&RID=UDW41RSS01S&log%24=nuclalign&blast_rank=9&list_uids=153945877
UPDATE: Here's my adaptation of your earlier code. I would just need to change the order in which each record is arranged, bringing the "snoRNA" line to the top, before the "exon" line, and make sure that spaces are avoided at all costs. This works the same way as the code you modified in Re: split a file into records and process it.

    ...
    'snoRNA 25,26 and 27' => [
        {
            GI       => 'GI:154448895',
            NM       => 'NM_001100162.1',
            exon     => '3',
            seq      => [
                'Query 2 CCTGGAGTCGAGTG 15',
                '||||||||||||||',
                'Sbjct 146 CCTGGAGTCGAGTG 133',
            ],
            geneName => 'Homo sapiens exportin 7 (XPO7), transcript variant 3, mRNA.',
            weblink  => 'http://......',
        },
        {
            GI   => 'GI:153945877',
            NM   => 'NM_002458.1',
            exon => '31',
            seq  => [
                # more than one seq
            ],
        },
    ],
    'snoRNA3 -9 Box D except for snoRNA4' => [
        # more than one record once again
        .......
    ]
    use strict;
    use warnings;
    use Data::Dump qw[ pp ];

    my %records;

    until ( eof(DATA) ) {
        # The first line of each record is the snoRNA name, used as the hash key.
        chomp( my $snoRNA = <DATA> );
        push @{ $records{$snoRNA} }, {};

        # An optional "N different hits" line says how many alignments follow.
        my $seqs = 1;
        my $line = <DATA>;
        if ( $line =~ m[(\d+) different hits] ) {
            $seqs = $1;
            chomp( $records{$snoRNA}[-1]{exon} = <DATA> );
        }
        else {
            chomp( $records{$snoRNA}[-1]{exon} = $line );
        }

        chomp( $records{$snoRNA}[-1]{GeneID} = <DATA> );
        chomp( $records{$snoRNA}[-1]{NM_ID}  = <DATA> );

        # Each alignment is three lines: Query, the match bars (skipped), Sbjct.
        for ( 1 .. $seqs ) {
            chomp( my $query = <DATA> );
            scalar(<DATA>);
            chomp( my $sbjct = <DATA> );
            push @{ $records{$snoRNA}[-1]{seqs} }, { $query => $sbjct };
        }

        chomp( $records{$snoRNA}[-1]{gene_name} = <DATA> );
        chomp( $records{$snoRNA}[-1]{web_link}  = <DATA> );
    }

    pp \%records;

    __DATA__
    snoRNA 25, 26 and 27
    2
    GI:142387131
    NM_006299.3
    Query 2 CCTGGAGTCGAGT 14
    |||||||||||||
    Sbjct 371 CCTGGAGTCGAGT 359
    Homo sapiens zinc finger protein 193 (ZNF193), mRNA.
    http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=Retrieve&db=nucleotide&dopt=GenBank&RID=UDW41RSS01S&log%24=nuclalign&blast_rank=11&list_uids=142387131
    snoRNA 25, 26 and 27
    1
    NM_001005236.3
    GI:256773198
    Query 3 CTGGAGTCGAGTGTCT 18
    |||||| |||||||||
    Sbjct 168 CTGGAGACGAGTGTCT 153
    Homo sapiens olfactory receptor, family 1, subfamily L, member 1 (OR1L1), mRNA.
    http://www.ncbi.nlm.nih.gov/
    snoRNA 25, 26 and 27
    4 different hits
    31
    GI:153945877
    NM_002458.1
    Query 3 CTGGAGTCGAGTG 15
    |||||||||||||
    Sbjct 6818 CTGGAGTCGAGTG 6806
    Query 3 CTGGAGTCGAGTG 15
    |||||||||||||
    Sbjct 8489 CTGGAGTCGAGTG 8477
    Query 3 CTGGAGTCGAGTG 15
    |||||||||||||
    Sbjct 10589 CTGGAGTCGAGTG 10577
    Query 3 CTGGAGTCGAGTG 15
    |||||||||||||
    Sbjct 12260 CTGGAGTCGAGTG 12248
    Homo sapiens mucin 5B, oligomeric mucus/gel-forming (MUC5B), mRNA.
    http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=Retrieve&db=nucleotide&dopt=GenBank&RID=UDW41RSS01S&log%24=nuclalign&blast_rank=9&list_uids=153945877
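For the reordering itself, something along these lines might do as a starting point. It is only a sketch: it assumes each record in the original file runs exon, GI, NM, snoRNA name, one or more Query/match/Sbjct triplets, gene name, and ends with the web link line. The helper names reorder_record and reorder_all are made up for illustration. It moves the snoRNA line to the top, drops blank lines, and adds an "N different hits" line when a record has more than one alignment, which is the marker the parsing code looks for.

```perl
use strict;
use warnings;

# Hypothetical helper: takes the lines of ONE record in the assumed original
# layout (exon, GI, NM, snoRNA name, alignments, gene name, web link) and
# returns them with the snoRNA name first.
sub reorder_record {
    my ( $exon, $gi, $nm, $snorna, @rest ) = @_;

    # Count the alignments so a "N different hits" line can be emitted.
    my $hits = grep { /^Query\b/ } @rest;

    my @out = ($snorna);
    push @out, "$hits different hits" if $hits > 1;
    return ( @out, $exon, $gi, $nm, @rest );
}

# Hypothetical helper: walks all input lines, skips blank ones, and treats
# a line starting with http:// or https:// as the end of a record.
sub reorder_all {
    my @input = @_;
    my ( @record, @result );
    for my $line (@input) {
        $line =~ s/^\s+|\s+$//g;    # trim leading and trailing whitespace
        next if $line eq '';        # drop blank lines
        push @record, $line;
        if ( $line =~ m{^https?://} ) {
            push @result, reorder_record(@record);
            @record = ();
        }
    }
    return @result;
}

# Small demonstration on one record taken from the sample data.
my @sample = split /\n/, <<'END';
3
GI:154448895
NM_001100162.1
snoRNA 25, 26 and 27

Query 2 CCTGGAGTCGAGTG 15
||||||||||||||
Sbjct 146 CCTGGAGTCGAGTG 133
Homo sapiens exportin 7 (XPO7), transcript variant 3, mRNA.
http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=Retrieve&db=nucleotide&dopt=GenBank&RID=UDW41RSS01S&log%24=nuclalign&blast_rank=2&list_uids=154448895
END

print "$_\n" for reorder_all(@sample);
```

The printed lines could then be saved to a file or pasted under __DATA__ for the parsing code. Records whose GI and NM lines are swapped in the input would stay swapped, since the sketch only goes by position.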
In reply to Re^4: split a file into records and process it
by biohisham
in thread split a file into records and process it
by biohisham