in reply to Complicated pattern match
In the spirit of TMTOWTDI, I thought I would take a different approach. The problems dr_jgbn presents could be used to do several things, though the most obvious is to look for exons that are included in one transcript and not another. Also, the concern that Abagail-II raises is valid, therefore, I think it is reasonable and desireable to use industry standard analysis tools to arrive at a solution. What follows is a fairly long description of how I solved that problem.
LOCUS AF418600 3389 bp mRNA linear PRI 30-DEC-2002
DEFINITION Homo sapiens ATP-binding cassette protein C13 (ABCC13) mRNA,
complete cds.
ACCESSION AF418600
VERSION AF418600.1 GI:19851883
KEYWORDS .
SOURCE Homo sapiens (human)
ORGANISM Homo sapiens
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 1 to 3389)
AUTHORS Yabuuchi,H., Takayanagi,S.-I., Yoshinaga,K., Taniguchi,N.,
Aburatani,H. and Ishikawa,T.
TITLE ABCC13, an unusual truncated ABC transporter, is highly expressed
in fetal human liver
JOURNAL Biochem. Biophys. Res. Commun. 299 (3), 410-417 (2002)
MEDLINE 22334951
PUBMED 12445816
REFERENCE 2 (bases 1 to 3389)
AUTHORS Yabuuchi,H., Takayanagi,S.-I. and Ishikawa,T.
TITLE Direct Submission
JOURNAL Submitted (07-SEP-2001) Biomolecular Engineering, Tokyo Institute
of Technology, 4259 Nagatsuta, Midori-ku, Yokohama, Kanagawa
226-8501, Japan
FEATURES Location/Qualifiers
source 1..3389
/organism="Homo sapiens"
/db_xref="taxon:9606"
/chromosome="21"
/map="21q11.2"
/tissue_type="placenta"
gene 1..3389
/gene="ABCC13"
CDS 294..1271
/gene="ABCC13"
/note="transporter"
/codon_start=1
/product="ATP-binding cassette protein C13"
/protein_id="AAL99903.1"
/db_xref="GI:19851884"
/translation="MLSSTQNAGGSYQRVRGALDTQKCSPEKSASFFSKVTYSWFSRV
ITLGYKRPLEREDLFELKESDSFCTACPIFEKQWRKEVLRNQERQKVKVSCYKEAHIK
KPSLLYALWNTFKSILIQVALFKVFADILSFTSPLIMKQIIIFCEHSSDFGWNGYGYA
VALLVVVFLQTLILQQYQRFNMLTSAKVKTAVNGLIYKKVSLATLCVYFLLDEGNILT
ATKVFTSMSLFNILRIPLFELPTVISAVVQTKISLGRLEDFLNTEELLPQSIETNYTG
DHAIGFTDASFSWDKTGMPVLKEALWLMFLSRPGFRIAFCKKTFSLAPS"
BASE COUNT 1087 a 639 c 698 g 965 t
ORIGIN
1 ttcccgtctc cctccacacc ttgcctcctc caggcccaac cccattccac cagcaccccg
61 ccccaggcgc actggctagg actggaaggc aaggactaag gtgaaagctg actgcccgca
121 ccgccccagg cgcactgggt aggactgaaa ggcagggact aaggtgacag ctgactctgc
181 cggggcggag tgcgtgtcct cccaccgtgc tggcgctgaa ctgactgtcc gctgccaagg
241 gaagtgacag ccgcagccgg gctctcagcc agcggccggg cgcccagcgg accatgctct
301 ccagtacgca gaacgcgggc ggctcctatc agcgggtccg cggggcgctt gatacacaga
361 aatgcagtcc agagaaaagt gcctcatttt tcagtaaagt gacatattcc tggtttagca
421 gagtaattac tttaggctat aagagacctt tggaaagaga ggatcttttt gaactaaagg
481 aaagtgattc cttctgcact gcgtgtccca tctttgaaaa acaatggaga aaggaagttt
541 taaggaatca agagaggcaa aaagtaaagg tatcttgtta taaagaggca catatcaaga
601 aaccatctct actctatgca ttgtggaaca cctttaaatc catcctgatt caagttgcct
661 tattcaaagt gtttgctgat attttgtcct tcactagccc actcataatg aagcaaatta
721 tcattttctg tgaacacagc tcagattttg gctggaatgg ctatggctat gcagtggcac
781 ttcttgttgt agtctttttg caaactctga ttcttcagca atatcaacgt tttaacatgc
841 tcacctcagc aaaagttaag acagctgtaa atggactgat ctacaaaaag gtgtccctgg
901 caaccttgtg tgtctatttc ttactggatg aaggaaacat tttaacagcc actaaagtgt
961 tcacatcgat gtctttgttt aatattttaa ggattcctct gtttgagtta ccaaccgtga
1021 tctcagctgt ggtccagaca aagatatccc tgggccgttt ggaagacttt ctcaacactg
1081 aggagcttct tcctcaaagt attgaaacga actatacagg agatcatgct attgggttta
1141 cagatgcttc tttctcctgg gataaaacag gaatgccagt tctaaaagag gctctgtggc
1201 ttatgtttct cagcaggcct ggattcagaa ttgcattttg caagaaaaca ttctctttgg
1261 ctccatcatg aagaaagagt tttatgagca agtattggaa gcctgtgctc tccttccaga
1321 tttggagcag ttgccaaagg gagatcaaac tgagattgga gaaagagtaa ggaaacggct
1381 gtgaatataa gtgggggcca gcagcatcga gtaagcctgg ccagagctgt ctacagtggg
1441 gccgacgtct acctcctgga tgatcccttg tctgctattg atgttcatgt tggaaagcag
1501 ctttttgaaa aagtgatagg gtccttgggc cttttgaaaa accgggtagc catagtttca
1561 tttatgcttt cttgtggggc aggggatata tctagcactg tgcctaaaat tccatttatt
1621 ttgtttattt gttgaagcaa agtagtcaca tttcacatga aaaaaatggg atgctcagat
1681 tgcaatttca caggcaaatg tcaatgggta atttttgaag ttttacttac aagtttactc
1741 tggtgacaat ttatgtgatt gttatctctt catctaatac catatcctaa ggggatgtat
1801 gtgccataat gaattgttgt tgactcaagt gatttttaca atggtatttt ggtgcacacc
1861 aaaatattaa taaaatttta tattattaac atagcaattt gacattatta tcacagcaat
1921 catagtgatt cagctgtttt gttaaagggc caagaagatg aaaattagtg tttaccagta
1981 catgccatat ttcacagggt aaattagttc aatccctatt tcacacataa aggagctcag
2041 ggctcacaat gacttgatat aggttgctca acttaagaac aatagagatt tttaaaagta
2101 tgtctgactc caaagccaca ttcttcataa tatgtgttta tggaattatt aatatagaga
2161 actttgaaaa tatatgtata aaattataaa atatatgtct cagtaaaaat aaacaaacat
2221 ctttatatat tgacataaac ccaagattaa aaatgtcaca cttaattgta tgtatatgta
2281 taaaataata tgaatgattt aaaaataagg actatgaatt atgacttgga ttaaacactt
2341 tttttgtaat ggtactgtat atttaaaaat aagcaagaat gacatatatg aaattgtttt
2401 aaatttgcaa ttaaattgat tcatttctaa tgaacgaaaa aatacccgaa ctaaagtgtt
2461 aaaaatcaca tttagaaact attttctcat ctagcagcat gtcattattc cttgtaaaat
2521 acatagatgc tgccttaaac tggcacctac caaaaatcat actctgaaat agtaaaggaa
2581 aaaagtcatt gtagcttttt ttaaaaagcc atttttcgta atctctgaga taaaacactt
2641 tgtttttcag actcatattt tagtgacaca caatctcaca cttctgccac aaatgaatct
2701 tattgttgta atgaaaagtg gcagaatagc tcaaatggga atataccagg agctgctgtg
2761 taaaaccaaa aatctcacta atttgcacca agtcatcagt gaacaagaaa aagctcatgc
2821 tctaaagcaa gtcagtgcaa tcaactccag aactagacca aaagacaaaa tcctggagca
2881 aaaacatagt gggatatgag tccccttctg agagtcatgc aaggatttct aggcactgtt
2941 aagaacccac gtagaaagaa gatgttaagt gacggtagaa ccagttttca gggttgcctg
3001 attttttgaa gcagcacttc acagaagtcg aggtgaccag gattgacatg caagttctaa
3061 gagacaggcc ctcattggat caaggaaaac agctctcaat gaaaaaagaa aagatccctg
3121 ttggtgggtt gaagttctcc atcattctgc agtacctcca agcctttggc tggctctggg
3181 tgtggctgac tgtggtcact tacttagggc agaatttggt gagcattgga cagaatctat
3241 ggcttagtgc atgggcaaaa gaagcaaaaa acatgaatga gttcacagaa tggaagcaaa
3301 taagaagcaa taaacttaat atctatggat tattgggatt aataaaaggt aaaagaaaat
3361 gtaaaaaaaa aaaaaaaaaa aaaaaaaaa
//
and
LOCUS ABCC13 1073 bp mRNA linear PRI 23-DEC-2002
DEFINITION Homo sapiens ATP-binding cassette, sub-family C (CFTR/MRP), member
13 (ABCC13), transcript variant 3, mRNA.
ACCESSION NM_172026
VERSION NM_172026.1 GI:25952076
KEYWORDS .
SOURCE Homo sapiens (human)
ORGANISM Homo sapiens
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 1 to 1073)
AUTHORS Allikmets,R., Gerrard,B., Hutchinson,A. and Dean,M.
TITLE Characterization of the human ABC superfamily: isolation and
mapping of 21 new genes using the expressed sequence tags database
JOURNAL Hum. Mol. Genet. 5 (10), 1649-1655 (1996)
MEDLINE 97049974
PUBMED 8894702
REFERENCE 2 (bases 1 to 1073)
AUTHORS Yabuuchi,H., Takayanagi,S., Yoshinaga,K., Taniguchi,N.,
Aburatani,H. and Ishikawa,T.
TITLE ABCC13, an unusual truncated ABC transporter, is highly expressed
in fetal human liver
JOURNAL Biochem. Biophys. Res. Commun. 299 (3), 410-417 (2002)
MEDLINE 22334951
PUBMED 12445816
COMMENT REVIEWED REFSEQ: This record has been curated by NCBI staff. The
reference sequence was derived from AF518320.1 and AF418600.1.
Summary: The protein encoded by this gene is a member of the
superfamily of ATP-binding cassette (ABC) transporters. ABC
proteins transport various molecules across extra- and
intra-cellular membranes. ABC genes are divided into seven distinct
subfamilies (ABC1, MDR/TAP, MRP, ALD, OABP, GCN20, and White). This
ABC transporter is a member of the MRP subfamily which is involved
in multi-drug resistance. Alternative splicing of this gene results
in multiple transcript variants; however, not all variants have
been fully described.
Transcript Variant: This variant (3) lacks a segment in the coding
region, compared to transcript variant 1. This results in a longer
protein (isoform c) compared to isoform a. This variant (3) has
been alternatively referred to as transcript variant C.
COMPLETENESS: complete on the 3' end.
FEATURES Location/Qualifiers
source 1..1073
/organism="Homo sapiens"
/db_xref="taxon:9606"
/chromosome="21"
/map="21q11.1"
gene 1..1073
/gene="ABCC13"
/note="synonyms: PRED6, C21orf73"
/db_xref="LocusID:150000"
CDS 294..1031
/gene="ABCC13"
/note="ATP-binding cassette transporter C13; multidrug
resistance associated protein-like"
/codon_start=1
/product="ATP-binding cassette protein C13 isoform c"
/protein_id="NP_742023.1"
/db_xref="GI:25952077"
/db_xref="LocusID:150000"
/translation="MLSSTQNAGRLIHRVITLGYKRPLEREDLFELKESDSFCTACPI
FEKQWRKEVLRNQERQKVKVSCYKEAHIKKPSLLYALWNTFKSILIQVALFKVFADIL
SFTSPLIMKQIIIFCEHSSDFGWNGYGYAVALLVVVFLQTLILQQYQRFNMLTSAKVK
TAVNGLIYKKALLLSNVSRQKFSTGEIINLMSATHGLDSKPQSPLVCPFSNPNGRISP
LARAGSSSVSRGGSPCVCYTNKCFSCN"
misc_feature 426..869
/gene="ABCC13"
/note="SunT; Region: ABC-type bacteriocin/lantibiotic
exporters, contain an N-terminal double-glycine peptidase
domain Defense mechanisms"
/db_xref="CDD:COG2274"
misc_feature 516..869
/gene="ABCC13"
/note="MdlB; Region: ABC-type multidrug transport system,
ATPase and permease components Defense mechanisms"
/db_xref="CDD:COG1132"
misc_feature 558..869
/gene="ABCC13"
/note="ABC_membrane; Region: ABC transporter transmembrane
region. This family represents a unit of six transmembrane
helices. Many members of the ABC transporter family
(pfam00005) have two such regions"
/db_xref="CDD:pfam00664"
variation 889
/gene="ABCC13"
/allele="G"
/allele="A"
/db_xref="dbSNP:2822558"
polyA_signal 1032..1037
/gene="ABCC13"
polyA_site 1052
/gene="ABCC13"
BASE COUNT 306 a 261 c 238 g 268 t
ORIGIN
1 ttcccgtctc cctccacacc ttgcctcctc caggcccaac cccattccac cagcaccccg
61 ccccaggcgc actggctagg actggaaggc aaggactaag gtgaaagctg actgcccgca
121 ccgccccagg cgcactgggt aggactgaaa ggcagggact aaggtgacag ctgactctgc
181 cggggcggag tgcgtgtcct cccaccgtgc tggcgctgaa ctgactgtcc gctgccaagg
241 gaagtgacag ccgcagccgg gctctcagcc agcggccagg cgccccgcgg accatgctct
301 ccagtacgca gaacgcgggg cgcttgatac acagagtaat tactttaggc tataagagac
361 ctttggaaag agaggatctt tttgaactaa aggaaagtga ttccttctgc actgcgtgtc
421 ccatctttga aaaacaatgg agaaaggaag ttttaaggaa tcaagagagg caaaaagtaa
481 aggtatcttg ttataaagag gcacatatca agaaaccatc tctactctat gcattgtgga
541 acacctttaa atccatcctg attcaagttg ccttattcaa agtgtttgct gatattttgt
601 ccttcactag cccactcata atgaagcaaa ttatcatttt ctgtgaacac agctcagatt
661 ttggctggaa tggctatggc tatgcagtgg cacttcttgt tgtagtcttt ttgcaaactc
721 tgattcttca gcaatatcaa cgttttaaca tgctcacctc agcaaaagtt aagacagctg
781 taaatggact gatctacaaa aaggccttac ttttatcaaa tgtttctcga caaaagtttt
841 ccactgggga aattattaac ttgatgtcag caactcatgg acttgacagc aaacctcaat
901 ctcctctggt ctgccccttt tcaaatccta atggccgtat atctcctttg gcaagagctg
961 ggtccagcag tgttagcagg ggtggcagtc cttgtgtttg ttataccaat aaatgcttta
1021 gctgcaacta aaataaaaaa gttaaaggta agaaaaaaaa aaaaaaaaaa aaa
//
Using these as starting sequences, I wrote a script that does a bl2seq (blast two sequences against each other), then look at the results, determine what is missing with respect to each sequence. The bl2seq output is shown here:
Query= a.seq
(3389 letters)
>b.seq
Length = 1073
Score = 938 bits (473), Expect = 0.0
Identities = 473/473 (100%)
Strand = Plus / Plus
Query: 419 cagagtaattactttaggctataagagacctttggaaagagaggatctttttgaactaaa 478
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 332 cagagtaattactttaggctataagagacctttggaaagagaggatctttttgaactaaa 391
Query: 479 ggaaagtgattccttctgcactgcgtgtcccatctttgaaaaacaatggagaaaggaagt 538
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 392 ggaaagtgattccttctgcactgcgtgtcccatctttgaaaaacaatggagaaaggaagt 451
Query: 539 tttaaggaatcaagagaggcaaaaagtaaaggtatcttgttataaagaggcacatatcaa 598
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 452 tttaaggaatcaagagaggcaaaaagtaaaggtatcttgttataaagaggcacatatcaa 511
Query: 599 gaaaccatctctactctatgcattgtggaacacctttaaatccatcctgattcaagttgc 658
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 512 gaaaccatctctactctatgcattgtggaacacctttaaatccatcctgattcaagttgc 571
Query: 659 cttattcaaagtgtttgctgatattttgtccttcactagcccactcataatgaagcaaat 718
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 572 cttattcaaagtgtttgctgatattttgtccttcactagcccactcataatgaagcaaat 631
Query: 719 tatcattttctgtgaacacagctcagattttggctggaatggctatggctatgcagtggc 778
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 632 tatcattttctgtgaacacagctcagattttggctggaatggctatggctatgcagtggc 691
Query: 779 acttcttgttgtagtctttttgcaaactctgattcttcagcaatatcaacgttttaacat 838
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 692 acttcttgttgtagtctttttgcaaactctgattcttcagcaatatcaacgttttaacat 751
Query: 839 gctcacctcagcaaaagttaagacagctgtaaatggactgatctacaaaaagg 891
|||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 752 gctcacctcagcaaaagttaagacagctgtaaatggactgatctacaaaaagg 804
Score = 617 bits (311), Expect = e-180
Identities = 317/319 (99%)
Strand = Plus / Plus
Query: 1 ttcccgtctccctccacaccttgcctcctccaggcccaaccccattccaccagcaccccg 60
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1 ttcccgtctccctccacaccttgcctcctccaggcccaaccccattccaccagcaccccg 60
Query: 61 ccccaggcgcactggctaggactggaaggcaaggactaaggtgaaagctgactgcccgca 120
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 61 ccccaggcgcactggctaggactggaaggcaaggactaaggtgaaagctgactgcccgca 120
Query: 121 ccgccccaggcgcactgggtaggactgaaaggcagggactaaggtgacagctgactctgc 180
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 121 ccgccccaggcgcactgggtaggactgaaaggcagggactaaggtgacagctgactctgc 180
Query: 181 cggggcggagtgcgtgtcctcccaccgtgctggcgctgaactgactgtccgctgccaagg 240
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 181 cggggcggagtgcgtgtcctcccaccgtgctggcgctgaactgactgtccgctgccaagg 240
Query: 241 gaagtgacagccgcagccgggctctcagccagcggccgggcgcccagcggaccatgctct 300
||||||||||||||||||||||||||||||||||||| ||||||| ||||||||||||||
Sbjct: 241 gaagtgacagccgcagccgggctctcagccagcggccaggcgccccgcggaccatgctct 300
Query: 301 ccagtacgcagaacgcggg 319
|||||||||||||||||||
Sbjct: 301 ccagtacgcagaacgcggg 319
Score = 44.1 bits (22), Expect = 2e-07
Identities = 22/22 (100%)
Strand = Plus / Plus
Query: 339 cgcggggcgcttgatacacaga 360
||||||||||||||||||||||
Sbjct: 314 cgcggggcgcttgatacacaga 335
Lambda K H
1.37 0.711 1.31
Gapped
Lambda K H
1.37 0.711 1.31
Matrix: blastn matrix:1 -3
Gap Penalties: Existence: 5, Extension: 2
Number of Hits to DB: 6
Number of Sequences: 0
Number of extensions: 6
Number of successful extensions: 6
Number of sequences better than 1.0e-04: 1
Number of HSP's better than 0.0 without gapping: 1
Number of HSP's successfully gapped in prelim test: 0
Number of HSP's that attempted gapping in prelim test: 0
Number of HSP's gapped (non-prelim): 4
length of query: 3389
length of database: 1073
effective HSP length: 10
effective length of query: 3379
effective length of database: 1063
effective search space: 3591877
effective search space used: 3591877
T: 0
A: 0
X1: 6 (11.9 bits)
X2: 15 (29.7 bits)
S1: 12 (24.3 bits)
S2: 18 (36.2 bits)
Finally, the code is below. It uses Bio::Tools::BPbl2seq, which is part of BioPerl. Note that the code could do several more things, like using BioPerl tools to do the blast on a remote server and turning the sequences into Bio::Seq objects, so that object methods could be used. For this example, that didn't seem necessary. There are also analysis options that one might want to consider; I'll address those after the output.
So, ignoring the two end pieces that are different, there appear to be two short pieces of sequence A that contains sequence that is not contained in sequence B. There are several caveats that go along with this data. First, because BLAST filters out simple repeats, it is possible that some edges are missing. Also, there are frequently overlaps around the edges of BLAST hits, since the junctions around exons frequently look similar. So a more detailed analysis of the found exons is usually required. At a minimum, that analysis should including looking for likely splice sites. If the genome is available, the found exons could be blasted against the genome and the up and down stream regions could be cropped out and run through a splice finder. Along with the splice sites, it is frequently useful to look at the resulting translation into protein to see how the predicted splicing would affect the protein (ie, does it result in early termination, does it include or exclude a potentially important motif, etc).#!/usr/bin/perl -w use strict; use Bio::Tools::BPbl2seq; my $AFILE = 'a.seq'; my $BFILE = 'b.seq'; my $OUTFILE = 'bl2seq.out'; system('blast/bl2seq', '-i', $AFILE, '-j', $BFILE, '-o', $OUTFILE, '-p', 'blastn', '-e', '1e-4'); my $report = Bio::Tools::BPbl2seq->new(-file => $OUTFILE, -queryname => 'a.seq'); my (@Aranges,@Branges); while (my $hsp = $report->next_feature) { push @Aranges, [$hsp->query->start, $hsp->query->end]; push @Branges, [$hsp->sbjct->start, $hsp->sbjct->end]; } my $Arange_ref = \@Aranges; my $Brange_ref = \@Branges; my $sortedA = sort_array($Arange_ref); my $sortedB = sort_array($Brange_ref); my $Aseq = get_seq($AFILE); my $Bseq = get_seq($BFILE); my $Agaps = find_gaps(length $Aseq, $sortedA); my $Bgaps = find_gaps(length $Bseq, $sortedB); print_gaps('A',$Aseq,$Agaps); print_gaps('B',$Bseq,$Bgaps); #------------- sub sort_array { my $array_ref = shift; my @array = @{$array_ref}; for (my $i = 0; $i<((scalar @array)-1); $i++) { if ($array[$i][0] > $array[$i+1][0]) { ($array[$i], $array[$i+1]) = ($array[$i+1], $array[$i]); } } my $return_ref = \@array; return $return_ref; } #------------- sub find_gaps { my $len = shift; my $array_ref = shift; my @array = @{$array_ref}; my @gaps; if ($array[0][0] > 1) { push @gaps, (1, ($array[0]->[0]-1)); } for (my $i = 1; $i<(scalar @array); $i++) { if ($array[$i-1][1] < $array[$i][0]) { push @gaps, [ ($array[$i-1][1]+1), ($array[$i][0]-1) ]; } } if ($array[-1][1] < $len) { push @gaps, [ ($array[-1][1]+1), $len]; } my $return_ref = \@gaps; return $return_ref; } #------------ sub get_seq { my $filename = shift; my $seq = ''; open FILE, $filename or die "couldn't open $filename: $!\n"; while (<FILE>) { chomp; if (/^>/) { next; } $seq .= $_; } close FILE; $seq; } #------------ sub print_gaps { my $seqname = shift; my $seqin = shift; my $array_ref = shift; my @array = @{$array_ref}; if (scalar @array > 0) { print "$seqname gaps:\n"; foreach my $gap (@array) { my $start = ${$gap}[0]; my $end = ${$gap}[1]; my $seq = substr ($seqin, $start, ($end-$start) ); print "$start - $end:$seq\n"; } } } <code> Which results in this output: <code> A gaps: 320 - 338:ggctcctatcagcgggtc 361 - 418:atgcagtccagagaaaagtgcctcatttttcagtaaagtgacatattcctggtttag 892 - 3389:gtccctggcaaccttgtgtgtctatttcttactggatgaaggaaacattttaacagcca +ctaaagtgtt cacatcgatgtctttgtttaatattttaaggattcctctgtttgagttaccaaccgtgatctcagctgtg +gtccagacaa agatatccctgggccgtttggaagactttctcaacactgaggagcttcttcctcaaagtattgaaacgaa +ctatacagga gatcatgctattgggtttacagatgcttctttctcctgggataaaacaggaatgccagttctaaaagagg +ctctgtggct tatgtttctcagcaggcctggattcagaattgcattttgcaagaaaacattctctttggctccatcatga +agaaagagtt ttatgagcaagtattggaagcctgtgctctccttccagatttggagcagttgccaaagggagatcaaact +gagattggag aaagagtaaggaaacggctgtgaatataagtgggggccagcagcatcgagtaagcctggccagagctgtc +tacagtgggg ccgacgtctacctcctggatgatcccttgtctgctattgatgttcatgttggaaagcagctttttgaaaa +agtgataggg tccttgggccttttgaaaaaccgggtagccatagtttcatttatgctttcttgtggggcaggggatatat +ctagcactgt gcctaaaattccatttattttgtttatttgttgaagcaaagtagtcacatttcacatgaaaaaaatggga +tgctcagatt gcaatttcacaggcaaatgtcaatgggtaatttttgaagttttacttacaagtttactctggtgacaatt +tatgtgattg ttatctcttcatctaataccatatcctaaggggatgtatgtgccataatgaattgttgttgactcaagtg +atttttacaa tggtattttggtgcacaccaaaatattaataaaattttatattattaacatagcaatttgacattattat +cacagcaatc atagtgattcagctgttttgttaaagggccaagaagatgaaaattagtgtttaccagtacatgccatatt +tcacagggta aattagttcaatccctatttcacacataaaggagctcagggctcacaatgacttgatataggttgctcaa +cttaagaaca atagagatttttaaaagtatgtctgactccaaagccacattcttcataatatgtgtttatggaattatta +atatagagaa ctttgaaaatatatgtataaaattataaaatatatgtctcagtaaaaataaacaaacatctttatatatt +gacataaacc caagattaaaaatgtcacacttaattgtatgtatatgtataaaataatatgaatgatttaaaaataagga +ctatgaatta tgacttggattaaacactttttttgtaatggtactgtatatttaaaaataagcaagaatgacatatatga +aattgtttta aatttgcaattaaattgattcatttctaatgaacgaaaaaatacccgaactaaagtgttaaaaatcacat +ttagaaacta ttttctcatctagcagcatgtcattattccttgtaaaatacatagatgctgccttaaactggcacctacc +aaaaatcata ctctgaaatagtaaaggaaaaaagtcattgtagctttttttaaaaagccatttttcgtaatctctgagat +aaaacacttt gtttttcagactcatattttagtgacacacaatctcacacttctgccacaaatgaatcttattgttgtaa +tgaaaagtgg cagaatagctcaaatgggaatataccaggagctgctgtgtaaaaccaaaaatctcactaatttgcaccaa +gtcatcagtg aacaagaaaaagctcatgctctaaagcaagtcagtgcaatcaactccagaactagaccaaaagacaaaat +cctggagcaa aaacatagtgggatatgagtccccttctgagagtcatgcaaggatttctaggcactgttaagaacccacg +tagaaagaag atgttaagtgacggtagaaccagttttcagggttgcctgattttttgaagcagcacttcacagaagtcga +ggtgaccagg attgacatgcaagttctaagagacaggccctcattggatcaaggaaaacagctctcaatgaaaaaagaaa +agatccctgt tggtgggttgaagttctccatcattctgcagtacctccaagcctttggctggctctgggtgtggctgact +gtggtcactt acttagggcagaatttggtgagcattggacagaatctatggcttagtgcatgggcaaaagaagcaaaaaa +catgaatgag ttcacagaatggaagcaaataagaagcaataaacttaatatctatggattattgggattaataaaaggta +aaagaaaatg taaaaaaaaaaaaaaaaaaaaaaaaaaa B gaps: 805 - 1073:cttacttttatcaaatgtttctcgacaaaagttttccactggggaaattattaacttga +tgtcagcaac tcatggacttgacagcaaacctcaatctcctctggtctgccccttttcaaatcctaatggccgtatatct +cctttggcaa gagctgggtccagcagtgttagcaggggtggcagtccttgtgtttgttataccaataaatgctttagctg +caactaaaat aaaaaagttaaaggtaagaaaaaaaaaaaaaaaaaaaaa
Scott
Project coordinator of the Generic Model Organism Database Project
|
|---|