40 50 60 70 80 90
HAHU TTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVN
... ..... . : ::: :.. ..: :.
CG1674 MDSTLNIENVNDPTSIASDLSAENTKADLVS
10 20 30
If that's the case, then you need to write a Perl script to extract those sequences from the output. It looks like HAHU is the sample, and the other sequences are from the library. So maybe that means you want to capture the HAHU bits, though I'm not entirely clear on that.
Anyway, I've hacked up a bit of Perl that should help you get started. It's all I have time for now: I have to attend to a sick Cygwin installation, make waffles for the family, and attend a funeral (really).
#!/usr/bin/perl -w
use strict;
while(<DATA>) {
    print "---------------\n";
    if (/^(\s+\d{2,3})+/) { # Start of block
        print "Analyze:\n$_";
        # Here I'm just grabbing individual lines from the
        # fasta output into variables. There's the sample
        # scale, the sample, the match (dots and colons),
        # the library and the library scale.
        my $samScale = $_;
        my $sample = <DATA>;
        my $match = <DATA>;
        my $library = <DATA>;
        my $libScale = <DATA>;
        # I'm using a regular expression to figure out how
        # long the leading blanks are ($2) and where the match
        # characters end ($1 is the blanks plus the match text).
        my ( $endBlanks, $startBlanks ) =
            $match =~ /^((\s+).+?)\s+$/;
        print "Start at " . length($startBlanks);
        print ", end at " . length($endBlanks) . "\n";
        # Since the regular expression grabbed the relevant
        # pieces of the string but we just want the length,
        # we do that conversion here.
        my ( $start, $end ) =
            ( length($startBlanks), length($endBlanks) );
        # Done .. print out the matching parts.
        print "Sample match is: " .
            substr($sample,$start, $end-$start) . "\n";
        print "Library match is: " .
            substr($library,$start, $end-$start) . "\n";
    } else {
        # Skip the parts that appear to be commentary.
        # Debug code, thus commented out but left behind.
        # print "Skip:\n$_";
    }
}
__DATA__
40 50 60 70 80 90
HAHU TTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVN
... ..... . : ::: :.. ..: :.
CG1674 MDSTLNIENVNDPTSIASDLSAENTKADLVS
10 20 30
100 110 120 130 140
HAHU FKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
.. . .. :. : :: : : : ::.:
CG1674 LNEPNVNDQTSSASDLTAENTKADHDSLNKPKDFNNQILNIISDIDINIKAQEKITQLKE
40 50 60 70 80 90
>>CG11153-PA type=protein; loc=4:complement(821536..8223 (580 aa)
initn: 43 init1: 43 opt: 69 Z-score: 84.3 bits: 23.5 E(): 1.3
Smith-Waterman score: 69; 45.455% identity (48.387% ungapped) in 33 a
+a overlap (57-89:513-543)
30 40 50 60 70 80
HAHU EALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDL
: ...:: : . :: :..:: : :: :
CG1115 AEMRQLWCRTGGVSGGSGSLCADACPKGSGGSNSQVAVAAAAAVYHLQDM--ASSAASTA
490 500 510 520 530 540
When I run this I get the following matches:
---------------
Analyze:
40 50 60 70 80 90
Start at 37, end at 67
Sample match is: NAVAHVDDMPNALSALSDLHAHKLRVDPVN
Library match is: DSTLNIENVNDPTSIASDLSAENTKADLVS
---------------
---------------
Analyze:
100 110 120 130 140
Start at 7, end at 37
Sample match is: FKLLSHCLLVTLAAHLPAEFTPAVHASLDK
Library match is: LNEPNVNDQTSSASDLTAENTKADHDSLNK
---------------
---------------
---------------
---------------
---------------
---------------
---------------
Analyze:
30 40 50 60 70 80
Start at 37, end at 65
Sample match is: GHGKKVADALTNAVAHVDDMPNALSALS
Library match is: GSNSQVAVAAAAAVYHLQDM--ASSAAS
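If the offsets look mysterious, here is the same regular-expression trick pulled out on its own, as a minimal sketch. The three lines are hand-made toy data (not real fasta output), and the sub name aligned_region is just something I made up for the illustration. It assumes, like the script above, that the match line starts with blanks and still has its trailing newline.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: given the dots-and-colons line, return the
# offset where the match characters start and the offset just past
# the last one. Returns an empty list if the line has no match text.
sub aligned_region {
    my ($match) = @_;
    # $1 = leading blanks plus match text, $2 = leading blanks only.
    return unless $match =~ /^((\s+).+?)\s+$/;
    return ( length $2, length $1 );
}

# Toy alignment block: a 7-column label area, then the sequences.
my $sample  = "SAMP   ABCDEFGHIJ\n";
my $match   = ( ' ' x 10 ) . '.:: .' . "  \n";
my $library = "LIB    KLMNOPQRST\n";

my ( $start, $end ) = aligned_region($match);
print "Start at $start, end at $end\n";    # Start at 10, end at 15
print "Sample match is: ",
    substr( $sample, $start, $end - $start ), "\n";     # DEFGH
print "Library match is: ",
    substr( $library, $start, $end - $start ), "\n";    # NOPQR
```

The same offsets are applied to both the sample and the library line, which only works because fasta prints the three lines in the same columns.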
Anyway, this is all a wild guess based on the output you've provided. There's obviously more to do: you'll want to match up the first and second pieces, since I can see those two are part of the same string. But I don't pretend to know anything about biochemistry, so I leave that up to you.
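For the matching-up step, something along these lines might be a starting point. It's only a sketch: it assumes the script has been changed to save each piece together with the label from the library line (CG1674, CG1115) instead of printing it, and that simply gluing the pieces together in the order they appear is what you want. The sub name join_pieces and the list layout are my own invention, and the pieces are copied from the output above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Each entry: [ library label, sample piece, library piece ],
# in the order the blocks appeared in the fasta output.
my @pieces = (
    [ 'CG1674', 'NAVAHVDDMPNALSALSDLHAHKLRVDPVN',
                'DSTLNIENVNDPTSIASDLSAENTKADLVS' ],
    [ 'CG1674', 'FKLLSHCLLVTLAAHLPAEFTPAVHASLDK',
                'LNEPNVNDQTSSASDLTAENTKADHDSLNK' ],
    [ 'CG1115', 'GHGKKVADALTNAVAHVDDMPNALSALS',
                'GSNSQVAVAAAAAVYHLQDM--ASSAAS' ],
);

# Concatenate the pieces that share a library label, and remember
# the order in which the labels were first seen.
sub join_pieces {
    my @list = @_;
    my ( %joined, @order );
    for my $piece (@list) {
        my ( $id, $sample, $library ) = @$piece;
        push @order, $id unless exists $joined{$id};
        $joined{$id}{sample}  .= $sample;
        $joined{$id}{library} .= $library;
    }
    return ( \%joined, \@order );
}

my ( $joined, $order ) = join_pieces(@pieces);
for my $id (@$order) {
    print "$id\n";
    print "  Sample:  $joined->{$id}{sample}\n";
    print "  Library: $joined->{$id}{library}\n";
}
```

Whether plain concatenation is biologically meaningful is your call; if the pieces can overlap or have gaps between them, you'd need the scale lines as well.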
To learn Perl, I highly recommend you get a copy of Learning Perl and then Programming Perl, both excellent books from O'Reilly, available either from your local computer bookstore or over the web.
Perl may be a little difficult to learn, but it's an amazingly powerful tool once you get familiar with it. Good luck!
Alex / talexb / Toronto