It looks like you're trying to catch the individual pairs from this part of the output:

        40        50        60        70        80        90
HAHU   TTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVN
                                     ... ..... .  :  ::: :.. ..: :.
CG1674                              MDSTLNIENVNDPTSIASDLSAENTKADLVS
                                            10        20        30

If that's the case, then you need to write a Perl script to extract those sequences from the output. It looks like HAHU is the sample, and the other sequences are from the library, so perhaps you only want to capture the HAHU bits -- I'm not entirely clear on that.

Anyway, I've hacked up a bit of Perl that should help you get started -- it's all I have time for now. I have to attend to a sick Cygwin installation, make waffles for the family, and attend a funeral (really), so ..

#!/usr/bin/perl -w
use strict;

while (<DATA>) {
    print "---------------\n";
    if (/^(\s+\d{2,3})+/) {
        # Start of block
        print "Analyze:\n$_";

        # Here I'm just grabbing individual lines from the
        # fasta output into variables. There's the sample
        # scale, the sample, the match (dots and colons),
        # the library and the library scale.
        my $samScale = $_;
        my $sample   = <DATA>;
        my $match    = <DATA>;
        my $library  = <DATA>;
        my $libScale = <DATA>;

        # I'm using a regular expression to figure out how
        # long the leading blanks are and where the match
        # markers end (trailing blanks are dropped).
        my ( $endBlanks, $startBlanks ) = $match =~ /^((\s+).+?)\s+$/;
        print "Start at " . length($startBlanks);
        print ", end at " . length($endBlanks) . "\n";

        # Since the regular expression grabbed the relevant
        # pieces of the string but we just want the length,
        # we do that conversion here.
        my ( $start, $end ) = ( length($startBlanks), length($endBlanks) );

        # Done .. print out the matching parts.
        print "Sample match is: " . substr( $sample, $start, $end - $start ) . "\n";
        print "Library match is: " . substr( $library, $start, $end - $start ) . "\n";
    }
    else {
        # Skip the parts that appear to be commentary.
        # Debug code, thus commented out but left behind.
        # print "Skip:\n$_";
    }
}
__DATA__
        40        50        60        70        80        90
HAHU   TTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVN
                                     ... ..... .  :  ::: :.. ..: :.
CG1674                              MDSTLNIENVNDPTSIASDLSAENTKADLVS
                                            10        20        30

       100       110       120       130       140
HAHU   FKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
       ..  .    .. :. : :: : : : ::.:
CG1674 LNEPNVNDQTSSASDLTAENTKADHDSLNKPKDFNNQILNIISDIDINIKAQEKITQLKE
              40        50        60        70        80        90


>>CG11153-PA type=protein; loc=4:complement(821536..8223 (580 aa)
initn: 43 init1: 43 opt: 69 Z-score: 84.3 bits: 23.5 E(): 1.3
Smith-Waterman score: 69; 45.455% identity (48.387% ungapped) in 33 aa overlap (57-89:513-543)

         30        40        50        60        70        80
HAHU   EALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDL
                                     : ...:: : . :: :..::  : :: :
CG1115 AEMRQLWCRTGGVSGGSGSLCADACPKGSGGSNSQVAVAAAAAVYHLQDM--ASSAASTA
            490       500       510       520       530         540

When I run this I get the following matches:
---------------
Analyze:
        40        50        60        70        80        90
Start at 37, end at 67
Sample match is: NAVAHVDDMPNALSALSDLHAHKLRVDPVN
Library match is: DSTLNIENVNDPTSIASDLSAENTKADLVS
---------------
---------------
Analyze:
       100       110       120       130       140
Start at 7, end at 37
Sample match is: FKLLSHCLLVTLAAHLPAEFTPAVHASLDK
Library match is: LNEPNVNDQTSSASDLTAENTKADHDSLNK
---------------
---------------
---------------
---------------
---------------
---------------
---------------
Analyze:
         30        40        50        60        70        80
Start at 37, end at 65
Sample match is: GHGKKVADALTNAVAHVDDMPNALSALS
Library match is: GSNSQVAVAAAAAVYHLQDM--ASSAAS

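If you want to reuse the column arithmetic elsewhere, it can be pulled out into a small subroutine. What follows is only a sketch: match_span is a name I made up, and the three input lines are invented toy data, not real fasta output. It returns the 0-based offset of the first marker and the width of the marked region, which is all substr needs.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (my own name, not part of any module): given the
# line of dots and colons, return the 0-based offset of the first marker
# and the width of the marked region. Returns an empty list if the line
# holds no markers at all.
sub match_span {
    my ($match) = @_;
    my ( $lead, $body ) = $match =~ /^(\s*)(\S(?:.*\S)?)\s*$/
        or return;
    return ( length $lead, length $body );
}

# Invented toy lines laid out like one alignment block.
my $sample  = "HAHU   ABCDEFGHIJ\n";
my $match   = "         :. ::\n";
my $library = "LIB    QRCEWFGTTT\n";

my ( $start, $width ) = match_span($match);
print "Sample match is:  ", substr( $sample,  $start, $width ), "\n";
print "Library match is: ", substr( $library, $start, $width ), "\n";
```

Returning offset and width, rather than start and end, means the two values go straight into substr without the subtraction.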
Anyway, this is all a wild guess based on the output you've provided. There's obviously more to do .. you want to match up the first and second pieces, since I can see those two are part of the same string, but .. I don't pretend to know anything about biochemistry .. so I leave that up to you.
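For matching up the pieces, here is one possible starting point. It is a rough sketch under a couple of assumptions: that you have already collected each block's matched strings along with the library ID (the first field of the library line), and that simply gluing consecutive pieces together in order is what you want. Whether that makes biological sense is your call. The join_pieces name and the record layout are my own invention.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: each piece is [ library_id, sample_part, library_part ].
# Pieces that share a library ID are concatenated in the order they arrive.
sub join_pieces {
    my (@pieces) = @_;
    my ( %joined, @order );
    for my $p (@pieces) {
        my ( $id, $sample, $library ) = @$p;
        push @order, $id unless exists $joined{$id};
        $joined{$id}{sample}  .= $sample;
        $joined{$id}{library} .= $library;
    }
    return map { [ $_, $joined{$_}{sample}, $joined{$_}{library} ] } @order;
}

# The three matches reported in this post, keyed by library ID.
my @pieces = (
    [ 'CG1674', 'NAVAHVDDMPNALSALSDLHAHKLRVDPVN', 'DSTLNIENVNDPTSIASDLSAENTKADLVS' ],
    [ 'CG1674', 'FKLLSHCLLVTLAAHLPAEFTPAVHASLDK', 'LNEPNVNDQTSSASDLTAENTKADHDSLNK' ],
    [ 'CG1115', 'GHGKKVADALTNAVAHVDDMPNALSALS',   'GSNSQVAVAAAAAVYHLQDM--ASSAAS' ],
);

for my $r ( join_pieces(@pieces) ) {
    printf "%s\n  sample:  %s\n  library: %s\n", @$r;
}
```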

To learn Perl, I highly recommend you get a copy of Learning Perl and then Programming Perl, both excellent books from O'Reilly, available either from your local computer bookstore or over the web.

Perl may be a little difficult to learn, but it's an amazingly powerful tool once you get familiar with it. Good luck!

Alex / talexb / Toronto

"Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds


In reply to Re^2: Fasta Using Perl by talexb
in thread Fasta Using Perl by FarTech
