
It looks like you're trying to extract the individual matched pairs from this part of the output:

        40        50        60        70        80        90       
HAHU   TTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVN
                                     ... ..... .  :  ::: :.. ..: :.
CG1674                              MDSTLNIENVNDPTSIASDLSAENTKADLVS
                                            10        20        30

If that's the case, then you need to write a Perl script to extract those sequences from the output. It looks like HAHU is the sample and the other sequences are from the library, so perhaps what you want is to capture the HAHU portions. I'm not entirely sure.
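If it's the HAHU rows you're after, one thing that helps is that each row of an alignment block appears to put the sequence name in a fixed-width column (six characters plus a blank in the output above; CG11153 appears there as CG1115), with the residues lined up by column after it. Here's a minimal sketch that assumes that 7-column layout holds throughout your output. `split_row` is a name I made up, not part of FASTA or any module.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split one alignment row into its name and its
# sequence columns. ASSUMES the name occupies the first 7 columns,
# as "HAHU   " and "CG1674 " do in the output above.
sub split_row {
    my ($line) = @_;
    chomp $line;
    my $name = substr($line, 0, 7);
    $name =~ s/\s+$//;                 # drop the padding after the name
    my $seq = length($line) > 7 ? substr($line, 7) : '';
    return ($name, $seq);              # $seq keeps any leading blanks,
                                       # so column positions stay aligned
}

my ($name, $seq) =
  split_row("HAHU   TTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVN\n");
print "$name: $seq\n";
```

Keeping the leading blanks in the sequence part means a column offset found on the match line still points at the same residue in each row.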

Anyway, I've hacked up a bit of Perl that should help you get started. It's all I have time for right now: I have to attend to a sick Cygwin installation, make waffles for the family, and attend a funeral (really), so ..

#!/usr/bin/perl

use strict;
use warnings;

while (<DATA>) {
  print "---------------\n";
  if (/^(\s+\d{2,3})+/) { #  Start of block (a scale line)
    print "Analyze:\n$_";

    #  Here I'm just grabbing individual lines from the
    #  fasta output into variables. There's the sample
    #  scale, the sample, the match (dots and colons),
    #  the library and the library scale.

    my $samScale = $_;
    my $sample   = <DATA>;
    my $match    = <DATA>;
    my $library  = <DATA>;
    my $libScale = <DATA>;

    #  Bail out if the data ends partway through a block.
    last unless defined $library;

    #  I'm using a regular expression to figure out how
    #  long the leading blanks are, and how long the line
    #  is up to (but not including) the trailing blanks.
    #  If the match line can't be parsed (e.g. it's empty),
    #  skip this block.

    my ( $endBlanks, $startBlanks ) =
      $match =~ /^((\s+).+?)\s+$/
      or next;
    print "Start at " . length($startBlanks);
    print ", end at " . length($endBlanks) . "\n";

    #  Since the regular expression grabbed the relevant
    #  pieces of the string but we just want the lengths,
    #  we do that conversion here.

    my ( $start, $end ) =
      ( length($startBlanks), length($endBlanks) );

    #  Done .. print out the matching parts.

    print "Sample match is: " .
      substr($sample, $start, $end - $start) . "\n";
    print "Library match is: " .
      substr($library, $start, $end - $start) . "\n";
  } else {

    #  Skip the parts that appear to be commentary.
    #  Debug code, thus commented out but left behind.

    # print "Skip:\n$_";
  }
}

__DATA__
        40        50        60        70        80        90       
HAHU   TTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVN
                                     ... ..... .  :  ::: :.. ..: :.
CG1674                              MDSTLNIENVNDPTSIASDLSAENTKADLVS
                                            10        20        30 

       100       110       120       130       140                 
HAHU   FKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR                
       ..  .    .. :. : :: : : : ::.:                              
CG1674 LNEPNVNDQTSSASDLTAENTKADHDSLNKPKDFNNQILNIISDIDINIKAQEKITQLKE
              40        50        60        70        80        90 

>>CG11153-PA type=protein; loc=4:complement(821536..8223  (580 aa)
 initn:  43 init1:  43 opt:  69  Z-score: 84.3  bits: 23.5 E():  1.3
Smith-Waterman score: 69;  45.455% identity (48.387% ungapped) in 33 a
+a overlap (57-89:513-543)

         30        40        50        60        70        80      
HAHU   EALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDL
                                     : ...:: : . :: :..::  : :: :  
CG1115 AEMRQLWCRTGGVSGGSGSLCADACPKGSGGSNSQVAVAAAAAVYHLQDM--ASSAASTA
            490       500       510       520       530         540

When I run this I get the following matches:

---------------
Analyze:
        40        50        60        70        80        90       
Start at 37, end at 67
Sample match is: NAVAHVDDMPNALSALSDLHAHKLRVDPVN
Library match is: DSTLNIENVNDPTSIASDLSAENTKADLVS
---------------
---------------
Analyze:
       100       110       120       130       140                 
Start at 7, end at 37
Sample match is: FKLLSHCLLVTLAAHLPAEFTPAVHASLDK
Library match is: LNEPNVNDQTSSASDLTAENTKADHDSLNK
---------------
---------------
---------------
---------------
---------------
---------------
---------------
Analyze:
         30        40        50        60        70        80      
Start at 37, end at 65
Sample match is: GHGKKVADALTNAVAHVDDMPNALSALS
Library match is: GSNSQVAVAAAAAVYHLQDM--ASSAAS
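A possibly simpler way to get the same start and end positions is to let the regex engine report them: after a successful match, Perl's special arrays `@-` and `@+` hold the start and end offsets, so `$-[0]` and `$+[0]` bracket the whole match. This is just an alternative sketch, not what the script above does; `match_span` is a made-up name, and it assumes the sample, match and library rows line up column for column as they do in your data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the (start, end) column offsets of the
# run from the first to the last non-blank character of a match line,
# or an empty list if the line has no match characters at all.
sub match_span {
    my ($match) = @_;
    return () unless $match =~ /\S(?:.*\S)?/;   # '.' won't cross the newline
    return ($-[0], $+[0]);                       # offsets of that whole match
}

# The first block from the data above, rebuilt column for column.
my $sample  = "HAHU   TTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVN\n";
my $match   = (' ' x 37) . "... ..... .  :  ::: :.. ..: :.\n";
my $library = "CG1674" . (' ' x 30) . "MDSTLNIENVNDPTSIASDLSAENTKADLVS\n";

my ($start, $end) = match_span($match);
print "Start at $start, end at $end\n";
print "Sample match is: ",  substr($sample,  $start, $end - $start), "\n";
print "Library match is: ", substr($library, $start, $end - $start), "\n";
```

Run on that first block, this should report start 37 and end 67 and print the same two pieces as the first block of output above.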

Anyway, this is all a wild guess based on the output you've provided. There's obviously more to do: you'll want to match up the first and second pieces, since I can see those two are part of the same string, but I don't pretend to know anything about biochemistry, so I'll leave that up to you.
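For matching up the first and second pieces, one simple approach is to collect the matched pieces per library sequence as the blocks go by. This is only a sketch: `%joined` and `add_piece` are made-up names, and it just concatenates pieces in the order they arrive. That only makes sense if consecutive blocks for the same library sequence are contiguous, which the first two blocks above appear to be (the first ends at column 67, the end of the row, and the second starts at column 7, the start of the next row); nothing in the code checks the scale numbers to confirm it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical accumulator: library name => { sample => ..., library => ... }
my %joined;

# Append one block's matched pieces to whatever we already have for
# that library sequence. ASSUMES blocks arrive in order and that
# consecutive pieces are contiguous; nothing here verifies that.
sub add_piece {
    my ($libName, $samplePiece, $libraryPiece) = @_;
    $joined{$libName}{sample}  .= $samplePiece;
    $joined{$libName}{library} .= $libraryPiece;
    return;
}

# The two HAHU/CG1674 pieces from the output above.
add_piece('CG1674', 'NAVAHVDDMPNALSALSDLHAHKLRVDPVN', 'DSTLNIENVNDPTSIASDLSAENTKADLVS');
add_piece('CG1674', 'FKLLSHCLLVTLAAHLPAEFTPAVHASLDK', 'LNEPNVNDQTSSASDLTAENTKADHDSLNK');

for my $lib (sort keys %joined) {
    print "$lib sample:  $joined{$lib}{sample}\n";
    print "$lib library: $joined{$lib}{library}\n";
}
```

In the script above, you'd call something like `add_piece` right where it prints the sample and library matches, taking the name from the first columns of the library line.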

To learn Perl, I highly recommend you get a copy of Learning Perl and then Programming Perl, both excellent books from O'Reilly, available either from your local computer bookstore or over the web.

Perl may be a little difficult to learn, but it's an amazingly powerful tool once you get familiar with it. Good luck!

Alex / talexb / Toronto

"Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds

In reply to Re^2: Fasta Using Perl by talexb
in thread Fasta Using Perl by FarTech
