paola82 has asked for the wisdom of the Perl Monks concerning the following question:

I have a problem with BioPerl sequence alignment files, but it can be reduced to a problem of how to manipulate strings. I have this type of file; I'll paste a piece of it below:

1                                                       50
Sequence/23-178 NDPRVAAYGE VDELNSWVGY TKSLINSHTQ VLSNELEEIQ QLLFDCGHDL
2zhz:A/1-148    DDARIAAIGD VDELNSQIGV L--LAEPLPD DVRAALSAIQ HDLFDLGGEL
51                                                      100
Sequence/23-178 ATPADDERHS FKFKQEQPTV WLEEKIDNYT QVVPAVKKHI LPGGTQLASA
2zhz:A/1-148    CIPGHAAITD AHLARLDG-- WLA----HYN GQLPPLEEFI LPGGARGAAL

If I want to identify which letter is at position 15, for example, of "Sequence" and at the same position of "2zhz", how can I count or walk through the sequence? Could anyone help me? I know I should use the BioPerl mailing list, but I've tried; actually no one replied, and there aren't any BioPerl sites as good as this website. Sorry also for my English... I'm trying to learn bioinformatics, English, and Perl all at the same time :-)

Replies are listed 'Best First'.
Re: manipulating string
by arun_kom (Monk) on Sep 01, 2009 at 13:23 UTC
    The method slice() in Bio::SimpleAlign probably does what you need. Check out the bioperl tutorial.
Re: manipulating string
by Utilitarian (Vicar) on Sep 01, 2009 at 13:20 UTC
    Knowing nothing about BioPerl, I would suggest the following approach:
    • Iterate through the file: while (<$FILE_HANDLE>) {
    • Split each line on whitespace: @record = split(' ', $_); (splitting on ' ' rather than / / also drops the trailing newline and copes with runs of spaces)
    • If the line consists of just two integers (the position markers), ignore it.
    • Otherwise, join the record elements from 1 to $#record: join('', @record[1..$#record])
    • Concatenate the resulting string onto $hash{$record[0]}
    • Use substr to find the character at a specific index in each sequence: for my $sequence (keys %hash)
    See how far that takes you along the path and let us know if you're stuck. No need to apologise, we like to help.
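    The steps above can be sketched roughly as follows. This is only an illustration, not BioPerl's own parser: it assumes the file layout is a line of position numbers followed by one "name residues..." line per sequence, and that the "+" signs in the pasted sample are line-wrap marks rather than data. The helper names read_alignment and column_at are made up for this example.

```perl
use strict;
use warnings;

# Sample alignment from the question (layout assumed, "+" wrap marks removed).
my $alignment = <<'END';
1                                                       50
Sequence/23-178 NDPRVAAYGE VDELNSWVGY TKSLINSHTQ VLSNELEEIQ QLLFDCGHDL
2zhz:A/1-148    DDARIAAIGD VDELNSQIGV L--LAEPLPD DVRAALSAIQ HDLFDLGGEL
51                                                      100
Sequence/23-178 ATPADDERHS FKFKQEQPTV WLEEKIDNYT QVVPAVKKHI LPGGTQLASA
2zhz:A/1-148    CIPGHAAITD AHLARLDG-- WLA----HYN GQLPPLEEFI LPGGARGAAL
END

# Build one long string per sequence, keyed by the sequence name.
sub read_alignment {
    my ($text) = @_;
    my %seq;
    for my $line (split /\n/, $text) {
        my @record = split ' ', $line;           # split on runs of whitespace
        next unless @record;                      # skip blank lines
        next unless grep { /\D/ } @record;        # skip position lines such as "1 50"
        my $name = shift @record;                 # first field is the sequence name
        $seq{$name} .= join '', @record;          # append this row's residues
    }
    return %seq;
}

# Return name => character for alignment column $pos (1-based, gaps count).
sub column_at {
    my ($seq, $pos) = @_;
    return map { $_ => substr($seq->{$_}, $pos - 1, 1) } keys %$seq;
}

my %seq = read_alignment($alignment);
my %col = column_at(\%seq, 15);
print "$_: $col{$_}\n" for sort keys %col;
```

    Note that "position 15" here means the 15th alignment column, so gap characters ("-") are counted. If you want the 15th residue instead, you would need to skip the gaps, and the "/23-178" part of the name suggests the residue numbering may not start at 1.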
Re: manipulating string
by biohisham (Priest) on Sep 01, 2009 at 14:33 UTC
    In addition to Utilitarian's advice, I would only add: take a look at the tutorials for the string manipulation functions substr, index and rindex. index and rindex let you find the first and last position of a particular amino acid in your sequence, and using these functions is straightforward.

    Here are the links: index, rindex, substr, and In Italiano.
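    As a small illustration of those three functions, here is a sketch using the first 50 columns of "Sequence/23-178" from the question, with the spaces removed (and assuming the "+" signs in the pasted sample are only line-wrap marks). The residue 'L' is an arbitrary choice for the example.

```perl
use strict;
use warnings;

# First 50 alignment columns of "Sequence/23-178", spaces removed.
my $seq = 'NDPRVAAYGEVDELNSWVGYTKSLINSHTQVLSNELEEIQQLLFDCGHDL';

my $residue = 'L';
my $first = index($seq, $residue);     # 0-based offset of the first match, -1 if absent
my $last  = rindex($seq, $residue);    # 0-based offset of the last match, -1 if absent

# Add 1 to turn Perl's 0-based offsets into 1-based positions.
printf "first %s at position %d, last at position %d\n",
    $residue, $first + 1, $last + 1;

# substr takes a 0-based offset too, so position 15 is offset 14.
print "position 15 is ", substr($seq, 14, 1), "\n";
```

    Remember that all three functions count from 0, so subtract 1 from a sequence position before passing it in and add 1 to whatever index or rindex returns.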

    Have a happy programming experience :)


    Excellence is an Endeavor of Persistence. Chance Favors a Prepared Mind.