alicia has asked for the wisdom of the Perl Monks concerning the following question:

Hello all, I am a novice in Perl and need some help with a complicated question. I have to open a BLAST results file in Perl and then use regular expressions to parse out the query, BESTHIT, E-value, and the identities. I have to parse them out and then print them to a file. BESTHIT is the one with the largest E-value. Is there a subroutine that will help me with this? I found some information on using BioPerl, but I need to do the question without BioPerl. Thank you for any assistance.

Replies are listed 'Best First'.
Re: Subroutine to parse BLAST
by ww (Archbishop) on Dec 06, 2010 at 02:31 UTC

    alicia:
    It's just possible that some of us are unfamiliar with BLAST, BESTHIT, and E-value, and uncertain precisely what "the identities" means in this context. Oh, /me fits that category. It's also possible someone else is expert on all the named whatcha'ma'call'its, and thus able to provide assistance without you supplying the prerequisites for a good question here.

    But it's a good idea to satisfy those anyway. They can be found at On asking for help and How do I post a question effectively?. Briefly, they include the code you've written, a data sample, and an unambiguous explanation of why your results don't satisfy you.

    Your SoPW doesn't tell us whether you know how to open and read a file(s?); how to assign content to variables; or how to write a regex. For the first, see perldoc open and any number of nodes here dealing with read, while, <> which can be found using Super Search. Generally, that same Super Search will offer examples of assignments to $vars. And for regex help, start with perldoc perlre and perldoc perlretut and the splendid tuts here at the Monastery.
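
    The open / while / regex pieces mentioned above can be sketched roughly as follows. This is only an illustration under stated assumptions: the sample report is made up and heavily simplified, the line shapes (Query=, >hit, Expect =, Identities =) are modelled on the classic pairwise text output and may differ in your BLAST version, and parse_report is a hypothetical helper name. It also assumes the best hit is the one with the smallest E-value, which is the usual convention.

```perl
use strict;
use warnings;

# Hypothetical helper: read a simplified pairwise BLAST text report from a
# filehandle and return, per query, the HSP with the smallest E-value.
sub parse_report {
    my ($fh) = @_;
    my (%best, @order, $query, $hit, $evalue);
    while (my $line = <$fh>) {
        if ($line =~ /^Query=\s*(\S+)/) {
            # A new query starts: remember it and forget the previous hit.
            $query = $1;
            ($hit, $evalue) = (undef, undef);
            push @order, $query;
        }
        elsif ($line =~ /^>\s*(\S+)/) {
            $hit = $1;                       # subject (hit) identifier
        }
        elsif ($line =~ /Expect\s*=\s*([\d.eE+-]+)/) {
            $evalue = $1;                    # assumes a form like 1e-100 or 0.002
        }
        elsif ($line =~ m{Identities\s*=\s*(\d+)/(\d+)}) {
            next unless defined $query && defined $hit && defined $evalue;
            # Keep this HSP only if it beats the best one seen for this query.
            if (!$best{$query} || $evalue < $best{$query}{evalue}) {
                $best{$query} = {
                    query      => $query,
                    hit        => $hit,
                    evalue     => $evalue,
                    identities => "$1/$2",
                };
            }
        }
    }
    return map { $best{$_} } grep { $best{$_} } @order;
}

# Made-up sample data, read through an in-memory filehandle so the sketch
# runs without any external file.
my $report = <<'END';
Query= seq1

>subjA some description
 Score = 370 bits (200),  Expect = 1e-100
 Identities = 197/200 (98%), Gaps = 0/200 (0%)

>subjB another description
 Score = 80 bits (40),  Expect = 2e-15
 Identities = 60/70 (85%), Gaps = 1/70 (1%)
END

open my $fh, '<', \$report or die "Cannot open in-memory report: $!";
my @rows = parse_report($fh);
close $fh;

print join("\t", @{$_}{qw(query hit evalue identities)}), "\n" for @rows;
```

    For a real report, open the file by name instead of the in-memory string, and print to a second filehandle opened with '>' to write the results to a file.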

    The fact that you found BioPerl is commendable, but you're going to need to do more than that. In any case, your statement that you "need to do the question without BioPerl" raises other questions: first, "Is this homework or the like?" and second, "Why?"

Re: Subroutine to parse BLAST
by biohisham (Priest) on Dec 06, 2010 at 06:02 UTC
    Unless you show what your BLAST report looks like (tabular, line-wise, etc.) by including a subset of the report here, we are not going to be able to guide you well. While BioPerl can be a way to go, Boulder::Blast can be another option.

    Regular expressions cannot be ruled in or out absolutely, because sometimes you need to combine the parsing abilities of the module you use with the power of regular expressions when parsing such sequence or BLAST objects. So without data we're just punching in the dark.

    NOTE: You seem to be using a library, 'BeginPerlBioinfo'. Where does it come from, and have you read its documentation?
    You also need to read Markup in the Monastery and Perl Monks Approved HTML tags.


    Excellence is an Endeavor of Persistence. A Year-Old Monk :D .
Re: Subroutine to parse BLAST
by tospo (Hermit) on Dec 06, 2010 at 09:38 UTC
    Why on earth would you want to do that without BioPerl? Parsing these reports is not trivial for a beginner (although the tabular output is not TOO difficult to handle), and BioPerl does all that for you already - why re-invent the wheel? I'm afraid there is no "subroutine that will help with this" other than the methods of Bio::SearchIO, which do all you need and are quite straightforward to use. If you are having trouble using those, then please post your code and I'm sure we can help you get started.
Re: Subroutine to parse BLAST
by erix (Prior) on Dec 06, 2010 at 18:04 UTC

    There are several versions and implementations of BLAST. Can we assume you use NCBI's blast+ : ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ ?

    There is a user_manual.pdf there, which specifies the possible output formats.

    What output do you have to parse? Did you notice that BLAST can output in a table format (option -outfmt)? With that you can pipe the blast-produced alignment data straight into a database table, or, of course, into a perl program. BLAST+'s table-format makes parsing rather trivial.
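
    As a rough sketch of how little code the table format needs: the example below assumes the default column order of BLAST+ -outfmt 6 (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The sample rows are made up, best_hits is a hypothetical helper name, and "best" is taken to mean the smallest E-value per query, which is the usual convention.

```perl
use strict;
use warnings;

# Hypothetical helper: given tab-separated BLAST+ lines (default -outfmt 6
# column order assumed), keep the hit with the smallest E-value per query.
sub best_hits {
    my @lines = @_;
    my %best;
    for my $line (@lines) {
        chomp $line;
        next if $line =~ /^\s*(?:#|$)/;      # skip comment and blank lines
        my @f = split /\t/, $line;
        next unless @f >= 12;                # not a full 12-column row
        my ($query, $subject, $pident, $evalue) = @f[0, 1, 2, 10];
        if (!exists $best{$query} || $evalue < $best{$query}{evalue}) {
            $best{$query} = {
                hit      => $subject,
                identity => $pident,
                evalue   => $evalue,
            };
        }
    }
    return \%best;
}

# Made-up sample rows; real input would come from a file or a pipe.
my @sample = (
    "q1\tsubjA\t85.00\t70\t9\t1\t1\t70\t3\t72\t2e-15\t80.0",
    "q1\tsubjB\t98.50\t200\t3\t0\t1\t200\t5\t204\t1e-100\t370",
    "q2\tsubjC\t91.20\t150\t13\t0\t1\t150\t1\t150\t4e-60\t230",
);

my $best = best_hits(@sample);
for my $q (sort keys %$best) {
    print join("\t", $q, @{ $best->{$q} }{qw(hit evalue identity)}), "\n";
}
```

    To read a real file, replace @sample with lines read from a filehandle, for example in a while (<$fh>) loop.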

    Can you let us know which implementation and version of BLAST you use? Show the actual output? Nobody can guess what your output looks like... Maybe also tell us whether it's homework or not (because of the no-bioperl condition)?

Re: Subroutine to parse BLAST
by umasuresh (Hermit) on Dec 06, 2010 at 16:55 UTC
    I recommend reading Beginning Perl for Bioinformatics. Along with the book, the author provides a module, BeginPerlBioinfo.pm, which has a subroutine, extract_HSP_information, showing how to parse BLAST results. This is a good starting point for your task.
      Yes, that is THE book for learning Perl for bioinformatics - well written and definitely a "must have" for the beginner. But I would add that it's only a good starting point if you a) want to follow the lesson for the purpose of learning Perl or b) really, really cannot use BioPerl in your environment. In all other cases, BioPerl is the way to go.
Re: Subroutine to parse BLAST
by Anonymous Monk on Dec 06, 2010 at 16:38 UTC

    To pull a common thread from the above posts:

    I need to do the question without BioPerl.
    95% of the time such statements are completely wrong. The rest of the time it is somebody trying to hide the fact that it is a homework problem.

    Tell us why you think you can't use it, and we'll tell you how to make it work despite the limitations imposed on you.
