This is a review of the BioPerl modules. These
used to not be available through CPAN, so you had to
get them from the BioPerl website. That is no longer true;
you can now use the standard CPAN shell to install BioPerl. This is a large set
of modules covering several bioinformatics tasks. This will
be a fairly high-level review, as there are 174 modules that
make up this set (the full install is 5.4 M). The most
recent release as of this writing is 0.7.1, and there is
a developer release, 0.9, that I have not looked at.
The prerequisites are nothing out of the ordinary:
LWP, IO::String, XML::Node,
and XML::Writer, though BioPerl does provide interfaces
for several programs and databases, so to work with those,
you will obviously need to have them too. Bundle::BioPerl
will install all of the prerequisites for you, though I installed them
by doing the make tango, and the installation
was flawless; a few tests failed out of over 1000,
but that wasn't a big deal.
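If you go the manual route, it can help to check which of the prerequisites listed above are already present before you start. The following is only a minimal sketch: missing_modules is a hypothetical helper name of my own, not something BioPerl ships, and the module list is simply the one given in this review.

```perl
use strict;
use warnings;

# Hypothetical helper: return the names of modules that cannot be loaded.
sub missing_modules {
    my @modules = @_;
    my @missing;
    for my $module (@modules) {
        # Accept only plain package names before passing them to a string eval.
        unless ($module =~ /\A\w+(?:::\w+)*\z/ && eval "require $module; 1") {
            push @missing, $module;
        }
    }
    return @missing;
}

# Prerequisites as listed in the review.
my @prereqs = qw(LWP IO::String XML::Node XML::Writer);
my @missing = missing_modules(@prereqs);

print @missing
    ? "Missing prerequisites: @missing\n"
    : "All listed prerequisites are installed.\n";
```

Anything reported as missing can then be installed from the CPAN shell, or all at once through Bundle::BioPerl.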
There are several module groups:
- Bio::AlignIO::*: I/O handlers for several alignment formats,
like clustalw and pfam.
- Bio::Annotation::*: Objects for holding annotations (simple
comments, links to other databases, or literature references).
- Bio::DB::*: Interfaces to several databases, including GenBank,
GenPept, SwissProt and several others.
- Bio::Factory::*: This is a set of objects for instantiating
Bio::SeqAnalysisParserI, which is a generic interface for
sequence analysis parsers. The idea is to give a generic
interface for parsers so that annotation pipelines can be
built, and when a new parser or program comes along, a complete
rewrite is not necessary.
- Bio::Index::*: Methods for indexing several types of
- Bio::LiveSeq::*: This is a very feature rich DNA sequence
object. Several types of annotations can be added here. It
seems that there is a fair bit of overlap between these modules
and those in Bio::SeqFeature; it is not clear to me when, if
ever, you would want to use one over the other. It may just
be a matter of preference.
- Bio::Location::*: Contains methods for handling location
coordinates on sequences. As the documentation says, this may
seem easy, but it deals with fuzzy or compound (split) locations,
as well as handling rules for locations such as 'always widest range'
or 'always smallest range'.
- Bio::Root::*: Several utility modules that are inherited from
in other modules.
- Bio::Seq::*: Contains extensions for the main object for sequences,
Bio::Seq, including LargeSeq for long (genomic) sequences and RichSeq
for annotated sequences. Bio::Seq is the workhorse object,
which holds nucleotide or protein sequences, as well as
annotations. It provides several handy
sequence manipulation methods such as revcom (reverse complement) and
- Bio::SeqFeature::*: Objects containing feature annotations
of sequences; allows fairly complex relationships to be expressed
between related sequences, as well as detail about individual sequences,
like the locations of exons and transcripts. The list of
possible options is somewhat limited, so more specific features
should probably be created by subclassing the generic class.
- Bio::SeqIO::*: Handles I/O streams for several sequence
database types (like GenBank annotations/features, GCG and SwissProt).
- Bio::Tools::*: Several items here, including result holders
and parsers for several programs. The BLAST parser is worth
its weight in gold.
- Bio::Variation::*: These appear to be modules for working with
SNPs and other mutations.
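To give a feel for the workhorse Bio::Seq object described in the list above, here is a small sketch. It assumes BioPerl is installed; the DNA string, the 'example' id, and the make_seq helper are made up for illustration, and the constructor arguments may differ slightly between BioPerl releases.

```perl
use strict;
use warnings;
use Bio::Seq;    # assumes BioPerl is installed

# Hypothetical helper: wrap a raw DNA string in a Bio::Seq object.
sub make_seq {
    my ($dna) = @_;
    return Bio::Seq->new(-id => 'example', -seq => $dna);
}

my $seq = make_seq('ATGGCCAAGTAA');

# revcom and translate each return a new sequence object,
# so ->seq is called to get the plain string back.
print "length:  ", $seq->length, "\n";
print "revcom:  ", $seq->revcom->seq, "\n";
print "protein: ", $seq->translate->seq, "\n";
```

Because the methods return objects rather than strings, calls can be chained, for example $seq->revcom->translate.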
In all honesty, I have used only a few of these modules. The majority
of them are very specialized, so a "general practitioner" like
me is unlikely to need them often. There are so many modules
here that it is difficult to know if a problem you have might
be addressed by BioPerl, which is why I undertook writing this
review. I hope it has been helpful to you, and if you have
any experience with BioPerl, please add your comments.
Special thanks to the other members of my group, and
especially Ben Faga (not a monk, but still a good Perl
programmer), for their input and insight while writing this
review, as well as Arguile for pointing out that BioPerl is
now available at CPAN.
New note 2002-05-20: I plan on bringing this up to date
for BioPerl v1.0 as soon as possible.