Hi perl_n00b!

Good to see you taking up Perl for bioinformatics! It is really unparalleled for this kind of stuff.

Your task is a very typical bioinformatics problem, and solving it is a great learning experience. I am a biologist myself, and sort of a self-taught Perl buff who learned the language while writing bioinformatics tools. Others have already given you great feedback on your code, so rather than comment on it, I would like to offer a hint.

When it comes to parsing the contents of GenBank files in particular, I can tell you from personal experience that little unpredictable variations from the "norm" (rather than the "standard") pop up here and there. They are a pain to catch, since they might occur in only one accession out of 20,000. If you have a choice between the two, the rich EMBL format is usually much simpler to parse.

Do *not* waste your time trying to write a *comprehensive* GenBank parser from scratch unless you really have to. Instead, I really, really recommend looking into the BioPerl project, which has already implemented a pretty reliable GenBank parser. The main benefit of BioPerl, its comprehensiveness, is however also its main drawback: it can be bewildering to get an overview of all the modules and methods, and you will need to be comfortable with (or open to learning) a little object-oriented Perl programming to grasp what is going on. I can tell you, however, that the effort you put into learning some BioPerl will be well worth it in the end.
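To give you a taste of what that looks like, here is a minimal sketch of splitting a multi-record GenBank file with BioPerl's Bio::SeqIO module, one output file per record. It assumes BioPerl is installed and that each record's accession number (or, failing that, its display ID) is unique and safe to use as a filename. The sub name `split_genbank` and the `.gb` naming scheme are my own hypothetical choices, not anything BioPerl prescribes.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;
use File::Spec;

# Hypothetical helper: split a multi-record GenBank file into one file per
# record, named after each record's accession. Returns the paths written.
sub split_genbank {
    my ($infile, $outdir) = @_;
    my $in = Bio::SeqIO->new(-file => $infile, -format => 'genbank');
    my @written;
    while (my $seq = $in->next_seq) {
        # accession_number returns 'unknown' when no accession was parsed;
        # fall back to the LOCUS name (display_id) in that case.
        my $name = $seq->accession_number;
        $name = $seq->display_id if !defined $name || $name eq 'unknown';
        my $path = File::Spec->catfile($outdir, "$name.gb");
        # Let BioPerl write the record back out, so we never have to
        # handle the format's quirks ourselves.
        my $out = Bio::SeqIO->new(-file => ">$path", -format => 'genbank');
        $out->write_seq($seq);
        $out->close;
        push @written, $path;
    }
    return @written;
}

# Run as a script only when given arguments, so the sub can also be reused.
if (!caller && @ARGV) {
    my ($infile, $outdir) = @ARGV;
    my @files = split_genbank($infile, defined $outdir ? $outdir : '.');
    print "$_\n" for @files;
}
```

You would run it as something like `perl split_genbank.pl multi.gb outdir`. The nice part of the object-oriented approach is that the same loop reads EMBL input if you change `-format` to `'embl'`.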

You can find BioPerl here:
BioPerl

and some simple examples here:
Bioperl Tutorial

In reply to Re: Spltting Genbank File by korpenkraxar
in thread Spltting Genbank File by perl_n00b