llancet has asked for the wisdom of the Perl Monks concerning the following question:

Does BioPerl have a mechanism to parse and retrieve the CONTIG information from a GenBank-formatted file? For example:
LOCUS       EM160018     1539 bp    DNA     linear   CON 05-APR-2007
(lots of things here)
REFERENCE   1  (bases 1 to 1539)
(lots of references here)
FEATURES             Location/Qualifiers
(lots of features here)
CONTIG      join(AACY021632137.1:1..873,gap(51),
            complement(AACY021547504.1:1..615))
//
How can I access those things under the "CONTIG" part?
Thanks a lot!!!

Replies are listed 'Best First'.
Re: Bioperl: how to parse contig info from genbank scaffold file?
by roboticus (Chancellor) on Dec 18, 2010 at 14:36 UTC
      I'm asking about functionality in the BioPerl module. If somebody knows it, the large hidden parts of the file are not relevant. If nobody does, I will try to write a parser.
Re: Bioperl: how to parse contig info from genbank scaffold file?
by elef (Friar) on Dec 18, 2010 at 14:50 UTC
    Depends on what you want to do and what your data looks like.
    You've told us all about the irrelevant parts of the file, but you didn't say much of anything about the part you need. It starts with CONTIG... and? Is it one line or several lines? Where does the contig section end? End of the file? At a fixed keyword? How large is the source file? How many occurrences of CONTIG are there in a single file, one or more? How many files, one or thousands?
    And what do you want to do with the data? Just print it into a new file?

    Edit: I wrote this before I realized that the OP was asking about the BioPerl module; I thought this was about parsing a text file "manually" with a loop and a regex. From the little information there is, it seems that a regex would still work, though.
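
The regex route mentioned above could be sketched roughly as follows. This is only an illustration under assumptions taken from the sample record: the CONTIG keyword starts in column 1, its continuation lines are indented, and the value is a single join(...) expression. The helper names extract_contig and contig_parts are made up for this sketch and are not part of BioPerl; only core Perl is used.

```perl
use strict;
use warnings;

# Hypothetical helper: return the CONTIG expression of one GenBank record
# as a single string with the line-wrapping whitespace removed.
# Returns nothing if the record has no CONTIG line.
sub extract_contig {
    my ($record) = @_;
    # The keyword sits in column 1; continuation lines start with whitespace.
    return unless $record =~ /^CONTIG[ \t]+(.*(?:\n[ \t]+.*)*)/m;
    (my $contig = $1) =~ s/\s+//g;
    return $contig;
}

# Hypothetical helper: split "join(a,b,c)" into its top-level parts,
# ignoring commas nested inside parentheses.
sub contig_parts {
    my ($contig) = @_;
    my $inner = $contig =~ /^join\((.*)\)$/ ? $1 : $contig;
    my @parts;
    my ($depth, $buf) = (0, '');
    for my $ch (split //, $inner) {
        $depth++ if $ch eq '(';
        $depth-- if $ch eq ')';
        if ($ch eq ',' && $depth == 0) {
            push @parts, $buf;
            $buf = '';
            next;
        }
        $buf .= $ch;
    }
    push @parts, $buf if length $buf;
    return @parts;
}

# Sample record, abbreviated from the question.
my $record = <<'END';
LOCUS       EM160018     1539 bp    DNA     linear   CON 05-APR-2007
FEATURES             Location/Qualifiers
CONTIG      join(AACY021632137.1:1..873,gap(51),
            complement(AACY021547504.1:1..615))
//
END

my $contig = extract_contig($record);
print "$contig\n";
print "  $_\n" for contig_parts($contig);
```

For a file holding many records, one option would be to set $/ to "//\n" so that each read returns one record, then call extract_contig on each.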