daverave has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I have a GFF3 file where the last column sometimes contains Note=...;locus_tag=...;...

I would like to load this GFF3 file, pull out all the features of a certain type (e.g. CDS) that also have locus_tag=X, and get the start position and length of each one.

Thank you.


Re: bioperl newbie's question: simple GFF3 processing
by BioLion (Curate) on Jul 31, 2010 at 09:23 UTC

    Hi daverave, you can do all you want and more using Bio::DB::GFF and the related Bio::DB::GFF::Feature. These let you load a GFF file into a database (which can be in memory if you want), extract features of whatever type you like, and also sub-group them by attributes.

    There is example code for doing much of this on their respective CPAN pages, so have a go, see how far you get, and come back to us if you are still struggling. Remember to include the code you wrote, the input you used, any error messages you got, and the output you want.

    Good luck - hope this helps.

    Just a something something...
      Thank you very much, BioLion.
Re: bioperl newbie's question: simple GFF3 processing
by BrowserUk (Patriarch) on Jul 31, 2010 at 09:06 UTC

    Is there a question in there somewhere?

      How do you filter for such features?

        Well, first you read a line of the file; then parse it into the nine fields; then test the 9th column for 'locus_tag=', using index or a regex. If it contains one, store the record in an array or write it to another file.
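        A minimal pure-Perl sketch of that read/split/test loop, assuming a tab-separated GFF3 file with the attributes in column 9. The sub name find_features and the sample records are hypothetical, made up for illustration. Instead of a bare index/regex test, it splits column 9 into tag=value pairs, then reports the start and length (end - start + 1, since GFF3 coordinates are 1-based and inclusive) of each matching feature.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: scan a GFF3 filehandle and return [start, length]
# for every feature of type $want_type carrying locus_tag=$want_tag.
# Pass undef as $want_tag to accept any locus_tag.
sub find_features {
    my ($fh, $want_type, $want_tag) = @_;
    my @hits;
    while (my $line = <$fh>) {
        chomp $line;
        last if $line =~ /^##FASTA/;           # sequence section follows; no more features
        next if $line =~ /^\s*(?:#|$)/;        # skip comments, directives and blank lines
        my @col = split /\t/, $line;
        next unless @col == 9;                 # not a well-formed feature line
        my ($type, $start, $end, $col9) = @col[2, 3, 4, 8];
        next unless $type eq $want_type;

        # Column 9 holds ;-separated tag=value pairs (e.g. Note=...;locus_tag=...).
        my %attr;
        for my $pair (split /;\s*/, $col9) {
            my ($key, $value) = split /=/, $pair, 2;
            next unless defined $value;
            $value =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;   # undo GFF3 %XX escaping
            $attr{$key} = $value;
        }
        next unless defined $attr{locus_tag};
        next if defined $want_tag && $attr{locus_tag} ne $want_tag;

        # GFF3 coordinates are 1-based and inclusive, with start <= end
        # even on the minus strand, so the length is end - start + 1.
        push @hits, [ $start, $end - $start + 1 ];
    }
    return @hits;
}

# Made-up sample records; for a real file use something like
#   open my $fh, '<', 'annotation.gff3' or die "Cannot open: $!";
my $gff = "##gff-version 3\n"
    . "chr1\tsrc\tgene\t100\t399\t.\t+\t.\tID=gene1;locus_tag=X\n"
    . "chr1\tsrc\tCDS\t100\t399\t.\t+\t0\tID=cds1;Parent=gene1;Note=hypothetical;locus_tag=X\n"
    . "chr1\tsrc\tCDS\t500\t649\t.\t-\t0\tID=cds2;locus_tag=Y\n"
    . "chr1\tsrc\tCDS\t800\t949\t.\t+\t0\tID=cds3;locus_tag=X\n";

open my $fh, '<', \$gff or die "Cannot open in-memory GFF: $!";
my @hits = find_features($fh, 'CDS', 'X');
printf "start=%d length=%d\n", @$_ for @hits;
```

        With the sample data this prints start=100 length=300 and start=800 length=150. Parsing the pairs into a hash, rather than searching column 9 for the text 'locus_tag=', avoids false hits on attributes such as old_locus_tag=, which GenBank-derived GFF3 files often carry.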