in reply to Re^2: bioperl newbie's question: simple GFF3 peocessing
in thread bioperl newbie's question: simple GFF3 peocessing

Well, first you read a line of the file; then parse it into the nine fields; and then test the 9th column for a 'locus_tag=', using index or a regex. If it contains one, store the record in an array or write it to another file.
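As a rough sketch of those steps (not the only way to do it): the sub name filter_gff_by_tag and the file names in the usage comment are made up for illustration, and the code assumes plain tab-separated GFF3 with the attributes in column 9.

```perl
use strict;
use warnings;

# Return the GFF3 records (as raw lines, newline removed) whose 9th
# column contains "<tag>=". The sub name is hypothetical.
sub filter_gff_by_tag {
    my ($fh, $tag) = @_;
    my @hits;
    while (my $line = <$fh>) {
        next if $line =~ /^#/;                # skip comment and directive lines
        chomp $line;
        my @fields = split /\t/, $line, -1;   # the nine tab-separated columns
        next unless @fields >= 9;             # ignore blank or malformed lines
        # index() returns -1 when the substring is absent
        push @hits, $line if index($fields[8], "$tag=") >= 0;
    }
    return @hits;
}

# Assumed usage: perl filter.pl input.gff > with_locus_tag.gff
if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    print "$_\n" for filter_gff_by_tag($in, 'locus_tag');
    close $in;
}
```

Note that a plain substring test also matches tags that merely end in the same text (for example 'old_locus_tag='); a regex such as /(?:^|;)locus_tag=/ would be stricter if that matters for your files.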

Re^4: bioperl newbie's question: simple GFF3 peocessing
by daverave (Scribe) on Jul 31, 2010 at 09:22 UTC
    thank you, but I think it might be wasteful given that this is a standard file format supported by bioperl. I got some help from this: http://doc.bioperl.org/releases/bioperl-1.0.1/Bio/SeqFeatureI.html, I hope I will get it right.
      thank you, but I think it might be wasteful given that this is a standard file format supported by bioperl.

      I'll bet you a pound to a penny that it will take you longer to write; be far harder to maintain; and run more slowly; than this:

      c:\test>perl -F"\t" -ane"$F[8]=~/Gap=/ and print" chromosome_1.gff
      DDB0232428 . EST_match 15809 17104 . + + . ID=DDB0014588;Target=DDB0014588 1 561;Gap=M405 I786 M104
      DDB0232428 . EST_match 66374 66803 . + + . ID=DDB0014789;Target=DDB0014789 1 339;Gap=M96 I111 M222
      DDB0232428 . EST_match 117098 117584 . - + . ID=DDB0017340;Target=DDB0017340 1 492;Gap=M486
      DDB0232428 . EST_match 122479 123082 . + + . ID=DDB0041612;Target=DDB0041612 1 619;Gap=M603
      DDB0232428 . EST_match 162661 163197 . - + . ID=DDB0017341;Target=DDB0017341 1 558;Gap=M536
      DDB0232428 . EST_match 162661 162971 . - + . ID=DDB0161652;Target=DDB0161652 1 319;Gap=M310
      DDB0232428 . EST_match 162670 163422 . + + . ID=DDB0127927;Target=DDB0127927 1 752;Gap=M752
      DDB0232428 . EST_match 162670 163375 . + + . ID=DDB0112861;Target=DDB0112861 1 705;Gap=M705
      DDB0232428 . EST_match 162670 163335 . + + . ID=DDB0031935;Target=DDB0031935 1 652;Gap=M26 I18 M621
      DDB0232428 . EST_match 162670 163285 . + + . ID=DDB0061852;Target=DDB0061852 1 615;Gap=M615
      DDB0232428 . EST_match 162670 163398 . + + . ID=DDB0117238;Target=DDB0117238 1 729;Gap=M728
      DDB0232428 . EST_match 162670 163308 . + + . ID=DDB0061789;Target=DDB0061789 1 639;Gap=M638
      DDB0232428 . EST_match 162670 163378 . + + . ID=DDB0067313;Target=DDB0067313 1 707;Gap=M708
      DDB0232428 . EST_match 162670 163402 . + + . ID=DDB0064238;Target=DDB0064238 1 732;Gap=M732
      DDB0232428 . EST_match 162670 163430 . + + . ID=DDB0063928;Target=DDB0063928 1 760;Gap=M760
      DDB0232428 . EST_match 162671 163372 . + + . ID=DDB0126764;Target=DDB0126764 1 700;Gap=M701
      DDB0232428 . EST_match 162675 163332 . + + . ID=DDB0028393;Target=DDB0028393 1 663;Gap=M657
      DDB0232428 . EST_match 162687 163332 . + + . ID=DDB0065179;Target=DDB0065179 1 661;Gap=M645
      DDB0232428 . EST_match 162699 163215 . - + . ID=DDB0018629;Target=DDB0018629 1 524;Gap=M516

      That's a real GFF file downloaded from the web. It didn't have "locus_tag" tags, so I used "Gap", but it took longer to find the file than to parse it.
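If you need the value of a tag rather than just its presence, the 9th column can be split into tag=value pairs. This is only a sketch: parse_gff_attributes is a made-up helper name, it assumes pairs are separated by ';' and tags from values by the first '=', and it does not decode percent-escaped characters.

```perl
use strict;
use warnings;

# Split a GFF3 attribute column into a hash of tag => value.
# Hypothetical helper; percent-decoding is deliberately left out.
sub parse_gff_attributes {
    my ($column9) = @_;
    my %attr;
    for my $pair (split /;/, $column9) {
        # Limit of 2 so that any '=' inside the value is kept intact
        my ($tag, $value) = split /=/, $pair, 2;
        next unless defined $tag && length $tag && defined $value;
        $attr{$tag} = $value;
    }
    return %attr;
}

# Attribute column of the first record in the output above
my %attr = parse_gff_attributes(
    'ID=DDB0014588;Target=DDB0014588 1 561;Gap=M405 I786 M104');
print "$attr{Gap}\n";    # prints "M405 I786 M104"
```

With real input you would chomp each line first, otherwise the last value in the column keeps its trailing newline.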


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
        You're probably right, but this will once again make me skip learning a bit of bioperl.

        I think I already mentioned this in the past, but learning new stuff is something I enjoy, although in the short run it might sometimes take longer than using techniques I already know.

        Thanks for the help though, I do appreciate it.