Re: File Parsing issue

Hi ig, I wrote the code this way instead. What are the pros and cons of using your code over this? Or do you think the two are essentially the same?


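#!/usr/bin/perl

use strict;
use warnings;

# Reads the record from the DATA handle (the lines after a __DATA__ token).
while ( <DATA> ) {
    # Range: from a feature line (key followed by start..end) to its GeneID line.
    if ( /\s+\w+\s+\d+\.\.\d+/ ... /"GeneID:\d+"/ ) {
        print "$1 $2 $3" if /\s+(\w+)\s+(\d+)\.\.(\d+)/;    # key, start, end
        print " $1"      if /\/(gene=.*)/;                   # quotes are kept
        print " $1"      if /\/(transcript_id=.*)/;          # quotes are kept
        print "\n"       if /"GeneID:\d+"/;                  # end of this feature
    }
}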

Replies are listed 'Best First'.

Re^2: File Parsing issue
by ig (Vicar) on Mar 19, 2009 at 19:51 UTC

I made fairly minimal changes to the original code, to help the OP see how close they were to a solution. My idea of optimum would almost certainly be different, and others would have other very good ideas, as you have. Details of the results aside, what you have written is quite appealing: I find it easier to read and understand than the original.

The code you wrote does not produce quite the same output: for example, you don't display the information following LOCUS (of which there is no example in the sample data provided), and you don't remove the quotes from the gene and transcript_id values.
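To illustrate the quoting difference, here is a minimal sketch of one way to drop the quotes while capturing. The helper name unquoted_qualifier and the sample lines are made up for illustration; they are not from the OP's data or from either version of the code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return "name=value" with the surrounding double
# quotes removed, or undef if the line is not a /gene or /transcript_id
# qualifier.
sub unquoted_qualifier {
    my ($line) = @_;
    return "$1=$2" if $line =~ m{/(gene|transcript_id)="([^"]*)"};
    return;
}

# Made-up qualifier lines, for illustration only.
my @lines = (
    '                     /gene="ABC1"',
    '                     /transcript_id="XM_000001.1"',
    '                     /db_xref="GeneID:12345"',
);

for my $line (@lines) {
    my $field = unquoted_qualifier($line);
    print "$field\n" if defined $field;
}
```

Capturing the name and the value separately keeps the quotes out of the match, so no second substitution pass is needed. Lines that carry other qualifiers, such as db_xref, are skipped.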

The input data looks like an excerpt from a GenBank flat file record. These files are quite complex, and rather than writing my own parser I would probably turn to BioPerl for modules that read them.
