Dear monks,

I need to parse text files in several formats. Depending on the contents of a file, one or more tree, matrix, and/or taxa objects are created. I wonder if you could advise me, from a user's perspective, on what a convenient interface would be. Right now, what you'd do is:

use Bio::Phylo::Parsers;
my $parser = Bio::Phylo::Parsers->new;

# the newick format contains one or more trees.
my $trees = $parser->parse( -format => 'newick', -file => $newickfile );

# the 'taxlist' format is simply a list of names, from
# which a taxa object is created
my $taxa = $parser->parse( -format => 'taxlist', -file => $taxonfile );

# the nexus format is a mixed format that can contain
# trees, taxa, matrices, etc.
my $arrayref = $parser->parse( -format => 'nexus', -file => $nexusfile );

The Bio::Phylo::Parsers package functions as a facade: based on the -format switch, it requires (i.e. loads at run time) the appropriate parser submodule.
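To show what I mean by a facade, here is a minimal, self-contained sketch. It is not the actual Bio::Phylo code: the My::Parsers package names, the from_string method, the -string argument, and the toy "parsers" are all made up for illustration, and the submodules are defined inline so the example runs without any extra files.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the parser submodules. In real code each
# would live in its own .pm file and be loaded on demand.
package My::Parsers::Newick;
sub new { return bless {}, shift }
sub from_string {
    my ( $self, $string ) = @_;
    # toy "parser": one tree description per semicolon-terminated chunk
    return [ grep { length } split /\s*;\s*/, $string ];
}

package My::Parsers::Taxlist;
sub new { return bless {}, shift }
sub from_string {
    my ( $self, $string ) = @_;
    # toy "parser": one taxon name per line
    return [ grep { length } split /\n/, $string ];
}

package My::Parsers;
sub new { return bless {}, shift }

sub parse {
    my ( $self, %args ) = @_;
    my $format = $args{'-format'} or die "No -format given\n";
    my $class  = __PACKAGE__ . '::' . ucfirst lc $format;

    # Load the submodule only if it is not already available.
    if ( !$class->can('new') ) {
        ( my $path = "$class.pm" ) =~ s{::}{/}g;
        require $path;
    }
    return $class->new->from_string( $args{'-string'} );
}

package main;

my $parser = My::Parsers->new;
my $trees  = $parser->parse( -format => 'newick',  -string => '(a,b);(c,d);' );
my $taxa   = $parser->parse( -format => 'taxlist', -string => "a\nb\nc\n" );
print scalar(@$trees), " trees, ", scalar(@$taxa), " taxa\n";
```

The caller only ever talks to the facade; which submodule does the work is decided entirely by the -format value.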

The problem lies primarily in the nexus format, which can contain a bunch of different things. Right now, without looking at the nexus text file itself, it is impossible to say what the parser will return (it simply returns an array ref holding every object it parsed from the file). This seems too ad hoc. All other parsers return Bio::Phylo::* objects.

An option I've been considering is to parse not everything in the file, but only those things (if present) that the user asks for:

my $trees = $parser->parse( -format => 'nexus', -I_want => 'trees', -file => $nexusfile );

So, based on the "-I_want => " switch, the parser only ever returns what the user wants (if present).
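To make the idea concrete, here is a rough sketch of how such a switch could behave. Everything in it is an assumption for illustration: the My::* classes are placeholders for the real objects, parse_everything fakes the result of a full parse, and the filtering is done after parsing, whereas a real implementation might skip the unwanted blocks altogether.

```perl
use strict;
use warnings;

# Placeholder classes standing in for the real tree/taxa/matrix objects.
package My::Trees;  sub new { return bless {}, shift }
package My::Taxa;   sub new { return bless {}, shift }
package My::Matrix; sub new { return bless {}, shift }

package main;

# Maps the value of the proposed -I_want switch to a class name.
my %class_for = (
    trees  => 'My::Trees',
    taxa   => 'My::Taxa',
    matrix => 'My::Matrix',
);

# Pretend this is everything the nexus parser found in the file.
sub parse_everything {
    return [ My::Taxa->new, My::Matrix->new, My::Trees->new ];
}

sub parse_nexus {
    my (%args) = @_;
    my $all = parse_everything();

    # Without -I_want, fall back to the current behaviour: everything.
    my $want = $args{'-I_want'};
    return $all if !defined $want;

    my $class = $class_for{$want}
        or die "Don't know how to return '$want'\n";

    # Return the first object of the requested type, or undef if absent.
    my ($found) = grep { $_->isa($class) } @$all;
    return $found;
}

my $trees = parse_nexus( -I_want => 'trees' );
print ref($trees), "\n";
```

With this, the caller always knows the type of what comes back (or gets undef if the file has no such block), while omitting the switch keeps the current array ref behaviour.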

Would that be a convenient way to go about things? What would you do?

Thanks!

In reply to Parser object to return different objects by rvosa
