But I also think that the solution is not to turn your back against it on the basis of poor performance or interface clunkiness alone.

The problems with BioPerl go so much deeper than just "poor performance" or "interface clunkiness".

And these are just a few of the problems with one tiny part of this behemoth. And they are endemic. O'Woe engineering built atop performance-doesn't-matter architecture.

There is simply no logic to wrapping 7 layers of OO over Perl's powerful built-ins in order to read and write simple file formats. But that horse long since flew the coupé :)
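To make that concrete, here is a minimal sketch of reading FASTA with nothing but the built-ins, by setting the input record separator `$/` so that each read returns one whole record. The sub name `read_fasta` and the sample data are made up for illustration; this is not code from BioPerl or any existing module, and it assumes a well-formed file.

```perl
use strict;
use warnings;

# Hypothetical helper: read FASTA records from an open filehandle.
# Returns a list of [ header, sequence ] pairs.
sub read_fasta {
    my ($fh) = @_;
    local $/ = "\n>";                  # each read now ends at the next record's '>'
    my @records;
    while ( my $rec = <$fh> ) {
        chomp $rec;                    # strip the trailing "\n>" separator, if present
        $rec =~ s/^>//;                # the first record still carries its leading '>'
        my ( $header, $seq ) = split /\n/, $rec, 2;
        next unless defined $header && length $header;
        $seq = '' unless defined $seq;
        $seq =~ tr/\r\n \t//d;         # join wrapped sequence lines into one string
        push @records, [ $header, $seq ];
    }
    return @records;
}

# Demo on an in-memory file; the sequences are invented sample data.
my $data = ">seq1 demo\nACGT\nACGT\n>seq2\nGGCC\n";
open my $fh, '<', \$data or die "open: $!";
printf "%s => %s\n", @$_ for read_fasta($fh);
close $fh;
```

For files too big to hold as a list, the same loop body can process each record as it is read instead of pushing it onto `@records`.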

We are, however, open to anyone (biologists or not) willing to make contributions and improvements to the code; many of the tasks in this area don't require any bio-knowledge.

As Pat said, when asked for directions: "Ah now! If I were you trying to get to there, I wouldn't be starting from here."

The trouble is that the problems run so deep that you cannot patch-fix the implementation whilst leaving the architecture and interfaces intact. And any attempt by a non-biologist outsider to suggest changing the architecture, interfaces and implementation would be like an Englishman suggesting the US change its gun laws. It just ain't gonna happen: there are all the considerations of backward compatibility and installed base, compounded by vested interests, long-standing contributions and NIH.

About the best thing that could be done is to go for a Bio::Lite. A few small modules with minimal interfaces optimised to work with rather than against Perl's native abilities.

  1. Half a dozen PerlIO layers for reading and writing the basic file formats.
  2. A few Genome-tailored regex generators to simplify searching and fuzzy-matching the basic ACGT & extended ACGTNXacgt sequence formats.
  3. A couple of wrap-overs of one of the Math* modules to provide the more commonly used statistical tools.
  4. And most importantly, some tailored, worked examples of using some of Perl's more esoteric built-in facilities--like pack/unpack and bit-wise string operations to perform the more common manipulations.
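A rough sketch of the kind of thing items 2 and 4 have in mind. Every sub name here (`motif_regex`, `mismatches`, `revcomp`, `codons`) is hypothetical and invented for this example; only the IUPAC ambiguity codes and the Perl built-ins are real. It assumes plain byte strings and does not enable the newer `bitwise` feature, so `^` acts as a string XOR.

```perl
use strict;
use warnings;

# Standard IUPAC nucleotide ambiguity codes mapped to regex fragments.
my %IUPAC = (
    A => 'A',     C => 'C',     G => 'G',     T => 'T',
    R => '[AG]',  Y => '[CT]',  S => '[CG]',  W => '[AT]',
    K => '[GT]',  M => '[AC]',
    B => '[CGT]', D => '[AGT]', H => '[ACT]', V => '[ACG]',
    N => '[ACGT]',
);

# Hypothetical regex generator: turn a motif such as 'GAYNT'
# into a compiled, case-insensitive regex.
sub motif_regex {
    my ($motif) = @_;
    my $pat = '';
    for my $code ( split //, uc $motif ) {
        die "unknown IUPAC code '$code'\n" unless exists $IUPAC{$code};
        $pat .= $IUPAC{$code};
    }
    return qr/$pat/i;
}

# Count mismatches between two equal-length sequences with a string XOR:
# identical bytes XOR to "\0", so count the bytes that are not NUL.
sub mismatches {
    my ( $s, $t ) = @_;
    die "sequences differ in length\n" unless length $s == length $t;
    my $diff = $s ^ $t;
    return $diff =~ tr/\0//c;
}

# Reverse complement with reverse and tr.
sub revcomp {
    my ($seq) = @_;
    my $rc = reverse $seq;             # scalar context: reverses the string
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

# Split a sequence into codons with unpack; a short final chunk is kept.
sub codons {
    my ($seq) = @_;
    return unpack '(A3)*', $seq;
}

# Demo with invented sample data.
print "motif found at offset $-[0]\n" if 'ttGACGTaa' =~ motif_regex('GAYNT');
print "mismatches: ", mismatches( 'ACGTACGT', 'ACGAACGG' ), "\n";
print "revcomp:    ", revcomp('AACG'), "\n";
print "codons:     ", join( ' ', codons('ATGGCCTA') ), "\n";
```

None of these needs an object or a method call. Each is a few lines over `tr`, `^`, `unpack` and the regex engine, which is the point of the Bio::Lite idea.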

If anyone were to start that project, you could count me in, as I think that genome research is one of the most worthwhile areas of open source development around. But it would require someone with a decent understanding of the field to head up such a project; otherwise you'd just end up with another programmer's-view development instead of a user's-needs-driven one. And that would benefit no one.

In theory, Perl 6/Parrot would make a good basis for a new Bio-project, bringing the best of OO, functional, and perhaps even parallelism to the table to provide for a clean and efficient solution. But even then, it would surprise me if anything more were done than to re-implement the existing BioPerl interfaces as quickly and directly as possible, without spending any time exploring how the new language and the facilities it provides could best be utilised.

It will probably take until BioPerl6-II for people to become sufficiently familiar with Perl 6 to begin to see the possibilities--and by then, 2 or 3 Christmases after the Christmas delivery of Perl 6--the existing interfaces will be too ingrained, and the installed base too large, to consider radical changes. And it would take radical changes to address the problems.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

In reply to Re^4: Lower-casing Substrings and Iterating Two Files together by BrowserUk
in thread Lower-casing Substrings and Iterating Two Files together by neversaint
