Unfortunately I don't have a reference to our program - it was an expression analysis package called Xpression from a company called InforMax, now part of Invitrogen Corp. We built a few software systems to handle biological data for arrays, sequence analysis and pathway analysis. If you're a student or in an academic or government lab you can get Vector NTI for free, and it has an API that you might find interesting to play with. (Shameless product placement :-) We no longer distribute Xpression but we do use it for internal analysis and development.

So the usual trend in expression analysis software is to normalize the data, filter and sort it based on various criteria, and finally analyse the result by a variety of techniques, including clustering algorithms, neural networks, population-based statistical approaches and so on. What we were doing was very similar to Rosetta or any of the other commercial packages that still exist. Check them out. You should also have a gander at BioConductor - it's a collection of algorithms for analyzing different types of biological data, and they are all written in R. You might find it amusing to learn R and then port an algorithm to Perl. Or simply learn to send data to a BioConductor app and get it back.
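To make the normalize / filter / sort steps concrete, here is a minimal sketch in Perl. It is not how Xpression or any commercial package does it: the gene names, the intensities, the helper names (`log2`, `normalize`, `spread`, `filter_and_rank`) and the choice of log2 transform plus per-gene mean centring are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical data: gene name => raw intensities across three arrays.
my %raw = (
    geneA => [ 100, 400, 1600 ],
    geneB => [ 200, 200, 200 ],
    geneC => [ 50,  100, 400 ],
);

# Base-2 logarithm (Perl's built-in log is the natural log).
sub log2 { return log( $_[0] ) / log(2) }

# Normalize (assumed scheme): log2-transform, then centre on the gene's mean.
sub normalize {
    my ($values) = @_;
    my @logged = map { log2($_) } @$values;
    my $mean = 0;
    $mean += $_ for @logged;
    $mean /= @logged;
    return [ map { $_ - $mean } @logged ];
}

# Range of the values (max - min), used here as a crude variability score.
sub spread {
    my ($values) = @_;
    my @sorted = sort { $a <=> $b } @$values;
    return $sorted[-1] - $sorted[0];
}

# Filter out genes whose spread is below $min_spread, then sort the rest
# from most to least variable (ties broken by name).
sub filter_and_rank {
    my ( $data, $min_spread ) = @_;
    my %score = map { $_ => spread( normalize( $data->{$_} ) ) } keys %$data;
    return [
        sort { $score{$b} <=> $score{$a} || $a cmp $b }
        grep { $score{$_} >= $min_spread } keys %score
    ];
}

my $ranked = filter_and_rank( \%raw, 1 );
print "@$ranked\n";
```

The genes that survive the filter, in ranked order, are what you would then hand to a clustering routine. Real packages use more careful normalization than this, so treat the scheme above as a placeholder.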

If you're interested in references for expression analysis, try Microarray Bioinformatics by Dov Stekel and follow the references in there. Or for a more mathematical approach, try The Analysis of Gene Expression Data: Methods and Software by Giovanni Parmigiani et al., from Springer. I would also have a look through the Quantitative Applications in the Social Sciences series from Sage Publications. They have a nice way of taking a mathematical or statistical approach and framing it simply for us biology types, and they have a nice little pamphlet on clustering.

For scientific applications in Perl, I would try Mastering Algorithms with Perl by Orwant et al. from O'Reilly, or Advanced Perl Programming (first edition, not second) by Srinivasan, also from O'Reilly. Both will teach you good programming practices - for instance, you could make your code more efficient by passing references to your arrays and dereferencing them where needed, rather than copying whole arrays back and forth. Something to think about in future.
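As a rough illustration of the array-reference point, here is a small sketch. The subroutine names (`mean_by_ref`, `scale_in_place`) and the data are made up for the example and are not taken from your code.

```perl
use strict;
use warnings;

# Hypothetical helper: receives a reference to an array and returns the mean.
# Only the reference is copied into the sub, not the array's contents.
sub mean_by_ref {
    my ($values) = @_;
    return unless @$values;    # empty array: nothing to average
    my $sum = 0;
    $sum += $_ for @$values;
    return $sum / @$values;
}

# Hypothetical helper: scales the caller's array in place via the reference,
# so no new array has to be built and returned.
sub scale_in_place {
    my ( $values, $factor ) = @_;
    $_ *= $factor for @$values;
    return;
}

my @expression = ( 2, 4, 6, 8 );

my $mean = mean_by_ref( \@expression );    # pass \@array, not @array
scale_in_place( \@expression, 0.5 );

print "mean: $mean\n";
print "scaled: @expression\n";
```

Passing `\@expression` hands the sub a single scalar, whereas passing `@expression` flattens the array into `@_` and typically leads to a copy such as `my @copy = @_;` inside the sub. With large expression matrices that copying adds up.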

It's also good to look at Pavel Pevzner's books - he's a very well respected bioinformatician and his books explain the process of algorithm development at a very accessible level. Whenever he puts out a new one, I usually buy a copy for each of my employees at Christmas.

MadraghRua
yet another biologist hacking perl....


In reply to Re^3: RFC: Fuzzy Clustering with Perl by MadraghRua
in thread RFC: Fuzzy Clustering with Perl by lin0
