By /msg you said:
Re Re^4: genetic algorithm for motif finding The problems start with the simple ones, you have to solve them to get further. I found for example EDTA or RNAS to be a bit more complicated.
I have no idea if you have anything to do with the Rosalind site; but if you do, please do not be offended by this. It's just my opinion :) Your attempt to help me is much appreciated.
The problem I see with the Rosalind site is this: the tutorials and challenges are all geared to leading the programmer to solve the problems in one particular way. The two examples you cite are edit distance and graphs respectively, both of which are a problem.
There are many different algorithms for this (Levenshtein, Wagner-Fischer, etc.) and they are all horribly inefficient, O(mn), compared to simply XORing the strings and counting the nulls:
$n = ($a ^ $b ) =~tr[\0][];
Which is O(2N). And as both passes (the XOR and the count) are implemented in C (opcodes), the results can be orders of magnitude faster for DNA-length strings.
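To make the one-liner above concrete, here is a minimal sketch wrapped in two subs. The names `match_count` and `mismatch_count` are mine, purely for illustration. Note what it actually measures: XORing two strings leaves a "\0" wherever the bytes agree, so the count is the number of matching positions, and the number of differences is the length minus that. It assumes both strings are the same length and only sees substitutions, not insertions or deletions.

```perl
use strict;
use warnings;

# Illustrative helper (name is hypothetical): number of positions at which
# two equal-length strings hold the same byte.
sub match_count {
    my ( $s, $t ) = @_;
    die "strings must be the same length\n" unless length($s) == length($t);
    # String XOR gives "\0" for every position where the bytes are equal;
    # tr with an empty replacement list counts them without modifying anything.
    return ( $s ^ $t ) =~ tr/\0//;
}

# Illustrative helper (name is hypothetical): number of differing positions.
sub mismatch_count {
    my ( $s, $t ) = @_;
    return length($s) - match_count( $s, $t );
}

my $x = 'GAGCCTACTAACGGGAT';
my $y = 'CATCGTAATGACGGCCT';
printf "matches=%d mismatches=%d\n",
    match_count( $x, $y ), mismatch_count( $x, $y );
```

Both operands must be plain strings for `^` to do a bytewise XOR; if either has been used as a number, Perl will XOR them numerically instead.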
Unless a new graph module has appeared on CPAN recently, implementing graphing algorithms in Perl is horribly slow and hugely memory hungry.
Graphs lend themselves to an OO-style implementation, with Node/Edge/AdjacencyMatrix/Attribute/Weight/etc. classes all based around blessed hashes (or worse!), which renders even the simplest of graphs constructed with them huge, cumbersome and slow.
The idea of solving genomic problems by creating graphs of entire genes with every base a node and edges linking the pairs is just a non-starter in Perl using native Perl graphing libraries. They just require too much memory and processor.
That is probably why so much of genomic work is parceled up and sent off to mainframes or clusters (BLAST servers and the like) with gobs of memory and huge processing power, to do the donkey work. But that in turn creates its own set of knock-on problems:
The silly thing is that x86-64 class machines are actually very good at string processing, and many of the tasks can be tackled very efficiently on them once you stop viewing the problems in terms of graph theory and look for string-manipulation solutions.
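As one example of the kind of string-manipulation approach I mean, here is a rough sketch of an approximate motif search that slides a window along the sequence and reuses the XOR-and-count trick at each offset. The sub name `fuzzy_find` and its parameters are made up for illustration, and it only tolerates substitutions, not insertions or deletions; treat it as a starting point rather than a finished tool.

```perl
use strict;
use warnings;

# Illustrative helper (name and signature are hypothetical): return every
# offset in $seq at which $motif occurs with at most $max_mismatch
# substituted characters.
sub fuzzy_find {
    my ( $seq, $motif, $max_mismatch ) = @_;
    my $len = length $motif;
    my @hits;
    # The range is empty if the sequence is shorter than the motif.
    for my $pos ( 0 .. length($seq) - $len ) {
        # XOR the window with the motif; each "\0" is a matching position.
        my $same = ( substr( $seq, $pos, $len ) ^ $motif ) =~ tr/\0//;
        push @hits, $pos if $len - $same <= $max_mismatch;
    }
    return @hits;
}

my @hits = fuzzy_find( 'ACGTTGCATGTCGCATGATGCATGAGAGCT', 'ATGCAT', 1 );
print "offsets: @hits\n";
```

No nodes, no edges, no objects: just `substr`, `^` and `tr`, all of which run as C-level opcodes.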
That's why, when these DNA/RNA problems come up, I ask for and try to extract very simple descriptions of the problems that aren't couched in either biogenetic terminology or the mathematical symbolism applicable to just one approach to solving the problem. Unfortunately, these requests usually fall on deaf ears.
In reply to Re^4: genetic algorithm for motif finding
by BrowserUk
in thread genetic algorithm for motif finding
by shakehands