
By /msg you said:

Re Re^4: genetic algorithm for motif finding The problems start with the simple ones, you have to solve them to get further. I found for example EDTA or RNAS to be a bit more complicated.

I have no idea if you have anything to do with the Rosalind site; but if you do, please do not be offended by this. It's just my opinion :) Your attempt to help me is much appreciated.

The problem I see with the Rosalind site is this: the tutorials and challenges are all geared to leading the programmer to solve the problems in one particular way. The two examples you cite are Edit Distance and graphs respectively, both of which are a problem.

Edit distance.
There are many different algorithms for this (Levenshtein, Wagner-Fischer, etc.), and they are all horribly inefficient, O(mn), compared to simply XORing the strings and counting the nulls:
```
# Count the "\0" bytes in the XOR, i.e. the positions where $a and $b agree
$n = ( $a ^ $b ) =~ tr[\0][];
```
Which is O(2N). And as both passes are implemented in C (opcodes), the results can be orders of magnitude faster for DNA-length strings.
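To make the idiom concrete, here is a minimal sketch. It assumes two strings of equal length and compares them position by position (substitutions only; it makes no attempt to handle insertions or deletions). The helper names `matches` and `mismatches` are mine, purely for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: number of positions at which the two strings agree.
sub matches {
    my ( $s, $t ) = @_;
    die "strings must be the same length\n" unless length($s) == length($t);
    my $xor = $s ^ $t;            # "\0" wherever the characters are equal
    return $xor =~ tr/\0//;       # count the "\0" bytes
}

# Hypothetical helper: number of positions at which the two strings differ.
sub mismatches {
    my ( $s, $t ) = @_;
    die "strings must be the same length\n" unless length($s) == length($t);
    my $xor = $s ^ $t;
    return $xor =~ tr/\0//c;      # /c counts everything that is NOT "\0"
}

my ( $x, $y ) = ( 'GAGCCTACTAACGGGAT', 'CATCGTAATGACGGCCT' );
printf "%d positions agree, %d differ\n", matches( $x, $y ), mismatches( $x, $y );
```

Both the XOR and the `tr` are single passes over the data inside the interpreter, which is where the speed comes from.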
Graphs.
Unless a new graphs module has appeared on the scene (CPAN) recently, implementing graph algorithms in Perl is horribly slow and hugely memory hungry.
Graphs lend themselves to being implemented OO-style, with Nodes/Edges/AdjacencyMatrix/Attributes/Weights/etc. classes all based around blessed hashes (or worse!), which renders even the simplest of graphs constructed with them huge, cumbersome and slow.
The idea of solving genomic problems by creating graphs of entire genes, with every base a node and edges linking the pairs, is just a non-starter in Perl using native Perl graph libraries. They just require too much memory and processor time.
That is probably why so much genomic work is parceled up and sent off to mainframes or clusters (BLAST servers and the like), with gobs of memory and huge processing power, to do the donkey work. But that in turn creates its own set of knock-on problems:
1. These batch processors tend to produce far more information than most querents require; producing, say, edit distances and other quality statistics when often only a boolean yes or no is required.
2. Their output is, often as not, in a form -- multi-line 'carded' records and the like -- that makes extracting the required results almost as complex as solving the original problem.

The silly thing is that x86-64 class machines are actually very good at string processing, and many of these tasks can be tackled very efficiently on them, once you stop viewing the problems in terms of graph theory and look for string manipulation solutions.
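As an illustration of the kind of string manipulation solution I mean, here is a rough sketch that slides a motif along a sequence and reports the offsets where it fits with at most a given number of substitutions. The function name `fuzzy_offsets` and the sample data are made up for this example; it handles substitutions only, not insertions or deletions.

```perl
use strict;
use warnings;

# Hypothetical helper: offsets at which $motif matches $seq with at most
# $max_mismatch substituted characters.
sub fuzzy_offsets {
    my ( $seq, $motif, $max_mismatch ) = @_;
    my $len = length $motif;
    my @hits;
    for my $pos ( 0 .. length($seq) - $len ) {
        # XOR the window with the motif; non-"\0" bytes mark mismatches.
        my $diff = substr( $seq, $pos, $len ) ^ $motif;
        push @hits, $pos if ( $diff =~ tr/\0//c ) <= $max_mismatch;
    }
    return @hits;
}

my @hits = fuzzy_offsets( 'ACGTTGCAACGTAGCT', 'ACGA', 1 );
print "@hits\n";
```

If all you need is the boolean yes or no, test `scalar @hits` (or return at the first hit) rather than collecting every offset.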

That's why, when these DNA/RNA problems come up, I ask for and try to extract very simple descriptions of the problems that are couched in neither biogenetic terminology nor the mathematical symbolism applicable to just one approach to solving the problem. Unfortunately, these requests usually fall on deaf ears.

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

In reply to Re^4: genetic algorithm for motif finding by BrowserUk
in thread genetic algorithm for motif finding by shakehands
