Oh my, oh my, oh my,

is this possible!!!!
First of all, as grandfather pointed out earlier, there is a lot of info missing, like:

1. Is this supposed to be a fasta formatted entry or not ???
2. Do all duplicated sequences have the same ID (I guess not, but you didn't specify...)
3. How would you choose which sequence (seq header) you want to keep and which you want to leave out ??

Now, if I'm correct and this is a fasta entry and the seq headers are not the same, then what I would do is load the sequences into a hash (bio-perl can help you with fasta entries) such that the seq body is the key and the header is the value. As a result you will automatically get a unique set of sequences...
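Something like this is what I mean by the hash approach. It is only a minimal sketch under my own assumptions: the input is fasta text already read into a string, a sequence body may span several lines, and the first header seen for a given body is the one you keep (that policy is exactly the open question 3 above). The sub name unique_by_sequence is made up for illustration, and I parse by hand here just to keep the example self-contained.

```perl
use strict;
use warnings;

# Hypothetical helper: keep one header per distinct sequence body.
# Returns a list of [header, sequence] pairs in first-seen order.
sub unique_by_sequence {
    my ($fasta_text) = @_;
    my (%header_for, @order);
    my ($header, $body);

    # Record the entry collected so far, unless its body was already seen.
    my $store = sub {
        return unless defined $header;
        # Assumption: the first header seen for a sequence wins.
        unless (exists $header_for{$body}) {
            $header_for{$body} = $header;
            push @order, $body;
        }
    };

    for my $line (split /\n/, $fasta_text) {
        $line =~ s/\s+\z//;              # strip trailing whitespace
        next if $line eq '';
        if (my ($new_header) = $line =~ /^>(.*)/) {
            $store->();                  # finish the previous entry
            ($header, $body) = ($new_header, '');
        }
        elsif (defined $header) {
            $body .= $line;              # join multi-line sequence bodies
        }
    }
    $store->();                          # finish the last entry

    return map { [ $header_for{$_}, $_ ] } @order;
}

# Made-up example data: id3 repeats the sequence of id1.
my $fasta = ">id1\nACGT\nAC\n>id2\nGGCC\n>id3\nACGTAC\n";
for my $rec (unique_by_sequence($fasta)) {
    print ">$rec->[0]\n$rec->[1]\n";
}
```

If you would rather keep the last header, or collect all headers that share a sequence, only the body of $store needs to change.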

NOW why "Oh my, oh my, oh my"....

Well, for a large set of strings the above method is not something I would recommend; I would rather try a different strategy. So when you posted the question, the first thing that came to my mind was a trie (keyword tree), so I started to search CPAN for a module, but I couldn't find one!!!!
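To show what I mean by a trie, here is a rough sketch built from plain nested hashes, one node per character. The names trie_insert, children and header are my own invention, not taken from any CPAN module, and I have not benchmarked this against the plain hash, so take it as an illustration of the data structure rather than a recommendation.

```perl
use strict;
use warnings;

# Insert a sequence into a trie made of nested hashes.
# Returns the header already stored for this exact sequence,
# or undef if the sequence is new (and is now stored).
sub trie_insert {
    my ($root, $seq, $header) = @_;
    my $node = $root;
    for my $char (split //, $seq) {
        # Walk down one level, creating the child node if needed.
        $node = $node->{children}{$char} //= {};
    }
    return $node->{header} if exists $node->{header};
    $node->{header} = $header;
    return;
}

# Made-up example data: id3 repeats the sequence of id1.
my %trie;
my @kept;
for my $rec (['id1', 'ACGTAC'], ['id2', 'GGCC'], ['id3', 'ACGTAC']) {
    my ($header, $seq) = @$rec;
    my $seen = trie_insert(\%trie, $seq, $header);
    push @kept, $header unless defined $seen;
}
print "@kept\n";    # prints "id1 id2"
```

Sequences that share a prefix share the nodes for that prefix, which is the whole point of the structure. A serious implementation would want a more compact node layout than a hash per character.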

So my question to the other monks is: is a module for the trie data structure stored under some strange name, or is there really no module for it? Furthermore, is somebody already working on one, or should I do it, since lately I do a lot of programming involving suffix trees, tries, suffix arrays and so on ....??

Cheers

baxy

In reply to Re^3: sort sequences and keep ID of them "Oh my, oh my, oh my!!!" by baxy77bax
in thread sort sequences and keep ID of them by Diane4Luo
