Re: Search for identical substrings

... the length of my strings (3k characters), and the number of elements (300) leads to prohibitive times for my search. It took a week just to check one element of the array against every other element.

Can you confirm this, please? Your current method took 1 week to do 299 LCSs. As you have (300 * 299)/2 = 44,850 to do, would the full processing take 150 weeks?

If so, I think I can help you. I believe I can get that down to 67 hours. But, as this is so much quicker (than both your current method and a couple of others I have tried), I would very much like to verify my program against some known data.
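The figures quoted so far can be checked with a few lines of arithmetic. This is only a back-of-envelope sketch: the inputs (300 strings, 299 comparisons per week, a 67 hour target) come from the posts above, the variable names are illustrative, and it assumes every pairwise comparison costs about the same.

```python
# Back-of-envelope check of the estimates in this thread.
# All inputs are the figures quoted in the posts; the names are illustrative.
n_strings = 300
pairs = n_strings * (n_strings - 1) // 2   # unordered pairs: 44850

pairs_per_week = n_strings - 1             # 299 comparisons took one week
weeks_needed = pairs / pairs_per_week      # 150.0 weeks, a little under 3 years

target_hours = 67                          # the claimed total run time
target_seconds_per_pair = target_hours * 3600 / pairs
current_seconds_per_pair = 7 * 24 * 3600 / pairs_per_week

print(pairs, weeks_needed)
print(f"target: {target_seconds_per_pair:.1f} s per pair")
print(f"current: {current_seconds_per_pair / 60:.1f} min per pair")
```

Put another way, 67 hours in total means each comparison of two 3k strings has to finish in a few seconds, rather than the half hour or so implied by one week for 299 comparisons.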

So, if you could let us/me have say 5 of your 3k strings, and the LCS that your current method finds + the time taken, I could check what I have against your findings before exposing any stupidities to the world.

TIA.

Alternatively, I could provide my test data for you to try.
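For checking a faster program against known data, a slow but easy-to-verify reference can help. The sketch below assumes that "LCS" in this thread means the longest common substring (a contiguous run shared by both strings). It is a textbook dynamic-programming version, not the method either poster is using, and the function name is made up for illustration.

```python
def longest_common_substring(a, b):
    """Return one longest contiguous substring shared by a and b.

    Textbook dynamic programming: O(len(a) * len(b)) time,
    O(len(b)) memory because only the previous row is kept.
    """
    best_len = 0
    best_end = 0                      # end index (exclusive) of the match in a
    prev = [0] * (len(b) + 1)         # match lengths for the previous row
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run that ended at a[i-2], b[j-2].
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_end = i
        prev = curr
    return a[best_end - best_len:best_end]


print(longest_common_substring("ATGCTGTAGC", "GGCTGTAA"))
```

When several substrings tie for the longest, this returns the one that ends earliest in the first argument, which is worth knowing when comparing its output against another program's.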

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?

"Science is about questioning the status quo. Questioning authority".

The "good enough" may be good enough for the now, and perfection may be unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.

Re^2: Search for identical substrings by bioMan (Beadle) on Aug 18, 2005 at 16:19 UTC
Your time estimates agree with mine. I calculated a time to completion of 3 years. Thank you for your offer. I would like to look at all my options first, including abandoning the project, optimizing the data by removing redundant sequences (no easy task given the lack of documentation for some of my data), or subclassing the data into smaller sets of sequences. I would also like to look at the other responses I've received, but I will not forget your offer.
Re^3: Search for identical substrings by GrandFather (Saint) on Aug 19, 2005 at 00:18 UTC
Can you generate a data set that is representative of the problem and put it in your scratchpad? Perl is Huffman encoded by design.
Re^4: Search for identical substrings by bioMan (Beadle) on Aug 19, 2005 at 17:18 UTC
I have placed six actual strings from my database into my public scratchpad. Each string is formatted as follows:

`>string 1`
`ATGCTGTAGCATGCATG...CGATCATGTGACTACGT`
`>string 2`
`. . .`

The first line starts with ">" followed by a string ID. The second line is the actual data string.
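A minimal sketch of reading data in the layout described above, with a ">" header line carrying the ID followed by the data line. The function name and sample text are made up for illustration. Joining several data lines under one header is an assumption added here so that wrapped sequences would also load; the post itself describes a single data line per string.

```python
def parse_records(text):
    """Map each '>ID' header to the data string that follows it."""
    records = {}
    current_id = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue                          # skip blank lines
        if line.startswith(">"):
            current_id = line[1:].strip()     # text after '>' is the string ID
            records[current_id] = ""
        elif current_id is not None:
            records[current_id] += line       # assumption: join wrapped data lines
    return records


# Short made-up sequences standing in for the real 3k-character strings.
sample = ">string 1\nATGCTGTAGC\n>string 2\nGGCTGTAA\n"
for name, seq in parse_records(sample).items():
    print(name, len(seq))
```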
Re^5: Search for identical substrings by BrowserUk (Patriarch) on Aug 20, 2005 at 00:49 UTC
Re^6: Search for identical substrings by bioMan (Beadle) on Aug 22, 2005 at 16:54 UTC
Some notes below your chosen depth have not been shown here

