Re: Is it possible to make reference to substrings and sort them?

problematic when the text is large.

How big is your alphabet? How big is "large text"?

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority". I'm with torvalds on this

In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked

Comment on Re: Is it possible to make reference to substrings and sort them?

Replies are listed 'Best First'.
Re^2: Is it possible to make reference to substrings and sort them? by spooknick (Novice) on Mar 22, 2015 at 12:10 UTC
It's for bioinformatics, so the alphabet is 4 letters, and the text can be up to 5M characters in my case.	[reply]
Re^3: Is it possible to make reference to substrings and sort them? by BrowserUk (Patriarch) on Mar 22, 2015 at 13:12 UTC
the text can be up to 5M characters No matter how you generate them, just storing the 5 million rotations of 5 million characters (>23 Terabytes of data) is going to be a problem, long before you get to the point of trying to sort. If the purpose of your BWT generation is to aid compression, then there are much more effective and efficient methods of compressing genomic data. If the purpose is Short Read Alignment, you almost certainly need to look at the suffix trie methods described there. With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". I'm with torvalds on this In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked	[reply]
Re^4: Is it possible to make reference to substrings and sort them? by spooknick (Novice) on Mar 22, 2015 at 17:53 UTC
Yes, this is why I wouldn't want to generate all the rotations. Short read alignments is what I need to do. I have a few thousand short sequences, length of ~20nt. I want to find those that are present in my text, as well as their position. BWT is the way to go. Suffix trie is not efficient in terms of memory, as the trie needs to store all suffixes from 1 to length(text). Thank you for the link to the paper.	[reply]
Re^5: Is it possible to make reference to substrings and sort them? by BrowserUk (Patriarch) on Mar 23, 2015 at 05:40 UTC
Re^6: Is it possible to make reference to substrings and sort them? by spooknick (Novice) on Mar 23, 2015 at 08:21 UTC
Some notes below your chosen depth have not been shown here