in reply to Re^16: list of unique strings, also eliminating matching substrings
in thread list of unique strings, also eliminating matching substrings

I don't get the point why continuing by comparing the last byte is better than comparing the second byte.

Because it detects when you can skip ahead. Ie. It exactly achieves, very efficiently, only checking 100 starts when comparing a 300 byte string against a 400 byte string. No other datastructure is required.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
  • Comment on Re^17: list of unique strings, also eliminating matching substrings

Replies are listed 'Best First'.
Re^18: list of unique strings, also eliminating matching substrings
by LanX (Saint) on Jun 03, 2011 at 17:03 UTC
    Argh ... I had already shut down my laptop when I understood what you mean ... well too late to update! :)

    Anyway when using C, my approach to start with the longest is for sure better and one can avoid allocation problems.

    But I'm sure there are already much more efficient algorithms out there (something like presorting the input in a dynamic tree) and there is no need to reinvent the wheel.

    And analyzing sequences is such a common task in bioinformatics, that I can hardly imagine that none of the BioPerl modules already include necessary routines. (just 50% inclusion redundancy is IMHO a very low estimation)

    Thats very likely an xy question, because normally the next step is to find overlapping endings to reconstruct the original DNA.

    Cheers Rolf

      But I'm sure there are already much more efficient algorithms out there

      No, there aren't.

      I can hardly imagine that none of the BioPerl modules already include necessary routines

      You've obviously never tried using BioPerl modules. Every one I've tried has been over-engineered, clumsy and dog slow. And those are the good ones, and many are not.

      And I'll make you a bet you won't take up.

      I bet that you cannot post a BioPerl solution to the OPs problem exactly as stated.

      Much less one that will perform better than my solution.

      I'm sure ... no need to reinvent the wheel. I can hardly imagine ... IMHO ... very likely

      Based upon your extensive experience of all things BioInformatic, and your uncanny ability to second guess the true nature of the OPs task I take it.

      IMO, you are wrong on all counts. And until the OP comes and sets the record straight, that is what I will choose to believe.

      Just as I will continue to enjoy solving the problems OPs post. Because I enjoy the challenge.

      For the record: I find your--if, but, maybe Xy, someone else has probably already solved it if only you changed the problem to suit their solution--attitude, arrogant, patronising and completely unhelpful. Oh And decidedly off-topic.


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
      A reply falls below the community's threshold of quality. You may see it by logging in.