Re^2: Search a hash for closest match

Replies are listed 'Best First'.
Re^3: Search a hash for closest match by kennethk (Abbot) on Nov 01, 2011 at 18:00 UTC
Profile - like the great Knuth said, "Premature optimization is the root of all evil." Based upon the specs you give in Re^2: Search a hash for closest match, I would expect that linear search is not even remotely your optimal solution, and that using Perl's sort (Merge sort) followed by binary search is probably pretty derned good. However, better than fast code is functional code. I would highly recommend developing your script with a very simple bit of logic like this and small data sets, and improving the search algorithm later in the development cycle once it becomes necessary. YMMV.	[reply]
Re^4: Search a hash for closest match by aquinom (Sexton) on Nov 01, 2011 at 18:06 UTC
I think my response may have been a bit unclear/ambiguous: there are only 928K positions in total, so we can roughly estimate 40K elements per array (1M sites/25 chromosomes) and only 1 array will have to be searched based on the chromosome given.	[reply]
Re^5: Search a hash for closest match by kennethk (Abbot) on Nov 01, 2011 at 18:47 UTC
I followed what you said. Allow me to illustrate my point with an exchange I had with a colleague yesterday. He came into my office, asking me how to optimize sorting on a large set of objects. He thought perl was running pretty slowly, and this seemed to him to be the issue. We had a lovely discussion where I suggested he use the Orcish Maneuver as a solution for dropping the expense of his sort. I also suggested strongly that he should think about running his code through Devel::NYTProf to find out if the bottleneck was where he thought. And of course it wasn't, and his sort speed was completely irrelevant. Your proposed sorting algorithm is probably a good optimization. But make your script work first, then make it work quickly. The more complex the search algorithm, the easier it is to make a stupid mistake. Linear search may be slow, but it's damn hard to introduce an inobvious bug to it.	[reply]