in reply to Algorithms

and also read Mastering Algorithms with Perl

Whilst it is a fine book, I personally think it is misnamed. Introduction to Algorithms in Perl would be better. The coverage is so light that I think the term "Mastering" is misused.

What type of good algorithms have you used for your programs?

Patricia tries. Threaded trees. 2-3 trees. Binary range search. Every kind of traversal.

and how do you come up with good algorithms?

I'd like to take issue with the term "good" here. "Good" doesn't have any place in a technical discussion. MJD has a wonderful monologue on how emotional terms like "intuitive" and "counterintuitive" (and, IMO, "good") are poor choices in a discussion of the merits of a technical idea. Basically, what is intuitive or good to one person or circumstance is not good in another.

This point leads me to the answer (for me) to your question: by understanding the problem so well that it is possible to select an undeniably superior approach given the constraints and objectives at hand. For instance, in many respects program design these days comes down to "get input data, stick data into a DB, query the DB in various ways, present results from the DB queries in a useful fashion". It's almost a reflex to just stuff data into a DB and then expect the DB author to have optimised things so that the complexity involved in whatever we wish to do is handled by the DB. And as a general-purpose strategy it isn't that bad: you turn over optimisation and advanced programming issues to the experts (the DB gurus) and in effect become an assembler of parts. The problem is that the solution probably won't be optimal for your target domain (unless it is inherently relational-DB related). The DB designers need to design solutions that are generic. For instance, the indexing used by a DB probably won't be as optimal (for the particular definition of optimal that suits your circumstance) as a custom solution. Of course it will take a lot more resources (time, money, etc.) to get the latter right, whereas the former leaves the tricky stuff to other people.

A few examples of this might be in order. Consider that routines like mergesort or quicksort are the usual high-speed generic sorting algorithms. The trouble is that if you are sorting fixed-length strings (especially over a limited alphabet, as with telephone numbers or postal codes) these algorithms are not optimal: a properly written radix sort should outperform them in these constrained cases.
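Here's a minimal sketch of that idea: an LSD radix sort for fixed-length strings of digits (postal codes, say). It assumes every element has the same length and uses only the characters 0-9; the function name is mine, not from any module.

    sub radix_sort {
        my @items = @_;
        return @items unless @items;
        my $len = length $items[0];

        # Bucket by each character position, least significant first;
        # push preserves order within a bucket, so each pass is stable.
        for my $pos ( reverse 0 .. $len - 1 ) {
            my @buckets;
            push @{ $buckets[ substr $_, $pos, 1 ] }, $_ for @items;
            @items = map { @{ $_ || [] } } @buckets;
        }
        return @items;
    }

    my @sorted = radix_sort(qw(90210 10001 60614 30301));

Each pass is O(N), and there are only as many passes as the keys are long, so for short fixed-length keys this beats the O(N log N) of the comparison sorts.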

Another example is looking data up in a table. We might use any number of indexing strategies, like B+ trees or binary trees, or even a binary search of a sorted list. OTOH, if we know a lot about our usage profile, we may know that most times we do a lookup we end up retrieving the same value as last time. In this case an algorithm that caches the last looked-up value will outperform one that doesn't.
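A sketch of that caching idea, assuming a single hot key and some expensive_lookup routine standing in for whatever real search you do:

    {
        my ( $last_key, $last_value );

        sub cached_lookup {
            my ($key) = @_;

            # Short-circuit when the same key is requested twice in a row.
            return $last_value if defined $last_key && $key eq $last_key;

            # expensive_lookup is hypothetical: your B+ tree, binary
            # search, DB query, or whatever sits underneath.
            ( $last_key, $last_value ) = ( $key, expensive_lookup($key) );
            return $last_value;
        }
    }

The bare block gives the cache variables private scope without exposing them; a hash keyed on recent lookups would generalise this to more than one hot value.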

Or we might take your example of sorting. Perl's sort is generic; it's intended to be fast for the "average" data sets and cases. It turns out that the version used in some versions of perl has some _really_ bad cases, however, and so it has been changed (it isn't even quicksort anymore, IIRC). Anyway, the point is that in Perl it's easy to just use sort to sort things. But in some situations it may not be advisable to do so. For instance, it wouldn't make sense, IMO, to use sort if you were only adding one value and putting it in order. I might use a binary search with splice, or even a single pass of a bubble sort if the list was small.
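For the binary-search-with-splice approach, a sketch (it assumes numeric elements; insert_sorted is just my name for it):

    sub insert_sorted {
        my ( $aref, $value ) = @_;
        my ( $lo, $hi ) = ( 0, scalar @$aref );

        # Binary search for the leftmost slot whose element is >= $value.
        while ( $lo < $hi ) {
            my $mid = int( ( $lo + $hi ) / 2 );
            if ( $aref->[$mid] < $value ) { $lo = $mid + 1 }
            else                          { $hi = $mid }
        }
        splice @$aref, $lo, 0, $value;
    }

    my @nums = ( 1, 3, 5, 9 );
    insert_sorted( \@nums, 4 );    # @nums is now (1, 3, 4, 5, 9)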

I think the thing with algorithms is that it's not just knowing how given algorithms work, but rather the various trade-offs presented by one or another. The art of programming is selecting the right trade-offs and the algorithms that follow from them. Hashes are a good example. The desire was to have associative arrays; the overriding concern was to make them fast for store and fetch (and create?). Hence the use of hashing, and the name. VB, for reasons that aren't clear to me, chose some other representation (a doubly linked list?); suffice it to say that lookup speed wasn't a priority. (On reflection, maybe VB's collections are faster to iterate over. If so, that would suggest linked lists; hashes don't seem like they would be the fastest way to iterate, IMO.)


---
demerphq

<Elian> And I do take a kind of perverse pleasure in having an OO assembly language...

Re: Algorithms
by Abigail-II (Bishop) on Jun 25, 2003 at 00:30 UTC
    Or we might take your example of sorting. Perl's sort is generic; it's intended to be fast for the "average" data sets and cases. It turns out that the version used in some versions of perl has some _really_ bad cases, however, and so it has been changed (it isn't even quicksort anymore, IIRC).

    Well, in 5.8.0 the default is mergesort; however, quicksort is still available on demand. But if you select quicksort, a trick is performed to make it unlikely to hit one of the cases where quicksort behaves quadratically: large arrays will first be shuffled.
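    For illustration, the sort pragma in 5.8 is how you ask for a particular implementation (the leading underscore marks these subpragmas as implementation details that later perls may ignore; @sizes is a hypothetical array):

        use sort '_quicksort';    # request the quicksort implementation
        my @ordered = sort { $a <=> $b } @sizes;

        use sort 'stable';        # or ask for a stable sort instead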

    Anyway, the point is that in Perl it's easy to just use sort to sort things. But in some situations it may not be advisable to do so. For instance, it wouldn't make sense, IMO, to use sort if you were only adding one value and putting it in order. I might use a binary search with splice, or even a single pass of a bubble sort if the list was small.

    Actually, one of the reasons for choosing the current implementation of mergesort was that it deals well with nearly sorted lists. The mergesort in Perl is faster if there are long runs of sorted, or reverse-sorted, elements. So, in 5.8.0, it does make sense to push an element onto a sorted array and then sort that array, instead of using a binary search and splice. Both operations will be linear (!), but the former will do almost everything in C.
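    A rough Benchmark sketch of that claim, reusing the insert_sorted routine sketched earlier in the thread (the array size and inserted value are arbitrary):

        use Benchmark qw(cmpthese);

        my @presorted = ( 1 .. 50_000 );
        cmpthese( -1, {
            # Append one element and re-sort the nearly sorted array.
            push_and_sort => sub {
                my @copy = @presorted;
                push @copy, 25_000;
                @copy = sort { $a <=> $b } @copy;
            },
            # Binary search for the slot, then splice the element in.
            binary_splice => sub {
                my @copy = @presorted;
                insert_sorted( \@copy, 25_000 );
            },
        } );

    Both bodies pay the same O(N) cost to copy the array, so the difference that remains is the insertion strategy itself.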

    Abigail

      Why doesn't perl use heapsort?
        Bluepixel wrote:
        Why doesn't perl use heapsort?
        Because there are no real advantages to heapsort for general sorting, except its ability to sort in situ, and that isn't relevant to Perl's sort routine. The particular implementation of mergesort Perl is using has a few advantages: it's stable, and it performs very well on sorted, or nearly sorted, data. It also has a tight upper bound: its worst-case behaviour is O(N log N). And while the quicksort implementation isn't stable (it could be made stable at some cost) and runs in quadratic time on some inputs, it runs fast on random data, for two reasons: the algorithm used, and the fact that it's very memory-cache friendly (most computers nowadays have a memory cache). Neither mergesort nor heapsort is cache friendly; they go all over the place.

        Abigail