First the definition:
What is an algorithm? : A step-by-step problem-solving procedure, especially an established, recursive computational procedure for solving a problem in a finite number of steps.

When people talk about improving performance, it's often said that you can make your program faster by using a better algorithm. I have studied a lot of algorithms and also read Mastering Algorithms with Perl to learn the tips and tricks. Having worked with Perl's built-in sort, I have never really used other sorting algorithms such as 'bubble sort', 'radix sort', 'heap sort', or 'shell sort' except for educational purposes.

Searching CPAN for "algorithm" yields many evolutionary algorithms.
Some others are: MarkovChain, Huffman, Permute, Merge, Text::English, image processing algorithms, etc.

My Questions:
1. What good algorithms have you used in your programs, and how do you come up with good algorithms?
2. Would you consider integrating different technologies in a proper way to accomplish a task (e.g. Apache, MySQL, Perl, HTML) to be an algorithm? What can be said about integrating different modules in a single program (CGI, LWP, DBI) in patterns that you use over and over again?

Thanks,
artist

==================

Update 1:
Small example of an algorithm: finding the GCD (a minimal Euclid sketch just below)
Commercial big example of an algorithm: Google's PageRank
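For illustration, a minimal sketch of Euclid's algorithm in Perl (my example, not part of the quoted definition):

    # Euclid's GCD: repeatedly replace (a, b) with (b, a mod b);
    # when b reaches zero, a is the greatest common divisor.
    sub gcd {
        my ($a, $b) = @_;
        ($a, $b) = ($b, $a % $b) while $b;
        return $a;
    }

    print gcd(48, 36), "\n";   # prints 12
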
Update 2:
The definition is taken from Reference.com
Update 3:
Top 10 Algorithms from Computing in Science & Engineering, which includes FFT and Simplex algorithms.

Re: Algorithms
by Zaxo (Archbishop) on Jun 24, 2003 at 04:44 UTC

    The classic example of an algorithm making a big difference is the discovery of the Fast Fourier Transform by Cooley and Tukey, in 1965.

    Digital Fourier transforms decompose a regularly spaced array of data in terms of the associated frequencies. They have properties which ease many kinds of data analysis problems. Before 1965, it was believed that the DFT was O(N**2), quadratic, in the size of the data set. That meant it was only usable for small data sets, and did not scale well to larger problems. Practical uses were few.

    With the introduction of the FFT, that all changed. The use of a divide-and-conquer scheme reduced the DFT from quadratic to O(N log N), not much worse than linear. That produced a revolution in signal processing technology. Applications that had been dismissed as unworkable became possible, almost easy.
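
    To make the divide-and-conquer step concrete, here is a toy radix-2 Cooley-Tukey FFT in pure Perl -- a sketch of mine, assuming a power-of-two input length and written for clarity rather than speed:

        # Recursive Cooley-Tukey FFT: one size-N transform becomes two
        # size-N/2 transforms on the even- and odd-indexed samples,
        # which is where the O(N log N) behaviour comes from.
        use Math::Complex;

        sub fft {
            my @x = @_;
            my $n = @x;
            return @x if $n == 1;
            my @even = fft( @x[ grep { $_ % 2 == 0 } 0 .. $n - 1 ] );
            my @odd  = fft( @x[ grep { $_ % 2 == 1 } 0 .. $n - 1 ] );
            my @y;
            for my $k ( 0 .. $n / 2 - 1 ) {
                my $t = exp( -2 * pi() * i() * $k / $n ) * $odd[$k];  # twiddle factor
                $y[$k]          = $even[$k] + $t;
                $y[$k + $n / 2] = $even[$k] - $t;
            }
            return @y;
        }

        # e.g. my @spectrum = fft(1, 2, 3, 4);   # length must be a power of two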

    Besides use in software for data analysis of all kinds, hardware implementations were soon embedded in digital signal processing chips. There, the FFT became the basis of digital audio and video products we all use. It is no coincidence that the CD audio sample rate of 44100 Hz is a product of small primes, 2**2 * 3**2 * 5**2 * 7**2. The FFT likes it that way.

    CPAN has a number of modules providing FFT. I particularly like the one with PDL because of the handiness of piddles for other processing.
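
    For instance, a minimal use might look like this (assuming PDL::FFT's in-place fft($real, $imag) calling convention):

        use PDL;
        use PDL::FFT;

        # 64 samples of a sine wave; the imaginary part starts at zero.
        my $real = pdl( map { sin( 2 * 3.14159265358979 * $_ / 8 ) } 0 .. 63 );
        my $imag = zeroes($real);
        fft( $real, $imag );    # transforms both piddles in place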

    After Compline,
    Zaxo

      In fact, I'd say that the FFT has done more to change society than any other general-purpose computer-based algorithm. The FFT made the digital transfer and storage of near-photo-quality images practical, through its use with JPEG, thus bringing porn to the internet, creating huge social issues (see, for example, today's (well, now yesterday's, technically) US Supreme Court ruling), as well as, in no small part, fueling the growth of the internet. The same goes for MP3s, with their own social problems and growth, for MPEG and other compressed video, and in general for most lossy compression algorithms.

      The FFT has made taking real-life media into the digital world possible, and it's real-life media that creates much of the controversy of the digital world, and controversy moves both worlds forward.


      Warning: Unless otherwise stated, code is untested. Do not use without understanding. Code is posted in the hopes it is useful, but without warranty. All copyrights are relinquished into the public domain unless otherwise stated. I am not an angel. I am capable of error, and err on a fairly regular basis. If I made a mistake, please let me know (such as by replying to this node).

Re: Algorithms
by Abigail-II (Bishop) on Jun 24, 2003 at 12:32 UTC
    First the definition:
    What is an algorithm? : A step-by-step problem-solving procedure, especially an established, recursive computational procedure for solving a problem in a finite number of steps.

    That's a very unusual definition of an algorithm. I've never heard of a definition that requires an algorithm to be recursive. And if algorithms need to be established, what do you call a new way of solving a problem?

    In his wonderful series The Art of Computer Programming (if you haven't read it yet, run to your nearest bookstore now and buy it - it will be the best $150 you've ever spent), Knuth dedicates the first section (nine pages, including exercises(!)) to the meaning of the concept algorithm. Quoting him:

    The modern meaning for algorithm is quite similar to that of recipe, process, method, technique, procedure, routine, rigmarole, except that the word "algorithm" connotes something just a little different. Besides merely being a finite set of rules that gives a sequence of operations for solving a specific type of problem, an algorithm has five important features:
    1. Finiteness.
    2. Definiteness.
    3. Input.
    4. Output.
    5. Effectiveness.

    Each of the above points is further explained, but I've omitted the explanations.

    Cormen, Leiserson and Rivest start their Introduction to Algorithms (another "must have") with:

    Informally, an algorithm is any well-defined computational procedure that takes some value, or set of values, as input and produces some value, or set of values, as output. An algorithm is thus a sequence of computational steps that transform the input into the output.

    If you wonder why Knuth requires an algorithm to be finite (that is, to terminate eventually) while CLR don't: the latter further define correct algorithms, which must terminate.

    Abigail

      I've never heard of a definition that requires an algorithm to be recursive.

      I'd suspect that the original text has got something to do with recursive function theory. In that sense, an algorithm is recursive, since solving a problem involves:
      moving from the initial condition to an intermediate state
      moving from the intermediate state to another intermediate state (between 0 and a finite number of times)
      moving to the final state

      Obviously, this is a recursive structure, which only bottoms out when you get to the atomic operations of the language.
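
      A minimal Perl sketch of that structure, where is_final() and step() are hypothetical helpers standing in for the atomic operations of the language:

          # Solving = one state transition, then solving what remains;
          # the recursion bottoms out at the final state.
          sub solve {
              my ($state) = @_;
              return $state if is_final($state);
              return solve( step($state) );
          }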

      --
      Tommy
      Too stupid to live.
      Too stubborn to die.

        Just thought I'd add that the mathematical concept of recursion and the computational concept of recursion are different. Recursion in the mathematical sense can be handled by either iterative or recursive solutions in the computational sense.
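
        For instance, the mathematically recursive factorial can be implemented either way in Perl (a quick sketch):

            # The recursive definition, implemented recursively ...
            sub fact_rec {
                my $n = shift;
                return $n <= 1 ? 1 : $n * fact_rec($n - 1);
            }

            # ... and the same function, implemented iteratively.
            sub fact_iter {
                my $n = shift;
                my $f = 1;
                $f *= $_ for 2 .. $n;
                return $f;
            }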


        ---
        demerphq

        <Elian> And I do take a kind of perverse pleasure in having an OO assembly language...
      That's a very unusual definition of an algorithm. I've never heard of a definition that requires an algorithm to be recursive. And if algorithms need to be established, what do you call a new way of solving a problem?

      Reread the original post and notice the word especially.


      milkbone - perl/tk instant messaging - it's the only way to fly

      You know anyone who'll debug two million lines of code for what I bid for this job?
      - Dennis Nedry

•Re: Algorithms
by merlyn (Sage) on Jun 24, 2003 at 05:50 UTC
    From my understanding of the term, every program I write is an algorithm, because it's a recipe to solve a particular problem using the available subtasks in an orderly fashion.

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.

      Programs aren't algorithms. Programs are implementations of algorithms.

      Abigail

        Potatoe, Potatoe.

        Combine two algorithms, what do you get? Right, another algorithm. Repeat this several times, now you have a very large algorithm.

        The only thing that separates a program from an algorithm is data. So why does it seem that an algorithm is so much different from a whole program? It's really quite simple - we suck. Programming hasn't yet evolved to the point where our products resemble actual engineering. So what we have is a mess of unprovable code that we can't even guarantee will work as intended.

        An algorithm is commonly viewed as a tried and tested method of accomplishing a very basic goal. The next step is transcending simple sorts and being able to create real solutions with the same reliability.

Re: Algorithms
by chunlou (Curate) on Jun 24, 2003 at 06:18 UTC
    Integrating different technologies is, to me, more like design of experiments than an algorithm, partly because you can't always do "integration" in the analytic way you can with an algorithm. Rather, "integration" involves planned trial and error from observation and testing, sometimes due to the sheer complexity of the sum of everything, or to reasons that are out of your control.

    "Integration" is like doing science or social science; algorithm, math.

    A totally irrelevant wandering thought: I think the simplex algorithm is another algorithm of the century, besides the FFT.
Re: Algorithms
by demerphq (Chancellor) on Jun 24, 2003 at 22:31 UTC

    and also read Mastering Algorithms with Perl

    Whilst it is a fine book, I personally think it is misnamed. Introduction to Algorithms in Perl would be better. The coverage is so light that I think the term "Mastering" is misused.

    What type of good algorithm you have used for your programs

    Patricia tries. Threaded trees. 2-3 trees. Binary range search. Every kind of traversal.

    and how do you come up with good algorithms?

    I'd like to take issue with the term "good" here. "Good" doesn't have any place in a technical discussion. MJD has a wonderful monologue on how emotional terms like "intuitive" and "counterintuitive" (and, IMO, "good") are poor choices in a discussion of the merits of a technical idea. Basically, what is intuitive or good to one person/circumstance is not good in another sense.

    This point leads me to the answer (for me) to your question: by understanding the problem so well that it is possible to select an undeniably superior approach given the constraints and objectives at hand. For instance, in many respects program design these days comes down to "get input data, stick data into a DB, query the DB in various ways, present results from DB queries in a useful fashion". It's almost reflex to just stuff data into a DB and then expect the DB author to have optimized things so that the complexity involved in whatever we wish to do is handled by the DB. And as a general-purpose strategy it isn't that bad. You turn over optimisation and advanced programming issues to the experts (the DB gurus) and in effect become the assembler of parts.

    The problem is that the solution probably won't be optimal for your target domain (unless it is inherently relational-DB related). The DB designers need to design solutions that are generic. For instance, the indexing used by a DB probably wouldn't be as optimal (for the particular definition of optimal that suits your circumstance) as a custom solution. Of course it will take a lot more resources (time, money, etc.) to get the latter right, whereas the former leaves the tricky stuff to other people.

    A few examples of this might be in order. Consider that routines like mergesort or quicksort are the usual high-speed generic sorting algorithms. The trouble is that if you are sorting fixed-length strings (especially with a limited alphabet, like telephone numbers or postal codes), these algorithms are not optimal. A properly written radix sort should outperform them in these constrained cases.
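
    A sketch of what that might look like for fixed-length strings (my illustration, not code from the book):

        # LSD radix sort: bucket on each character position from last to
        # first; the stable bucketing preserves the order established by
        # earlier passes.
        sub radix_sort {
            my @strings = @_;
            my $len = length $strings[0];
            for my $pos ( reverse 0 .. $len - 1 ) {
                my %bucket;
                push @{ $bucket{ substr $_, $pos, 1 } }, $_ for @strings;
                @strings = map { @{ $bucket{$_} } } sort keys %bucket;
            }
            return @strings;
        }

        # e.g. radix_sort(qw(555-1234 555-0199 555-1200));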

    Another example is looking data up in a table. We might use any number of indexing strategies, like B+Tree's or binary trees, or even binary searching a sorted list. OTOH, if we know a lot about our usage profile we may know that most times we do a look up we end up retrieving the same value as last time. In this case an algorithm that involves caching the last looked up value will outperform one that doesnt.
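
    A tiny sketch of that caching idea, with a hypothetical lookup() standing in for the expensive search:

        # Remember the last key looked up; a repeated lookup of the
        # same key skips the expensive search entirely.
        my ($last_key, $last_value);
        sub cached_lookup {
            my ($key) = @_;
            return $last_value if defined $last_key && $key eq $last_key;
            ($last_key, $last_value) = ($key, lookup($key));
            return $last_value;
        }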

    Or we might take your example of sorting. Perl's sort is generic; it's intended to be fast for the "average" data sets and cases. It turns out that the version used in some versions of Perl has some _really_ bad cases however, and so it has been changed (it isn't even quicksort anymore, IIRC). Anyway, the point is that in Perl it's easy to just use sort to sort things. But in some situations it may not be advisable to do so. For instance, it wouldn't make sense, IMO, to use sort if you were only adding one value and putting it in order. I might use binary search with splice, or even a single pass of a bubble sort if the list was small.
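
    For numeric data, the binary search plus splice might look like this (again just a sketch):

        # Keep @$list numerically sorted: binary-search the insertion
        # point for $new, then splice it in with one array operation.
        sub insert_sorted {
            my ($list, $new) = @_;
            my ($lo, $hi) = (0, scalar @$list);
            while ($lo < $hi) {
                my $mid = int( ($lo + $hi) / 2 );
                if ($list->[$mid] < $new) { $lo = $mid + 1 }
                else                      { $hi = $mid }
            }
            splice @$list, $lo, 0, $new;
        }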

    I think the thing with algorithms is that it's not just knowing how given algorithms work, but rather the various trade-offs presented by one or the other. The art of programming is selecting the right trade-offs and the algorithms that follow from them. Hashes are a good example. The desire was to have associative arrays. The overriding concern was to make them fast for store and fetch (and create?). Hence the use of hashing, and the name. VB, for reasons that aren't clear to me, chose some other representation (a doubly linked list?); suffice it to say that lookup speed wasn't a priority. (On reflection, maybe VB's collections are faster to iterate over. If so, that would suggest linked lists. Hashes don't seem like they would be the fastest way to iterate, IMO.)


    ---
    demerphq

    <Elian> And I do take a kind of perverse pleasure in having an OO assembly language...
      Or we might take your example of sorting. Perl's sort is generic; it's intended to be fast for the "average" data sets and cases. It turns out that the version used in some versions of Perl has some _really_ bad cases however, and so it has been changed (it isn't even quicksort anymore, IIRC).

      Well, in 5.8.0, the default is mergesort; however, quicksort is still available on demand. But if you select quicksort, a trick is performed to make it unlikely to hit one of the cases where quicksort behaves quadratically: large arrays are first shuffled.
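
      Selecting the algorithm explicitly goes through the sort pragma documented with 5.8:

          use sort '_mergesort';    # the 5.8.0 default; stable
          # use sort '_quicksort';  # quicksort on request (large arrays get shuffled first)
          # use sort 'stable';      # only demand stability, whatever the algorithm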

      Anyway, the point is that in Perl it's easy to just use sort to sort things. But in some situations it may not be advisable to do so. For instance, it wouldn't make sense, IMO, to use sort if you were only adding one value and putting it in order. I might use binary search with splice, or even a single pass of a bubble sort if the list was small.

      Actually one of the decisions for using the current implementation of mergesort was to be able to deal well with nearly sorted lists. The mergesort in Perl is faster if there are longer runs of sorted, or reversely sorted, elements. So, in 5.8.0, it does make sense to push an element on a sorted array, and then sort that list, instead of using a binary search and splice. Both operations will be linear (!), but the former will do almost everything in C.
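
      So under 5.8 the seemingly naive approach is reasonable for a nearly sorted array (a quick sketch):

          # The long pre-sorted run lets 5.8's mergesort finish in
          # roughly linear time, almost entirely in C.
          push @sorted, $new;
          @sorted = sort { $a <=> $b } @sorted;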

      Abigail

        Why doesn't perl use heapsort?
Re: Algorithms
by chunlou (Curate) on Jun 24, 2003 at 21:02 UTC
    Since Quicksort (sort of a most underpraised, overworked stepchild) appears in the "Top 10 Algorithms" mentioned in the post, there's some cute animation illustrating the algorithm at this site. Very instructive.
Re: Algorithms
by umasuresh (Hermit) on Feb 19, 2011 at 15:56 UTC
    Although I am replying to this post eons later, I would like to share my experience with using an algorithm to speed up a relatively straightforward problem. Often bioinformaticians face the task of checking whether a given position lies within a target region.
    For e.g.
    position  target_region  is_member
    100       95_105         yes
    110       120_130        no
    and so on...
    The query file contains unique records and the target file is sorted numerically.
    This seemingly simple task can be a big drag when the query file is large (1 million records) against a target file of, say, 200K records. I sought tips from Mastering Algorithms with Perl.
    I wanted to try the quicksort algorithm explained in this book.
    First I divided the target region into two regions and searched whether the query position fell in either half, including the pivot region. Once I found the half in which the query lies, I passed it to a recursive function, and each time the target region is split into two halves and searched. This works and I get the expected output. But the recursion was a big drag.
    I then subdivided the target region into 16 chunks and checked which of these smaller chunks the query region lies in. Then I passed that chunk into the recursive function, and this significantly increased the speed. I still wanted to tweak the code further. Originally I had stored the query file in a hash. I changed this to an array and gained more speed. I am still refining the code and hope to tweak it further.
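
    The underlying idea can also be expressed as a single binary search over sorted, non-overlapping ranges -- a sketch of mine, not the code from the linked thread:

        # @$targets holds sorted, non-overlapping [start, end] pairs.
        # Returns 1 if $pos falls inside any target region, else 0.
        sub in_target {
            my ($pos, $targets) = @_;
            my ($lo, $hi) = (0, $#$targets);
            while ($lo <= $hi) {
                my $mid = int( ($lo + $hi) / 2 );
                my ($start, $end) = @{ $targets->[$mid] };
                return 1 if $pos >= $start && $pos <= $end;
                $pos < $start ? ($hi = $mid - 1) : ($lo = $mid + 1);
            }
            return 0;
        }

        # e.g. in_target(100, [ [95, 105], [120, 130] ]) returns 1
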
    So, here is my experience with Algorithms and speed. The problem may be trivial. But, I had to process close to 1000 such files and this approach helped me a lot! You can follow the evolution of my code at What is the best approach to check if a position falls within a target range?.