Learning Fundamentals: Data Structures and Algorithm Analysis in Perl

Updated: Correcting my vernacular misuse of terms, incorporated useful links from moritz, metaperl, in replies.

I've seen a few questions recently on SOPW that mention algorithmic complexity (as Big O(*) notation) (see Wikipedia Big O Notation, Big-O Notation - What is it good for? and An informal introduction to O(N) notation.

While these original question and replies seem to have some basic knowledge, I personally find wisdom somewhat missing or am merely confused by the O(*) notation and analysis of cases.

Sources of Wisdom

Mark Allen Weiss (http://users.cs.fiu.edu/~weiss/) has authored a series of books entitled Data Structures and Algorithm Analysis. There are versions for Ada, Pascal, Java and C++. Many universities use these books as college textbooks, and I myself have saved Data Structures and Algorithm Analysis in PASCAL and use it to this day even though I only code in Perl.

As one would expect, Wikipedia has an article entitled Computational Complexity Theory that is a more general discussion of this subject than the specific case of O(*)

Basics of Expressing Complexity Measurements

Complexity cannot be expressed as merely an 'algorithms' complexity. One must be clear if one is talking about:

an average complexity based on mathematical proof
an average complexity based on measurements, whether exhaustive (all cases) or a sample of cases
a best-case complexity
a worse-case complexity
etc...

Therefore, O(*) being tossed out as a bare concept is vaneer glued particle board... it looks beautiful, but is unsuitable for a sturdy boat or a roof. O(*) must not be a veneer of "complexity" tossed over dull and simple questions. It is a formalization of specific 'worst-case' conditions of growth.

Initial conditions and bounds are quite important when discussing complexity. Some algorithms behave differently when tuned differently, or when conditions change. For some discussion (also mentioning Weiss's work) you may refer to Wikipedia's article on Shell Sort which also includes an example of Wikipedia's Algorithm Infobox which could provide a model for including 'facts' when discussing O(*) issues here on PM.

Criticism of "Wise Sounding" RTFM questions

While in no way elitist toward fellow Monks, I personally wish to see the Quality/Node ratio here on PM continue to rise. Therefore, I have to rap a few shoulders with a cane to wake from slumber real wisdom. We must always remain vigilent lest "number of writeups" become the Kismet of XP, rather than XP reflecting the wise output of Monks who employ Zazen (meditation).

Questions to Ponder

Therefore, I put these questions to my Fellow Monks.

What sources of Perl documentation are the best references for Algorithmic Complexity questions?
Are there any good PM nodes discussing the fundamentals of algorithmic analysis? Two good tutorials
Might we need a good Tutorial or Quest written entitled "Data Structures and Algorithm Analysis in Perl" to point drifting monks toward?

Call to Action

Perhaps we need a more coherent effort to address Complexity Analysis to coagulate our vast knowledge toward a more elightening end.

Offer your Answers, fellow monks. However, should no node be found that can bring Satori, what monks would be willing to join me in Sesshin on this subject?

-- Patrick

"Upgrade your gray matter, cause one day it may matter" -- DELTRON 3030

Comment on Learning Fundamentals: Data Structures and Algorithm Analysis in Perl

Replies are listed 'Best First'.
Re: Learning Fundamentals: Data Structures and Algorithm Analysis in Perl by moritz (Cardinal) on Apr 28, 2009 at 15:27 UTC
See the section Other Topics in Tutorials, it has two Tutorials on asymptotic complexity. There are also various nodes that discuss the complexity of specific problems or built-in functions, like today's What is the most efficient way to see if a hash is empty?.	[reply]
A reply falls below the community's threshold of quality. You may see it by logging in.
Re: Learning Fundamentals: Data Structures and Algorithm Analysis in Perl by ELISHEVA (Prior) on Apr 29, 2009 at 08:47 UTC
In your list of uses of big O I think you left out one of the most common uses: a thumb-in-the-air order of magnitude. The idea is to put problems in the right ballpark, not to "prove" a particular level of complexity. In practical coding situations, sometimes all we need is a ball park. In fact, precision can sometimes even confuse the issue. Take, for example, the post you dislike so much. I don't know that it really matters to that discussion whether something is O(log N), O(N) or even O(N!). What does matter is that a preferred test for emptiness happen in constant time rather than scale with the size of the data structure. A brief statement like "O(1) vs. O(N)" gets that point across much more clearly than a precise statement loaded with if, ands, and buts. For the most part, the monks I see using big O do in fact know what it means and are aware of all or most of the issues involved in it. However, figuring out how to apply the theory to a particular problem is the challenge. For example, constant factors can make the f(N) component of an algorithm essentially irrelevant. Variance and competition over system resources can make singleton runs of benchmarks very misleading. Both of those turned out to be issues in the thread you did not want to mention, and I think we all learned something from that concrete example. In fact, I think we learned more that way than we would have from a theoretical discussion. Often times, when faced with a complex problem, posters will immediately leap to the conclusion that the problem is NP complete. Often this has to do with the failure to understand the mathematical logic of the problem. A notable example is magic squares where many posters tried a brute force solution over all possible 3x3 squares. Using simple algebra developed during the course of the thread it soon became clear that the problem actually required examination of a much smaller problem space. The mathematics needed to come up with an efficient algorithm for that problem was particular to the problem. Those are but two examples, but I could point to many more. In general, I think it is more constructive to discuss matters like complexity in the context of a specific threads and specific problems. As for being afraid that people would take your criticisms personally? I think that is only going to happen if you jump to conclusions about their general level of knowledge based on the way they handle a specific problem. No one likes to have generalizations made from specific mistakes. It isn't fair. People make mistakes. They overlook things. They speak in generalizations (or too much detail) when they shouldn't. On the other hand, Monks who get insulted by challenges to their problem solving approach usually get dinged - they either grow up or go away. The regulars have been challenged more times than they can count - they aren't going to go up in flames because of yet another one. The monkly way is to criticize a particular approach to a problem, rather than the person or their general knowledge. If you think there are theoretical concepts that may be misapplied or misunderstood, then feel free to discuss the concepts as they apply to the problem or provide links to general purpose treatments. If you do think there is an overall weakness, a mediation might be a good choice, but I think it is probably better to do your homework first rather than just point out a problem and say "let's discuss". In my company whenever someone does that we say: mind your QRS - Query, Research, Suggest. Query means that most problems have a context and a background - any solution one proposes will go over much better if it is stated in a way that takes that into account. Or if that is not possible says, "maybe I'm missing something as a new/infrequent observer. I wonder why X seems so." Research means don't ask others to do one's homework. Find out what has been done before (e.g. other tutorials and meditations). Find out what resources are out there to address the problem. Suggest means come up with a list of recommendations along with their pro's and con's. In the PM context that means a thoughtful, well researched meditation that adds new knowledge or synthesizes existing resources in a creative way. Best, beth	[reply]
Re^2: Learning Fundamentals: Data Structures and Algorithm Analysis in Perl by tilly (Archbishop) on May 01, 2009 at 18:54 UTC
Very mild disagreement here with For the most part, the monks I see using big O do in fact know what it means and are aware of all or most of the issues involved in it. I would say that most of the monks I see using it are aware of how it is used informally by hackers, but a very large fraction are unaware of the formal definition, and/or are unaware of several subtleties that come up fairly often. (I rather strongly suspect that most who are aware that the formal definition probably couldn't come up with a reasonable version of it on demand...) For instance if someone says, "That's O(1), not O(n)" they are really thinking of big-Omega, not big-O. Because by the formal definition, anything that is O(1) is also O(n). However that's not how hackers informally use the term. And lots of them don't know that their informal usage is technically incorrect. Furthermore hackers usually have an implicit "amortized average time, if things work like they are supposed to" thrown in that they are not even conscious of. Worse yet, if they were asked to really explain what they meant, most probably couldn't. For instance most of us accept that, "hash lookup is O(1)." Well actually that's the average case if your hashing algorithm is working properly. The worst case with most hash implementations is O(n). If your hashing algorithm isn't a good fit, you'll hit that worst case fairly often. And what about that "amortized" bit? Well if you accept that pushing an element is O(1), you've implicitly accepted the amortized bit. If you're doing Perl programming that's OK. If you're doing real-time programming however... Then of course there is the glaring issue that scalability is often not what matters. For instance which is faster? Scanning a sorted list for an element, or doing a binary search? Despite O(n) vs O(log(n)), in a low-level language on modern CPUs, scanning is probably faster. Very few of us have a good sense for this kind of detail. As you see there are a lot of issues and complications that come up in a more formal discussion. My suspicion is that a large fraction (and possibility the majority) of Perl hackers who throw around the term aren't really aware of those. They could learn it pretty easily, but don't know it off of the top of their heads. However I'd absolutely agree that their understanding is more than sufficient for a practical analysis of the kinds of problems that we programmers are usually likely to encounter.	[reply]
Re^3: Learning Fundamentals: Data Structures and Algorithm Analysis in Perl by BrowserUk (Patriarch) on May 01, 2009 at 21:33 UTC
but a very large fraction are unaware of the formal definition I think far too much is made about "formal definitions". Back when I did my formal CS education, I chose a multi-part long questionon big-O as my optional question. I passed the exam with distinction. Of course they don't tell you anything more than the overall scores, but I think I would probably have had to have scored perfectly on the four compulsuries to achieve that if I'd totally flunked the optional. Back then I could have, and probably did, give a formal definition of big-O and big Omega. But over a 25 year career, I never once needed to repeat it. And nor has it ever been useful to me. But the education means that I do have a way of informally discussing algortihmic complexity in terms that many other programmers recognise and understand. So in that sense the knowledge of it is useful. But in the real-world, when, as you point out, an O(n^2) algorithm in one language can out perform an O(n log n) in another; when memory allocation/reallocation costs can render a formally faster algorithm slower than a formally slower algorithm, because the latter is more cache friendly (eg. Mergesort versus QuickSort); and given multi-tasking OSs and mutli-core hardware renders static formal analysis iffy at best and an outright lie at worst; the usefulness of big-O is pretty much limited to discussion purposes only. A benchmark run many times under real-world or realistic conditions renders it mostly redundant. A starting point, and a means of informal discussion, but little more. Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. "Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."	[reply]
Re^4: Learning Fundamentals: Data Structures and Algorithm Analysis in Perl by tilly (Archbishop) on May 01, 2009 at 21:54 UTC
Re^5: Learning Fundamentals: Data Structures and Algorithm Analysis in Perl by BrowserUk (Patriarch) on May 02, 2009 at 05:46 UTC
Re^2: Learning Fundamentals: Data Structures and Algorithm Analysis in Perl by spectre9 (Beadle) on Apr 29, 2009 at 22:45 UTC
I completely agree that O(1) vs. O(N) does get a point across about the Hash emptiness being finite time. Where I was led astray was why O(*) was even mentioned versus ending said post after the first sentence. Quite usefule reply... and therefore, but none of them, nor the original post methinks, could be considered 'worst case.' Cheers, Patrick "Strictly speaking, there are no enlightened people, there is only enlightened activity." -- Shunryu Suzuki	[reply]
Re: Learning Fundamentals: Data Structures and Algorithm Analysis in Perl by metaperl (Curate) on Apr 28, 2009 at 15:17 UTC
How about Mastering Algorithms with Perl	[reply]


Perl: the Markov chain saw
	PerlMonks