Beefy Boxes and Bandwidth Generously Provided by pair Networks
The stupid question is the question not asked
 
PerlMonks  

Re^2: Data structures in Perl. A C programmer's perspective.

by code-ninja (Scribe)
on Sep 07, 2013 at 03:47 UTC ( [id://1052795]=note: print w/replies, xml ) Need Help??


in reply to Re: Data structures in Perl. A C programmer's perspective. (vector vs linked list performance)
in thread Data structures in Perl. A C programmer's perspective.

I've always been told that linked lists are faster (in algorithms course in college). I never really understood why people say that when arrays give indexed access to its elements.I mean its cool that linked lists gives dynamic memory allocation and all but the trade-off is too high. As davido points out, everything will require O(n) time but arrays can do some stuff in O(1) time (fetch an element at an index). Trees and other DSs maybe useful (Perl provides so many already) but linked lists are just a learning exercise for a coding craved mind.

PS: RAM is not a random-access memory device! I'll need something in written to digest that! :P

Replies are listed 'Best First'.
Re^3: Data structures in Perl. A C programmer's perspective.
by davido (Cardinal) on Sep 07, 2013 at 04:56 UTC

    My post was really about order of growth for various tasks given two different data structures.

    An array stores its data linearly, in contiguous memory. If you want to find the "nth" element, you go straight to it (it's a simple mathematical computation, internally). But if you want to insert between "n" and "n+1" you must shift everything from n+1 through the end of the array out one spot, and hopefully not have to copy the entire array to a larger block of contiguous memory.

    A linked list stores its data all over the place. Each element just points to the location of the next one. Finding the 'nth' element means starting at the top and counting until you get there. But once you're there, inserting between n and n+1 is simply a matter of changing where a couple pointers point.

    So an array has O(1) lookups, and O(n) insertions and deletions. A linked list has O(n) lookups, and O(1) insertions/deletions. But big-O is concerned with orders of growth. There are many other factors that don't get considered when looking at orders of growth of computational complexity.

    But I also mentioned that Perl's arrays, being implemented at the "guts" level, are very fast. And most linked list implementations in Perl are implemented in Perl code, which is comparatively slow.

    Add to that the things that eyepopslikeamosquito mentioned, and it makes it very hard for linked lists to be the data structure of choice. ...even when Perl's arrays have so much behind the scenes overhead going on, they're still quite hard to beat.

    That doesn't mean that we have to throw out the window everything we learned in comp sci. It just means that technology has found ways to bend the rules-of-thumb. A binary tree is still an excellent data structure for certain uses. Linked lists can still make sense for some tasks. A trie is hard to beat for certain types of lookups. I guess what we have to keep in mind is that now more than ever one must fully understand the nature of his data, the usage patterns expected, and the tricks the hardware and compilers can play.

    I don't know too much about the guts-level implementation of Perl's hashes, but it's my understanding that the implementation hashes the key to derive a bucket, and that when there's a collision (more than one item in a bucket), the buckets' contents are stored in linked lists.


    Dave

Re^3: Data structures in Perl. A C programmer's perspective.
by eyepopslikeamosquito (Archbishop) on Sep 07, 2013 at 04:13 UTC

    RAM is not a random-access memory device! I'll need something in written to digest that!
    The (still relevant) and classic paper What Every Programmer Should Know About Memory (pdf) by Ulrich Drepper should help you digest how memory really works on modern hardware. Or maybe it will give you indigestion. :) The summary is that cache misses on modern CPUs are mind-bogglingly expensive. Data locality is crucial. Update: Prefetching can help - see, for example, the "Prefetching" section at The 10**21 Problem (Part 3).

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://1052795]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others avoiding work at the Monastery: (6)
As of 2024-03-28 12:14 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found