in reply to RE: What's Wrong With Perl?
in thread What's Wrong With Perl?

I sympathize (really!!) with your skepticism about mark-and-sweep GC for Perl.

However, as an internals hacker, I have to say that the final, absolute end of memory leaks is worth a high price. Just think of all the programmer time that can be spent on more useful pursuits than memory bookkeeping!

Besides, with the current RC implementation, you just aren't noticing all the CPU time drained away by all those inc's and dec's and jz's required to keep ref counts more or less in line with reality.
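
If you want to actually watch that bookkeeping happen, here's a quick sketch using the core B module (nothing Boehm- or internals-specific) that reports a scalar's reference count as references are taken and dropped. Every one of those changes is an increment or decrement the interpreter performs at run time.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use B ();   # core module; gives a read-only view of the interpreter's SVs

    my $x  = "hello";
    my $r1 = \$x;

    # B::svref_2object() takes a reference and returns a B:: object for the
    # referent; its REFCNT method reports the SV's current reference count.
    printf "one extra ref:   REFCNT = %d\n", B::svref_2object($r1)->REFCNT;

    my $r2 = \$x;    # taking another reference increments the count
    printf "two extra refs:  REFCNT = %d\n", B::svref_2object($r1)->REFCNT;

    undef $r2;       # dropping a reference decrements it again
    printf "back to one ref: REFCNT = %d\n", B::svref_2object($r1)->REFCNT;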

    -- Chip Salzenberg, Free-Floating Agent of Chaos

Replies are listed 'Best First'.
RE: What's Wrong With Reference Counting?
by tye (Sage) on Aug 01, 2000 at 01:30 UTC

    And I like not noticing. Even if my code ran a little faster overall with mark-and-sweep GC, I'd rather have the consistency of performance provided by RC. Now, if my code ran 10 times faster (or even 3 times faster) with GC, I'd put up with moderate burstiness. But Lisp programs spend 25% to 40% of their time doing GC. Are you really claiming that RC overhead is anywhere near that high?? Or is the nature of Perl such that GC in it will be tons more efficient than it is in Lisp?

    Are you implying that it will be easier to implement a correct version of GC than of RC? The research says that RC is easier to implement. Sure, we run into bugs in Perl's RC, especially when we extend a large, old code base in ways not originally envisioned. You propose that we won't run into any bugs in GC??

    A question: Does mark-and-sweep even preserve destruction order? It doesn't sound like it, but I've only read high-level descriptions.

      I'm only speculating here, but: Yes, I believe that Perl can make more efficient use of GC than Lisp does. You can't swing a dead cat in a Lisp program without allocating and freeing scads of conses. Perl, being written in C or something equally capable of low-level manipulation, can avoid hot spots in GC as they are measured.

      Based on lurking around the gcc mailing lists, I think the reason the conservative GC can *be* conservative is that it's given a relatively small number of "root pointers" that are the only valid sources for reference chains of GC'able objects. If you miss a root pointer, you get memory corruption. But I'd rather do that than count references.

      But as for the GC library itself: If we use the Boehm GC library, which is somewhere between version 5 and version 6... NO, I don't expect that the GC mechanism will have any bugs, at least none worth speaking of. It's been put through the wringer too many times.

      No, destruction order is not maintained. But we've already figured out that we want to separate end-of-scope cleanup per se from object destruction. I wouldn't be surprised to see Perl go the way of Java and not even have any actual destructors. (I hope I'm not misstating how Java works....)

      Actually, the gcc example is a good one. They've converted an existing program from non-GC to GC and they're happy with the results.... How much better off will we be if we design Perl 6 to use GC from day one?

          -- Chip Salzenberg, Free-Floating Agent of Chaos

      As for consistency of runtime ... you've already lost it.

      I use a time-slicing system for programming (Linux). Most programmers do, I suspect, now that Windows and its cousins are also MT (if badly). And then there's virtual memory....

      You can have MT and VM, or consistency. GC doesn't even come into it most of the time.

          -- Chip Salzenberg, Free-Floating Agent of Chaos

      Your idea of mark-and-sweep garbage collection is about twenty years out of date. Modern garbage collectors are not bursty. They run incrementally and the time taken by each pass can be tuned.

        Taking a trip into the memory basement, I came up with the Dr. Dobb's Journal for April 2000, where Joshua W. Burton discusses "various" GC algorithms and implementations. That article also references a DDJ Dec. 1997 article by Mike Spertus. Joshua is/was a software engineer at Geodesic Systems (the makers of Great Circle AFAIK).

        The focus of the article is on incremental garbage collectors with low latency, since latency is the biggest problem faced by GC nowadays. In terms of raw speed, an atomic collector (like "mark and sweep", where one pass marks all the memory that is still reachable, and a second pass releases everything left unmarked for recycling) will always be the fastest, but the trade-off is latency: an atomic "mark and sweep" pass always takes a certain amount of time to run from start to finish.
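
        To make the two passes concrete, here is a toy sketch (my own illustration, not from the article, and nothing like a real collector's data structures): objects are entries in a hash with lists of outgoing pointers, the mark pass walks everything reachable from the roots, and the sweep pass releases whatever was never marked.

            #!/usr/bin/perl
            use strict;
            use warnings;

            # Toy heap: each "object" is a name mapping to the names it points to.
            my %heap = (
                root  => [ 'a', 'b' ],
                a     => [ 'c' ],
                b     => [],
                c     => [ 'a' ],     # cycle a <-> c, but still reachable via root
                junk1 => [ 'junk2' ],
                junk2 => [ 'junk1' ], # unreachable cycle; RC could never free this
            );

            # Pass 1 (mark): walk everything reachable from the root set.
            my %marked;
            my @work = ('root');
            while ( my $obj = pop @work ) {
                next if $marked{$obj}++;
                push @work, @{ $heap{$obj} };
            }

            # Pass 2 (sweep): anything left unmarked is garbage and gets released.
            for my $obj ( keys %heap ) {
                delete $heap{$obj} unless $marked{$obj};
            }

            print "surviving objects: ", join( ', ', sort keys %heap ), "\n";
            # prints: surviving objects: a, b, c, root

        Nothing can be released until both passes have run over the whole heap, which is where the latency comes from; on the other hand, the unreachable cycle does get collected, which reference counting alone can never do.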

        Reference counting looks like a low-latency approach, but even there you can get cascades of memory releases that push the latency up.
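
        You can see such a cascade in today's Perl with nothing but reference counting (my own example): build a deep chain of objects, hold only the head, and drop it. Every DESTROY in the chain fires at that single point, because freeing each node drops the count on the next one.

            #!/usr/bin/perl
            use strict;
            use warnings;

            package Node;
            my $destroyed = 0;
            sub new     { my ( $class, $next ) = @_; bless { next => $next }, $class }
            sub DESTROY { $destroyed++ }

            package main;

            # Build a 1_000-node chain; only the head is held in a variable.
            my $head;
            $head = Node->new($head) for 1 .. 1_000;

            print "destroyed before undef: $destroyed\n";   # 0
            undef $head;   # dropping one reference cascades down the whole chain
            print "destroyed after undef:  $destroyed\n";   # 1000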

        The article then moves on to describe a coloring collector, which at any given moment sorts memory into three colors:

        • black - live and fully scanned
        • grey - known live but not yet fully scanned, may contain more pointers
        • white - the objects whose liveness is in doubt
        A collection begins with the root set grey and everything else white, and it is finished when no grey objects remain. Everything that is still white at the end of the collection is garbage and can be released whenever the collector wants. The collector maintains a list of scanned pages and tracks all writes to them (a write to an already-scanned grey or black page makes rescanning that page necessary). Because of this, the algorithm can be interrupted and resumed without having to start from the beginning, provided you have a mechanism (like CPU-level page protection) to be notified when a write to a scanned page occurs.
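
        To illustrate why that structure lets the collector stop and resume, here is a toy Perl sketch of the coloring (mine, not from the article): the grey set is just a worklist, the collector processes a bounded number of objects at a time, and a write barrier re-greys any already-scanned object that the running program stores a new pointer into.

            #!/usr/bin/perl
            use strict;
            use warnings;

            # Toy heap: object name => list of names it points to.
            my %heap = (
                root => ['a'],
                a    => ['b'],
                b    => [],
                dead => ['a'],    # not reachable from the root set (yet)
            );

            # white = liveness in doubt, grey = live but not yet scanned,
            # black = live and fully scanned.
            my %color = map { $_ => 'white' } keys %heap;
            my @grey  = ('root');
            $color{root} = 'grey';

            # One increment of collection: scan at most $budget grey objects.
            sub collect_step {
                my ($budget) = @_;
                while ( $budget-- > 0 && @grey ) {
                    my $obj = shift @grey;
                    for my $child ( @{ $heap{$obj} } ) {
                        next unless $color{$child} eq 'white';
                        $color{$child} = 'grey';
                        push @grey, $child;
                    }
                    $color{$obj} = 'black';
                }
                return scalar @grey;    # 0 means the marking phase is finished
            }

            # Write barrier: a store into an already-scanned (black) object
            # re-greys it so the new pointer cannot be missed.
            sub write_field {
                my ( $obj, $child ) = @_;
                push @{ $heap{$obj} }, $child;
                if ( $color{$obj} eq 'black' ) {
                    $color{$obj} = 'grey';
                    push @grey, $obj;
                }
            }

            collect_step(2);              # a partial pass...
            write_field( 'a', 'dead' );   # ...the program keeps running in between
            1 while collect_step(2);      # finish marking in small increments

            # Sweep: everything still white is garbage.
            print "garbage: ", join( ', ', grep { $color{$_} eq 'white' } keys %heap ), "\n";
            # prints "garbage: " with nothing after it, because 'dead' became reachable

        The budget is the tuning knob mentioned above: the collector does a bounded amount of marking, yields to the program, and later resumes where it left off instead of starting over.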

        I'll admit that I have no real experience with garbage collection (outside Perl) and little technical knowledge of it. I'm just going by what was referenced in the request itself. One of those references was a 1998 survey that, under "Mark and Sweep", says:

        "As with all tracing schemes, once GC is started, it has to go through all the live objects non-stop in order to complete the marking phase. This could potentially slow down other processes significantly, not to mention having to stop the requestor process completely."

        You must be talking about something other than "mark and sweep" (which is what the suggestion proposed), or your definition of mark and sweep doesn't match that of the survey.

        If this is using concepts that are 20 years out of date, then a link to more appropriate reference material would be appreciated.

      And it's ridiculous to talk about what "lisp programs" do, since there have been hundreds of implementations of Lisp over the last forty-five years, and they all have different garbage collectors. Saying that "lisp programs do this" or "lisp programs do that" is almost as meaningless as saying "computers do this" or "computers do that". You might want to have a look at ftp://ftp.cs.utexas.edu/pub/garbage/bigsurv.ps, which is a fairly recent survey of garbage collection techniques, or at some of the other resources at http://www.cs.ukc.ac.uk/people/staff/rej/gc.html

      Assume we are using the Boehm GC library in some version of Perl and we have an object, $parent, that contains a reference to another object $child. For simplicity, assume that eventually, all other references to both objects are removed. When garbage collection hits, is $parent guaranteed to be destroyed before $child?
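
      For reference, here is what Perl 5's current reference counting does with exactly that scenario; it's a plain demonstration you can run today, not anything Boehm-specific. When the last reference to $parent goes away, $parent's DESTROY fires, and only then does releasing its fields drop $child's count and fire $child's DESTROY.

          #!/usr/bin/perl
          use strict;
          use warnings;

          package Tracked;
          sub new     { my ( $class, %f ) = @_; bless {%f}, $class }
          sub DESTROY { print "DESTROY $_[0]{name}\n" }

          package main;

          my $child  = Tracked->new( name => 'child' );
          my $parent = Tracked->new( name => 'parent', child => $child );

          undef $child;    # $parent now holds the only reference to the child
          undef $parent;   # prints "DESTROY parent" then "DESTROY child"

      That parent-before-child ordering is the guarantee in question.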

      Based on the suggestion and the web page that it references, it sounded like this guarantee would not be retained. I hope I've jumped to the wrong conclusion here.

      This, along with the potentially large delay before destructors fire, would make destructors nearly useless, which takes away one of the biggest advantages of OO, IMHO.