in reply to Re^2: Completeness of Devel::Size answers
in thread Completeness of Devel::Size answers

You can detect circles without all the bookkeeping overhead,

Interesting. How?


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."
  • Comment on Re^3: Completeness of Devel::Size answers

Replies are listed 'Best First'.
Re^4: Completeness of Devel::Size answers (DAG)
by tye (Sage) on Sep 23, 2008 at 05:49 UTC

    To detect cycles, you need only keep a hash as large as your search is deep. That even allows you to detect cycles "immediately", but it doesn't tell you when you have the same structure referenced from two different places in your directed, acyclic graph: my @a= ( \%huge, \%huge );

    - tye        

Re^4: Completeness of Devel::Size answers
by gone2015 (Deacon) on Sep 23, 2008 at 10:35 UTC

    FWIW, I was stunned when I came across the following:

    To detect cycles you can use two pointers -- say, p and q. You step p through the data in the usual way. You step q through the data twice for each step of p, until the end is met. If p equals q before the end, you have a loop.

    The tricky bit in a recursive descent tree walk is to walk two pointers at once ! You need to build an iterator for q which contains at least a stack for recursion, and, for Perl, some way of iterating through a hash which isn't going to trip up p. (Of course, you have to guarantee that p and q will visit the nodes in the same order, especially where there is a loop !)

    Apart from that minor difficulty, this trick only tells you there is a cycle. It doesn't tell you where it is.

    "It's a good sort of brake. But it hasn't worked yet."

      That is interesting. Horribly inefficient, but interesting :)

      Also, as noted, the tracking hash in Devel::Size doesn't only detect cycles, but also prevents data structures and subtrees that are referenced from 2 (or more) places in the overall structure from being counted multiple times. This isn't just required to detect repetition in the structure at the user level, but also in the internal structures. For example, internally, some hashes can share keys, and the same hash mechanism is used to prevent the memory for shared keys being counted over.


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
      This works, but it's not very efficient in terms of CPU.

      This technique is used in software verification tools that do depth-first search on transition graphs. Often these graphs are much too large to fit into memory, so they are lazily generated and discarded of, so schemes like this are the only rescue.

      Indeed. Most probably OT, but this reminded me of the appendix Secrets of Programmer Jobs Interviews from a book I enjoyed reading a couple of years ago: Expert C Programming - Deep C Secrets (with a fish on the cover ;-) by Peter van der Linden (1994) (Update: search http://books.google.com/).

      The task is to detect a loop in a linear list. First you are allowed to sketch any algorithm you like. Then more and more restrictions are imposed like the list is read-only, RAM is too limited, must use CPU registers only, etc.

Re^4: Completeness of Devel::Size answers
by tilly (Archbishop) on Sep 23, 2008 at 16:00 UTC
    If you have a circle, then any value in that circle will repeat infinitely often. So at depths 2, 4, 8, and so on you switch to looking for the current element at deeper depths. When you back up, you stop looking for a match. At any point in time you only need to remember your current depth and possibly the one element that you're looking for duplicates of.

    This is guaranteed to find any circular references. It just won't always find them promptly, and so in a total size count you might count elements more than once.