in reply to How do I get a unique Perl Interpreter ID?

As zwon pointed out that mod_perl may create interpreters that do not have a tid, I had another thought.

The address of any of perl's readonly built-in variables that are cloned for each interpreter should be unique to that interpreter.

Here I've used $$:

c:>perl -Mthreads -E"async{ say \$$; sleep 1e6 }->detach for 1 .. 10"
SCALAR(0x3c379b0)
SCALAR(0x3cbf8f0)
SCALAR(0x3d3fa60)
SCALAR(0x3dcd7d0)
SCALAR(0x6e5b240)
SCALAR(0x6f017e0)
SCALAR(0x6f68a00)
SCALAR(0x6ff5820)
SCALAR(0x707a4f0)

At least as long as the previous threads are still running. And if a previous thread terminates and perchance a new thread happens to reuse the exact same address for the same variable in a new interpreter, that probably doesn't matter right?
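
To use that address as an ID rather than as a printed string, it can be read as a plain integer. A minimal sketch, assuming the core module Scalar::Util and its refaddr function; the name interp_addr_id is made up for illustration, and the caveat about an address being reused after a thread exits still applies:

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# Hypothetical helper: the address of this interpreter's copy of $$,
# returned as an integer. Only meaningful while the interpreter is alive.
sub interp_addr_id {
    return refaddr( \$$ );
}

printf "interpreter id guess: 0x%x\n", interp_addr_id();
```

Called inside each thread, this should print the same hex values that the SCALAR(0x...) strings show.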


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

The start of some sanity?

Re^2: How do I get a unique Perl Interpreter ID?
by wrog (Friar) on Dec 02, 2011 at 18:31 UTC
    ooooo.

    I'm really liking ikegami's XS code, which is far more simple than I expected, but this looks like a potential winner w.r.t. the original question.

    Still, I'm wondering what it is we're actually getting when we do \$$: some experimenting shows that it is not at a fixed offset from PERL_GET_THX, which suggests to me that there's a separate allocation to create the reference and we're not actually getting the address of the variable itself. (At which point I'd worry about the reference being gc'ed and the address getting reused in another thread. Then again, I suppose one could just make a point of holding onto the reference... hmm... I wonder what pack "p"... will do with this).

    I also think one would want to pick something other than $$ which, being a thread-independent constant, has no reason not to be shared across threads even if the current implementation is not doing that for whatever reason. But it's not like there aren't a whole mess of other things to choose from.

    And if a previous thread terminates and perchance a new thread happens to reuse the exact same address for the same variable in a new interpreter, that probably doesn't matter right?
    This is the same problem as process ID getting reused. I think as long as I've got time() in there we're okay (.. and I believe it's bullet-proof if I put in a sleep(1) between initializing the counter and creating the sub ...hmmm... and now we may have an argument for using microtime...)
      ...hmmm... and now we may have an argument for using microtime...

      If you've moved away from staying pure Perl, for a source of an unguessable counter, I'd use the Time Stamp Counter.

      Given that this changes by anything ranging from 1/2 a million to 10s of millions between successive calls in a tight loop, the odds of collisions even if you have 16 concurrent cores are negligible:

      #! perl -slw
      use strict;
      use Inline C => Config => BUILD_NOISY => 1;
      use Inline C => <<'END_C', NAME => 'rdtsc', CLEAN_AFTER_BUILD => 0;

      SV *rdtsc() {
          return newSVuv( (UV)__rdtsc() );
      }

      END_C

      my( $last, $this ) = ( 0, 0 );
      print( $this = rdtsc(), ' ', $this - $last ), $last = $this for 1 .. 20;

      __END__
      C:\test>rdtsc
      95054001389914 95054001389914
      95054002276396 886482
      95054003052862 776466
      95054004698944 1646082
      95054006658865 1959921
      95054008588537 1929672
      95054010420586 1832049
      95054012410180 1989594
      95054014268572 1858392
      95054016253981 1985409
      95054018070946 1816965
      95054050803874 32732928
      95054051382070 578196
      95054053061884 1679814
      95054054901610 1839726
      95054056870252 1968642
      95054058682825 1812573
      95054060659909 1977084
      95054062399501 1739592
      95054064304702 1905201

      Indeed, used alone with some suitable modulus operations, it would form the basis of a pretty damn good cryptographic rand() all by itself.

      Even if the bad guys had an identical system -- hardware and software -- it would be impossible to predict the next number coming from it. It is affected by every single thing that happens on the system -- interrupts from your nic; mouse movements; thermal loading; every piece of software running in the systems.

      Even if you put two totally identical systems side by side and synchronised them, I bet they would not stay in step for more than a few milliseconds.



        If you've moved away from staying pure Perl, for a source of an unguessable counter, I'd use the Time Stamp Counter.
        Various things:
        • distinctness is what matters for my application, not unguessability. If I had wanted unguessability I wouldn't be using a counter, period.
          (Admittedly, the discussion has gotten confused in some of these postings, because I said at one point that I also need a random number generator, but that's a separate issue --- except for the matter of how to seed the random number generator differently on each of the interpreters, which does go back to the same problem; ... and yes, a portion of the seed does need to be unpredictable, but we already have /dev/urandom for that, so...).
           
        • The reason I like ikegami's XS code is because the only thing it's doing is taking the address of something where we don't care what the address actually is. This is about as machine-independent as it gets. Never mind it being insanely fast because,... well,... two (2) instructions.

          (... meanwhile, TSC, which I hadn't heard of before (thanks), unfortunately appears to be x86-specific, and perhaps even Intel-x86-specific, if the Wikipedia article is to be trusted...hahahaha).
           
        • and the point of this
          I think as long as I've got time() in there we're okay (.. and I believe it's bullet-proof if I put in a sleep(1) between initializing the counter and creating the sub ...hmmm... and now we may have an argument for using microtime...)
          is that I get what looks like a proof of the impossibility of two counter invocations producing the same result. I figure that beats 4 aces and any amount of futuristic hardware (... meaning we're set for another 10 years at least...).

          Fleshing this out a bit more just to make the idea clear:

          • If an interpreter instance does
            sleep(1.1); my $t=time(); sleep(1.1);   # sleep() from Time::HiRes; the built-in truncates 1.1 to 1
            my @value = (0,interpreter_id(),$$,$t);
            sub counter { ++$value[0]; return @value; }
            that first line blocks out a time interval 2.2 seconds long during which every other interpreter instance on the same host other than the one that's sleeping must have either a different interpreter_id or a different $$ (since for something to have the same interpreter_id, which is really a memory address, either it's sharing the same PerlInterpreter structure and hence is same interpreter, or it must be in a different address space, which means it's in a different process, and therefore must have a different $$ because it's in existence concurrently with the process containing the sleeping interpreter).
             
          • Any interpreter that calls time() and obtains the same value $t, must have made the call sometime during that interval, therefore must have existed during that interval, and therefore must be either the same interpreter or have a different (interpreter_id,$$) combination.
             
          • Two distinct counter() calls from same interpreter instance — the only way to have (interpreter_id,$$,$t) be the same, as per the previous two bullets — must differ in the first elements of the lists returned (... assuming, say, that we make a point of killing the interpreter if $value[0] ever gets close to wrapping around...)
             
          Naturally, sleeping for 2.2 seconds is rather wasteful. The argument for using microtime is then the finer granularity of a microsecond clock (Time::HiRes::time() or gettimeofday(), say, rather than utime(), which in Perl sets file timestamps) and hence the need to sleep for only 2.2 microseconds to accomplish the same result, admittedly at the cost of having to keep around another 20 bits of time in the $t part of the list and having to take into account how much the clocks can be skewed across multiple cores. Still insanely cheap in the grand scheme of things.
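
A rough sketch of the microsecond variant, under stated assumptions: interpreter_id() here is only a stand-in (the address of $$ via Scalar::Util::refaddr) for the XS version discussed in the thread, make_counter is a made-up name, and the 3-microsecond guard is an arbitrary figure rather than a measured bound on clock skew.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);
use Time::HiRes qw(gettimeofday usleep);

# Stand-in for the XS interpreter_id() (assumption: just the address of
# this interpreter's copy of $$).
sub interpreter_id { return refaddr( \$$ ) }

# Build a counter whose results are (count, interpreter id, pid, sec, usec).
sub make_counter {
    my ($guard_us) = @_;                  # guard interval per side, in microseconds
    usleep($guard_us);                    # block out time before the timestamp...
    my ( $sec, $usec ) = gettimeofday();  # seconds and microseconds since the epoch
    usleep($guard_us);                    # ...and after it
    my @value = ( 0, interpreter_id(), $$, $sec, $usec );
    return sub { ++$value[0]; return @value };
}

my $counter = make_counter(3);
print join( ',', $counter->() ), "\n" for 1 .. 3;
```

Only the first element changes between calls; the other four are fixed when the counter is created, and the guard interval would have to exceed whatever skew the clocks can actually show across cores.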

          And yes, I know proofs are only as good as the axioms; it's very easy to make stupid assumptions in this realm, and over the years I've seen lots of these fall over...

          On the other hand, this is the best I've seen in a while, so I think I'm going to run with it.