in reply to method of ID'ing

Time is an illusion. Lunchtime doubly so.
- Douglas Adams

Parham,

I used a similar method on a network daemon, and it worked very well until one day a "backup" time server was put into place that was not set to the right time. All of the nodes running the daemon migrated to this new time server because it more closely matched their time (PST, not PDT), and the next thing you know, IDs were getting re-used and all hell broke loose.

After examining time sync protocols, I also think there may be some error margin at startup, where the time fluctuates up and down as it adjusts to match the time server. I could be wrong here.

So, my notes about unique IDs are as follows:

* If you plan to use time(), use Time::HiRes instead. It provides more uniqueness and also seems to execute faster than time().

* If you hash (i.e., MD5), I would use SHA1 instead and remember to add buffer. I really see little reason to hash unless you prefer the string format of a hash. I avoid hashing when it isn't necessary due to the calculation time involved.

* Add an internal increment...sorry, only way I could figure out how to deal with time "slipping". After $inc == MAXINC, reset so you don't get absurdly long numbers over time. Store the $inc to a file if you need to maintain persistence or allow other instances to grab it. Load $inc; $inc++; Save $inc. Remember to flock.

* If you want it to survive distributed systems (load balanced or whatever), attach a hostname, IP, or MAC address. A MAC address will protect you from "admin" mistakes.

* Random is an ok thing to add to your string, but you shouldn't need it and since it is only "somewhat" random, doesn't help much more than time+PID+inc.

* And/or if you really want to make sure nothing "bad" happens, store the ID and do a check. A quick way to do this is to make a file in /tmp or similar purpose area and do something like:

do { [ generate ID code ] } while (-e "/tmp/$ID");
[ create empty /tmp/$ID file ]

Of course, this gets slow after thousands of IDs have been generated, so be sure to clean house in some fashion as well.
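
Pulling the notes above together, here is a minimal sketch, not a definitive implementation. The sub names (next_counter, make_id, claim_id, unique_id), the MAXINC value, the ID layout, and the file locations are all assumptions made for illustration. It keeps the increment in a file under flock, builds the ID from hostname, PID, Time::HiRes time, and the increment, and claims the ID by creating its marker file with O_EXCL, so the existence check and the create happen in one step rather than two.

```perl
use strict;
use warnings;
use Fcntl qw(:DEFAULT :flock);
use File::Spec;
use Sys::Hostname qw(hostname);
use Time::HiRes ();

use constant MAXINC => 65536;    # assumed wrap point for the increment

# Load $inc; $inc++; save $inc, all while holding an exclusive lock.
sub next_counter {
    my ($file) = @_;
    sysopen(my $fh, $file, O_RDWR | O_CREAT) or die "open $file: $!";
    flock($fh, LOCK_EX) or die "flock $file: $!";
    my $line = <$fh>;
    my $inc  = (defined $line && $line =~ /^(\d+)/) ? $1 : 0;
    $inc = ($inc + 1) % MAXINC;    # wrap so the number never grows unbounded
    seek($fh, 0, 0)  or die "seek $file: $!";
    truncate($fh, 0) or die "truncate $file: $!";
    print {$fh} "$inc\n";
    close($fh) or die "close $file: $!";    # closing also releases the lock
    return $inc;
}

# Hypothetical ID layout: host-PID-time-increment.
sub make_id {
    my ($counter_file) = @_;
    my $inc = next_counter($counter_file);
    return sprintf '%s-%d-%.6f-%d', hostname(), $$, Time::HiRes::time(), $inc;
}

# Claim an ID by creating its marker file. Returns 1 on success,
# 0 if the file already exists, and dies on any other error.
sub claim_id {
    my ($dir, $id) = @_;
    my $path = File::Spec->catfile($dir, $id);
    sysopen(my $fh, $path, O_WRONLY | O_CREAT | O_EXCL) or do {
        return 0 if $!{EEXIST};
        die "create $path: $!";
    };
    close $fh;
    return 1;
}

# Generate IDs until one is claimed successfully.
sub unique_id {
    my ($dir, $counter_file) = @_;
    my $id;
    do { $id = make_id($counter_file) } until claim_id($dir, $id);
    return $id;
}
```

Usage would be something like unique_id('/tmp/ids', '/tmp/ids.counter'), with both paths being placeholders. The marker files still need periodic cleanup, as noted above.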

Re: Re: method of ID'ing
by Juerd (Abbot) on Apr 15, 2002 at 07:36 UTC

    * If you plan to use time(), use Time::HiRes instead. It provides more uniqueness and also seems to execute faster than time().

    Time::HiRes::time indeed provides more uniqueness, but it is not faster:

    Benchmark: running Time::HiRes::time, time, each for at least 1 CPU seconds...
    Time::HiRes::time:  2 wallclock secs ( 0.82 usr +  0.22 sys =  1.04 CPU) @ 1071260.58/s (n=1114111)
                 time:  0 wallclock secs ( 0.70 usr +  0.31 sys =  1.01 CPU) @ 1816838.61/s (n=1835007)
                           Rate Time::HiRes::time              time
    Time::HiRes::time 1071261/s                --              -41%
    time              1816839/s               70%                --
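
For anyone who wants to repeat the comparison on their own machine, a script along these lines should produce a table in the same format. This is a sketch using the standard Benchmark module; it is an assumption that the numbers above came from something similar, and the rates will differ by platform and Perl build.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use Time::HiRes ();

# Run each sub for at least 1 CPU second, then print a comparison table.
cmpthese(-1, {
    'time'              => sub { my $t = time },
    'Time::HiRes::time' => sub { my $t = Time::HiRes::time() },
});
```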

    sorry, only way I could figure out how to deal with time "slipping". After $inc == MAXINC

    Try the modulo operator %. Example increments:

    ($counter += 1) %= 5;   # 0, 1, 2, 3, 4, 0, 1, 2..4, 0..4, 0..4, ...
    ($counter += 1) %= 256; # 0..255, 0..255, ...

    - Yes, I reinvent wheels.
    - Spam: Visit eurotraQ.
    

      Actually, you and I are both correct. ;-) What I forgot was that when I originally benchmarked it, I was running under NT. I just ran a benchmark on both NT and Linux and got the following results:

      NT:
      Time::HiRes::time() - timethis 600000: 20 wallclock secs (19.99 usr + 0.00 sys = 19.99 CPU) @ 30015.01/s (n=600000)
      time() - timethis 600000: 67 wallclock secs (66.73 usr + 0.00 sys = 66.73 CPU) @ 8991.46/s (n=600000)

      Linux:
      Time::HiRes::time() - timethis 600000: 3 wallclock secs ( 1.22 usr + 0.24 sys = 1.46 CPU)
      time() - timethis 600000: 1 wallclock secs ( 0.27 usr + 0.17 sys = 0.44 CPU)

      Anyway, enough of the thread hijacking.

      The modulo operator is a great idea; good suggestion.