Re: Unique numeric ID for reference?

A benchmark by Jeffrey Kegler (Hermit) on Oct 07, 2007 at 00:50 UTC
Actually, to my surprise, simply sticking an ID number in the referred-to object (it's an array) and dereferencing it is fastest. Here are the numbers:

                   Rate   String  Refaddr  Numeric ID Field
    String     747332/s       --     -18%     -24%     -28%
    Refaddr    912330/s      22%       --      -8%     -13%
    Numeric    989664/s      32%       8%       --      -5%
    ID Field  1044125/s      40%      14%       6%       --

The code that did the Benchmark was attached to the original post (727 Bytes).
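The attached benchmark code is not reproduced in this copy of the thread. As a rough idea of what such a comparison might look like, here is a minimal sketch that times the four approaches named in the table rows. The record layout (`[ 42, 'payload' ]`) and the `%method` hash are assumptions made for illustration, not the original poster's code, and the rates it prints will differ by machine and Perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use Scalar::Util qw(refaddr);

# Hypothetical test record: an array ref whose element 0 holds an ID number.
my $ref = [ 42, 'payload' ];

# Four ways to get a per-reference key, labelled like the rows in the table.
my %method = (
    'String'   => sub { my $id = "$ref" },         # stringified reference
    'Refaddr'  => sub { my $id = refaddr($ref) },  # Scalar::Util::refaddr
    'Numeric'  => sub { my $id = $ref + 0 },       # reference forced to a number
    'ID Field' => sub { my $id = $ref->[0] },      # ID stored inside the record
);

# Run each method for at least 1 CPU second and print a comparison chart.
cmpthese( -1, \%method );
```

With a single record, as in the post, this only measures the per-record cost of producing the key, not the cost of the sort itself.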
Re: A benchmark by Juerd (Abbot) on Oct 07, 2007 at 14:05 UTC
`0` doesn't really count as a unique id, does it?
Re^2: A benchmark by Jeffrey Kegler (Hermit) on Oct 07, 2007 at 15:07 UTC
Since my test data has exactly one sort record, any ID number would be unique. :-) I'm trying to focus on the per-record time for a pre-pass to a sort, so I think a single-record database captures those aspects of the problem I'm focused on.

I ran more numbers, by the way, looking at what happens if you have to deal with potentially undefined records. It gets complicated depending on whether you can turn off warnings, if you have to explicitly test for undefinedness, whether you need multiple levels of, etc., etc. The numeric solution (`$ref+0`) and the indirection-to-unique-identifier solution (`$ref->[0]`) run neck and neck, continually swapping first and second place with every small change in the assumptions. My conclusion is that they're close enough in terms of efficiency that even in time-efficiency driven situations, you can let other factors (readability, space-efficiency, etc.) decide.

I've coded it up using your suggestion of forcing the reference to numeric (`$ref+0`). Like I say, I decided efficiency was a tie, and by using the references as the subkeys I save the extra logic needed to create and track an extra data field.

I do wonder why, in the refaddr code in non-XS Scalar::Util, the code stringifies the reference and then pulls a number out with a regex. As far as I can tell, in terms of complexity and time-efficiency, that's clearly inferior to forcing the reference to numeric.

jeffrey
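One thing worth checking before relying on `$ref+0` everywhere: for plain references it agrees with `refaddr`, but for a blessed object whose class overloads numeric conversion it returns whatever the overload returns. Whether that is the reason the pure-Perl `refaddr` takes the stringify route is an assumption here, not something stated in the thread. The `Counter` class below is hypothetical and exists only to show the difference.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# Plain (unblessed) reference: numeric context yields the address.
my $plain      = ['record'];
my $same_plain = ( $plain + 0 == refaddr($plain) ) ? 1 : 0;

{
    # Hypothetical class that overloads numeric conversion.
    package Counter;
    use overload '0+' => sub { $_[0]{count} }, fallback => 1;
    sub new { my ( $class, $n ) = @_; return bless { count => $n }, $class }
}

my $obj     = Counter->new(7);
my $numeric = $obj + 0;         # goes through the overloaded '0+'
my $address = refaddr($obj);    # the underlying address, overloading ignored

print "plain ref: numeric form matches refaddr\n" if $same_plain;
print "object: numeric form = $numeric, refaddr = $address\n";
```

For the unblessed array records discussed in this thread the two forms are interchangeable, so the caveat only matters if the records might be objects with overloading.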
Re^3: A benchmark by Juerd (Abbot) on Oct 08, 2007 at 12:49 UTC
Re^2: Unique numeric ID for reference? by Jeffrey Kegler (Hermit) on Oct 07, 2007 at 00:10 UTC
Yes, of course. Forcing the ref to numeric does it, and that is my answer. Embarrassingly easy.

"But why not store the reference itself?" Not sure what you mean here. For my Guttman-Rosler-Schwartz sort I'm creating a sort key with pack. Given the number you just showed me how to get, I stuff it into a "J" field. How would I "store the reference itself"? And why do I want to?

Since the packed keys in a GRS Transform are thrown away once the sort is done, I've no real use for an actual reference -- all I need is a unique numeric cookie. I don't need to dereference from the sort key. Or do I miss your point?

jeffrey kegler
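To make the packed-key idea concrete, here is a minimal sketch of building sort keys with `pack`, where the reference forced to a number goes into a "J" field. The record layout, the `A8` name field, and the variable names are assumptions for illustration; the actual key format used in the post is not shown in the thread.

```perl
use strict;
use warnings;

# Hypothetical records: [ name, payload ].
my $alice   = [ 'alice', 1 ];
my $bob     = [ 'bob',   2 ];
my @records = ( $bob, $alice, $bob );    # $bob appears twice on purpose

# One packed key per record: a fixed-width name, then the reference
# forced to a number and stored in a native unsigned "J" field.
my @keys = map { pack( 'A8 J', $_->[0], $_ + 0 ) } @records;

# Plain string sort on the packed keys; no sort block is needed.
my @sorted = sort @keys;

# Unpack only to inspect the result; the keys themselves are throwaway.
my @decoded = map { [ unpack( 'A8 J', $_ ) ] } @sorted;
printf "%-8s %u\n", @$_ for @decoded;
```

A "J" field is packed in native byte order, so on a little-endian machine a string sort does not order the addresses numerically. Identical references still pack to identical bytes, so records sharing a reference end up next to each other, which is all a unique cookie needs.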
Re^3: Unique numeric ID for reference? by Juerd (Abbot) on Oct 07, 2007 at 14:04 UTC
Then I wonder why you are sorting by reference address. Is that ever useful? It's kind of randomish.
Re^4: Unique numeric ID for reference? by Jeffrey Kegler (Hermit) on Oct 07, 2007 at 15:35 UTC
Sorting with numeric references as a key will gather records with the same reference together. This might be done, for example, to weed out duplicates. Other than that, yes, you're right, the ordering between non-identical references is pseudo-random and not very useful.

jeffrey
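As a small illustration of the duplicate-weeding case, the sketch below sorts a list of references by their numeric value and then keeps one of each run of identical references. The sample data and variable names are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical data: three distinct records, two of them listed twice.
my ( $x, $y, $z ) = ( ['x'], ['y'], ['z'] );
my @refs = ( $x, $y, $x, $z, $y );

# Sorting on the numeric form puts identical references side by side.
my @sorted = sort { $a + 0 <=> $b + 0 } @refs;

# Keep a reference only if it differs from the one kept just before it.
my @unique;
for my $r (@sorted) {
    push @unique, $r unless @unique && $unique[-1] == $r;
}

print scalar(@unique), " distinct records\n";
```

The order of `@unique` follows the addresses, so it carries no meaning beyond grouping equal references together.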