PerlMonks  

Re^5: Determining uniqueness in a string.

by creamygoodness (Curate)
on Oct 04, 2005 at 05:40 UTC ( [id://497133] )


in reply to Re^4: Determining uniqueness in a string.
in thread Determining uniqueness in a string.

Well, if you are untroubled by the prospect of requiring 350 MB of RAM, I can certainly see how you would be untroubled by the prospect of EBCDIC incompatibility.

If raw performance is the _only_ criterion, then granted, giganto-hash wins. Unless the script is a one-off, though, it's a poor choice, because, as you note, the algo falls apart if the criteria change even slightly, and most scripts have to be maintained. It demonstrates that hashing is an efficient way to test uniqueness, but that's not exactly shocking news, is it?
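
As a minimal sketch of the hashing point (not the giganto-hash from upthread, and not anyone's posted solution), a small per-string %seen hash is enough to test uniqueness. The sub name all_chars_unique is made up for illustration. Because the hash keys are the characters themselves rather than their code points, this should behave the same under ASCII or EBCDIC.

```perl
use strict;
use warnings;

# Return true if no character occurs more than once in $str.
# Keys are the characters themselves, so nothing here depends on
# ASCII (or EBCDIC) code points.
sub all_chars_unique {
    my ($str) = @_;
    my %seen;
    for my $char ( split //, $str ) {
        # Post-increment yields the old count: true on a repeat.
        return 0 if $seen{$char}++;
    }
    return 1;
}

print all_chars_unique('0123456789') ? "unique\n" : "repeats\n";
print all_chars_unique('1123')       ? "unique\n" : "repeats\n";
```

It trades a little speed for a few bytes of memory per call, and it keeps working if the criteria change from digits to some other set of characters.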

There's an awful lot of esoterica in this thread: solutions which don't scale, which are painfully verbose and/or obtuse and/or "clever", which savage the KISS principle, etc. Since this is largely an academic exercise, it's necessary and important to push the boundaries and explore techniques which are wildly unbalanced. And yet the reasoning bears little resemblance to the approach I take when there's code that needs to be optimized.

--
Marvin Humphrey
Rectangular Research ― http://www.rectangular.com

Replies are listed 'Best First'.
Re^6: Determining uniqueness in a string.
by BrowserUk (Patriarch) on Oct 04, 2005 at 13:20 UTC

    Hmm. If I was looking for ways to make the problem harder, then I'd consider the possibility that the digits might be Unicode before I considered EBCDIC. There are probably more machines running Unicode right now, today, than have ever used EBCDIC.

    Of course, if you happen to have one of those few EBCDIC machines lying around, it would probably solve the problem more quickly than several hundred of the average Unicode boxes put together, but that's another story.

    One thing you wouldn't have to worry about on the average EBCDIC machine is a trifling 350 MB of RAM, at which point the big hash becomes the KISS solution. It certainly negates the character set problem.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.
Re^6: Determining uniqueness in a string.
by Perl Mouse (Chaplain) on Oct 04, 2005 at 08:57 UTC

    Well, if you are untroubled by the prospect of requiring 350 MB of ram, I can certainly see how you would be untroubled by the prospect of EBCDIC incompatibility.

    Considering the number of EBCDIC machines out there, if speed really is an issue, I highly doubt the OP wants to run the same code on both an EBCDIC machine and a non-EBCDIC machine. In fact, if speed is an issue, he won't. He'll create one version for the EBCDIC machines, and one for the non-EBCDIC machines. And while the average programmer who lives in the ASCII world doesn't know how to convert ASCII-specific code to EBCDIC code, my bet is the average EBCDIC programmer does. So, if the OP needs to run the code on an EBCDIC machine, I'm sure he knows how to do the conversion.

    Perl --((8:>*
