Re: Character Length Requirement & String Conversion
by ikegami (Patriarch) on Mar 13, 2012 at 01:40 UTC
You can truncate the result from md5_hex.
But hold on.
A 128-bit number in base 16 uses 32 digits, so keeping only 19 of them discards about 40% of the digits and leaves just 76 bits of hash.
A 128-bit number in base 62 uses 22 digits, so keeping only 19 of them discards about 14% of the digits and leaves roughly 113 bits of hash.
If you go the truncation route, don't use base 16, use base 62 (0-9,a-z,A-Z).
By the way, you'd need 107 different symbols to represent an arbitrary 128-bit number using no more than 19 symbols (106^19 is less than 2^128, while 107^19 exceeds it), but there are only 95 printable ASCII characters (counting the space, but not counting tab, line feed, etc.).
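For anyone who wants to check the digit counts above, here is a minimal sketch using the core Math::BigInt module. The sub names (digits_needed, symbols_needed, bits_kept) are made up for this illustration; only the 128-bit digest size and the 19-character limit come from the thread.

```perl
use strict;
use warnings;
use Math::BigInt;

my $BITS  = 128;                                  # size of an MD5 digest
my $SPACE = Math::BigInt->new(2)->bpow($BITS);    # number of distinct digests

# Smallest number of base-$base digits that can represent every digest.
sub digits_needed {
    my ($base) = @_;
    my $digits = 1;
    $digits++ while Math::BigInt->new($base)->bpow($digits) < $SPACE;
    return $digits;
}

# Smallest alphabet size whose $len-symbol strings cover every digest.
sub symbols_needed {
    my ($len) = @_;
    my $base = 2;
    $base++ while Math::BigInt->new($base)->bpow($len) < $SPACE;
    return $base;
}

# Bits of information carried by $len symbols from a $base-symbol alphabet.
sub bits_kept {
    my ($base, $len) = @_;
    return $len * log($base) / log(2);
}

for my $base (16, 62) {
    printf "base %2d: %2d digits for %d bits; 19 digits keep about %.0f bits\n",
        $base, digits_needed($base), $BITS, bits_kept($base, 19);
}
printf "symbols needed to fit %d bits into 19 characters: %d\n",
    $BITS, symbols_needed(19);
```

Exact integer comparison is used for the digit counts because a floating-point logarithm can land a hair above a whole number (base 16 divides 128 bits exactly) and round the wrong way.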
Yeah, but again for emphasis, I wouldn't truncate the hex of the MD5 hash. Too lossy for my taste.
Update: I have found the solution I was looking for, using String::CRC32 or String::CRC.
Thank you again to everyone for your comments!
-BP-
Re: Character Length Requirement & String Conversion
by tobyink (Canon) on Mar 13, 2012 at 01:10 UTC
When you say "characters", what do you mean? Bytes or Unicode code points? Does "Αλφαβετ" count as seven characters or as 14? If we're truly talking about characters, and not just bytes, then defining some sort of base-65536 encoding would probably be feasible, and you might be able to pack 40 or so ASCII characters into a 19-character Unicode string (19 * 16 = 304 bits, at 7 bits per ASCII character). Not an especially attractive solution.
Do you need to be able to reverse the encoding, i.e. expand the encoded string back to the URL? If not, then a digest function such as MD5 should be adequate. Digest::MD5's md5_base64 function returns a 22-character string. As you are only using it as an identifier, there should be no harm in simply stripping off the final three characters.
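A minimal sketch of that truncation, in case it helps. The sub name short_id and the example URL are made up. The tr/// step is an assumption on my part: md5_base64 can emit '+' and '/', and as far as I know RRDtool only accepts [a-zA-Z0-9_] in a ds-name, so both are folded to '_' here at the cost of a little more collision risk.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_base64);

# Hypothetical helper: derive a fixed 19-character identifier from a request string.
sub short_id {
    my ($request) = @_;
    my $digest = md5_base64($request);    # 22 characters from A-Z, a-z, 0-9, '+', '/'
    $digest =~ tr{+/}{__};                # assumption: only [A-Za-z0-9_] is acceptable
    return substr($digest, 0, 19);        # strip off the final three characters
}

# The URL below is invented for illustration.
print short_id('http://www.example.com/search?q=perl&page=2'), "\n";
```

The same request string always yields the same identifier, which is what matters when it is used as a static name.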
perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'
Thanks again for the responses.
Reverse conversion is not a major requirement. We have approximately 20 of these request strings to convert per JMeter test, with approximately 20 tests total.
The unfortunate scenario is that RRDTool can only use 19 characters for a ds name, and our Nagios implementation relies solely on RRDTool.
Rather than reinventing the wheel, this is more of a last-ditch approach to assigning a static unique identifier to each HTTP request. I will likely add a command line argument to print out the preconverted strings for debugging purposes.
I suppose it is worth asking if anyone is familiar with a way to assign a unique ID to each request via JMeter, but that really is a rabbit hole we probably should avoid... :)
-BP-
Re: Character Length Requirement & String Conversion
by BrowserUk (Patriarch) on Mar 13, 2012 at 00:59 UTC
"I am looking for something similar to Digest::MD5's md5_hex function."
What is wrong with using md5_hex()? (Ignore this! I was thinking of the 16-byte binary representation instead of the 32-character hex.)
When I was looking for something similar, I eventually settled upon the 64-bit FNV1a implementation.
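For reference, a rough sketch of 64-bit FNV-1a in pure Perl. This is not necessarily the implementation referred to above; it uses the core Math::BigInt module so that it does not depend on a 64-bit perl, which is slow but should be fine for a few hundred strings. The sub name fnv1a_64_hex is made up. The result is 16 hex digits, so it fits within a 19-character limit without truncation.

```perl
use strict;
use warnings;
use Math::BigInt;

# Standard 64-bit FNV parameters.
my $FNV_OFFSET = Math::BigInt->new('0xcbf29ce484222325');
my $FNV_PRIME  = Math::BigInt->new('0x100000001b3');
my $MASK_64    = Math::BigInt->new('0xffffffffffffffff');

# Returns the 64-bit FNV-1a hash of a byte string as 16 lowercase hex digits.
sub fnv1a_64_hex {
    my ($bytes) = @_;
    my $hash = $FNV_OFFSET->copy;
    for my $byte (unpack 'C*', $bytes) {
        $hash->bxor($byte);         # FNV-1a: XOR the byte in first,
        $hash->bmul($FNV_PRIME);    # then multiply by the FNV prime,
        $hash->band($MASK_64);      # keeping only the low 64 bits
    }
    (my $hex = $hash->as_hex) =~ s/\A0x//;
    return ('0' x (16 - length($hex))) . $hex;    # left-pad to 16 digits
}

# The URL below is invented for illustration.
print fnv1a_64_hex('http://www.example.com/search?q=perl'), "\n";
```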
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
Re: Character Length Requirement & String Conversion
by locked_user sundialsvc4 (Abbot) on Mar 13, 2012 at 13:27 UTC
Do you need the strings to be readable? (In other words, to quote Mary Poppins, supercali...docious?) If so, maybe you just want to take the first x characters and the last y characters and separate them with an ellipsis. If you find that the name collides with something you already have, add a couple of random digits. (Or, for consistency, just do that all the time.)
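One way the readable approach might look, as a sketch only. The sub name abbreviate, its parameters, and the example URLs are all made up. Two things differ from the description above and are my own assumptions: '_' stands in for the ellipsis so the name stays within [A-Za-z0-9_], and a counter replaces the random digits so that repeated runs over the same inputs in the same order give the same names.

```perl
use strict;
use warnings;

my %name_for;    # original string => name already assigned to it
my %taken;       # names already handed out

# Hypothetical helper: keep the first $head and last $tail characters,
# joined by '_', and never return more than $max characters.
sub abbreviate {
    my ($string, $head, $tail, $max) = @_;
    return $name_for{$string} if exists $name_for{$string};

    (my $clean = $string) =~ s/[^A-Za-z0-9]+/_/g;    # squash URL punctuation
    my $name = length($clean) <= $max
        ? $clean
        : substr($clean, 0, $head) . '_' . substr($clean, -$tail);

    # On a collision, overwrite the end of the name with a counter.
    my $candidate = $name;
    for (my $n = 2; $taken{$candidate}; $n++) {
        $candidate = substr($name, 0, $max - length($n)) . $n;
    }

    $taken{$candidate} = 1;
    return $name_for{$string} = $candidate;
}

# Both URLs are invented; they abbreviate to the same name and so collide.
print abbreviate('http://www.example.com/app/search?q=perl', 9, 9, 19), "\n";
print abbreviate('http://www.different.org/x/search?q=perl', 9, 9, 19), "\n";
```

With a 19-character limit, 9 + 1 + 9 uses the whole budget; a shorter head or tail leaves room for the counter without overwriting anything.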
If you don’t need them to be readable, a short string of randomly-selected characters can be generated, and a hash used to link the literal string to the random moniker that you have thusly chosen.
If you “know” that some parts of the URLs you’ll be dealing with are never “interesting to anybody,” just omit them entirely.
If there might be some ambiguity in the user’s mind when looking at a particular graph vs. similar ones, consider adding a legend to the graph or in a separate document. It also might be desirable to simply, say, number the elements on your graph, then provide a separate legend in all cases.
If you intend to produce many graphs that you know will be compared side by side and that you also know will contain many “similars,” consider using (say...) an SQLite database file to maintain a consistent mapping table that grows as necessary.