Marsel has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

In a Perl script, I'd like to do the following:

I have a list of words. I want to convert these words to their binary ASCII values, add them up, and then keep the last 2 bytes of the sum. In C it would have been quite simple, but it's in a Perl script...

The goal is to tag this list with 2 bytes, so that a similar list, even if words are sampled, will give the same ID.

My problem is that when dealing with things like that, I'm always a bit lost in Perl. Would someone have an idea?


Thanks


Marcel

Replies are listed 'Best First'.
Re: ascii to binary
by ikegami (Patriarch) on Dec 04, 2006 at 16:02 UTC
    use List::Util qw( sum );
    my $id = (sum map ord, map /./sg, $word) % 65536;

    Ref: ord, List::Util

    Update: The above will lose precision on long strings and overflow on longer strings, because the whole sum is built up before the modulus is applied. Fix:

    my $id = 0;
    foreach (map ord, map /./sg, $word) {
        $id = ($id + $_) % 65536;
    }

    Update: What you are doing is called hashing. The following uses a better hashing function. Of course, that means it'll return a different number than yours.

    use Digest::MD5 qw( md5 );
    my $id = unpack('n', substr(md5($word), -2));

    Ref: Digest::MD5, unpack

      Thanks a lot!

      To answer the previous reply, my words look like:

      - 1237_at, 23493_s_at, ...
      - gsm12832, gsm23948, ...
      - or just float values.


      The idea was to derive a hash (thanks for respelling what I wanted to do!) of these 3 lists, and then have an ID like EA3D4B which would be unique and independent of the order in which the words were given in each list.

      I think this will work perfectly, thanks again.

      marcel

        Note that an MD5 sum is not independent of the order of the words. You could however split, sort, then join to get the words in the same order each time.
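
        The split, sort, then join idea could be sketched roughly as follows. This is only an illustration, not code from the thread: the helper name list_id is hypothetical, the newline separator is assumed never to occur inside a word, and keeping the last 6 hex digits is an assumption based on the example ID EA3D4B.

```perl
use strict;
use warnings;
use Digest::MD5 qw( md5_hex );

# Hypothetical helper: sort the words so the order they arrive in does
# not matter, join them with a separator assumed not to occur in the
# data, and keep the last 6 hex digits (3 bytes) of the MD5 digest.
sub list_id {
    my @words = @_;
    my $canonical = join "\n", sort @words;
    return uc( substr( md5_hex($canonical), -6 ) );
}

my $id_a = list_id(qw( gsm12832 gsm23948 1237_at ));
my $id_b = list_id(qw( 1237_at gsm23948 gsm12832 ));
print "$id_a $id_b\n";   # both IDs are the same string
```

        Sorting first gives every ordering of the same words the same canonical string, so the digest no longer depends on input order.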

        Your first technique would more properly be called a checksum and is independent of character order.
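
        A small sketch of that order independence, assuming the checksum is simply extended to sum every character of every word in the list (checksum16 is a hypothetical name, not something from the thread):

```perl
use strict;
use warnings;

# Hypothetical helper: 16-bit additive checksum over every character of
# every word, reduced modulo 65536 at each step so the running total
# stays small.
sub checksum16 {
    my @words = @_;
    my $id = 0;
    for my $word (@words) {
        for my $code (map { ord($_) } split //, $word) {
            $id = ($id + $code) % 65536;
        }
    }
    return $id;
}

# Same words in a different order give the same value ...
print checksum16(qw( 1237_at 23493_s_at )) == checksum16(qw( 23493_s_at 1237_at ))
    ? "same\n" : "different\n";

# ... but so do different words made of the same characters.
print checksum16('ab') == checksum16('ba') ? "same\n" : "different\n";
```

        Because addition is commutative, any rearrangement of the same characters collides. That is what makes the value order-independent, and it is also why it is weak as an identifier.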

        Note too that 16 bits allows only 65,536 distinct values, far fewer than an MD5 hash, which uses 128 bits. Depending on how many strings you are working with, there may be a fairly high chance that different strings get identical 16-bit checksums.
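
        To put a rough number on that chance, here is a sketch using the standard birthday-bound approximation. It assumes the IDs are spread uniformly over all 65,536 values, which an additive checksum of similar short strings will not achieve, so real collision rates are likely to be worse. The name collision_chance is hypothetical.

```perl
use strict;
use warnings;

# Birthday-bound approximation: chance that at least two of $n items
# share a value when each value is drawn uniformly from $buckets options.
sub collision_chance {
    my ($n, $buckets) = @_;
    return 1 - exp( -$n * ($n - 1) / (2 * $buckets) );
}

for my $n (10, 100, 300, 1000) {
    printf "%5d items: %.1f%% chance of a 16-bit collision\n",
        $n, 100 * collision_chance($n, 2**16);
}
```

        Under that uniformity assumption, the chance of at least one collision is already about even at around 300 items.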


        DWIM is Perl's answer to Gödel
Re: ascii to binary
by davorg (Chancellor) on Dec 04, 2006 at 15:55 UTC

    Can you give an example of the data and the calculations that you would go through?

    Sounds like the ord function might be useful to you.

    --
    <http://dave.org.uk>

    "The first rule of Perl club is you do not talk about Perl club."
    -- Chip Salzenberg

Re: ascii to binary
by Old_Gray_Bear (Bishop) on Dec 04, 2006 at 17:32 UTC
    You might want to consider using one of the MD5 modules (MD5 or Digest::MD5). There will be fewer corner cases to bite you.

    ----
    I Go Back to Sleep, Now.

    OGB

Re: ascii to binary
by swampyankee (Parson) on Dec 04, 2006 at 17:25 UTC

    You need to tell us more. Do you mean something like this (pseudo-code follows)?

    my @char  = split(//, $word);
    my $value = 0;
    my $place = 0;
    foreach (reverse @char) {
        # Treat the word as a base-256 number, last character least significant.
        $value += ord($_) * 256 ** $place;
        $place++;
    }
    $value = $value % (256**2);
    Clearly, there could be some problems here if the word is longer than 4 characters.

    Or you could try pack and unpack and, of course, Super Search.
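
    One way the unpack route might look, if "the last 2 bytes" means the last two characters of the word read as a single 16-bit number. That is an assumption about the question, and last_two_bytes is a hypothetical name. With exact arithmetic this matches the positional sum taken modulo 256**2, since every character beyond the last two contributes a multiple of 65536.

```perl
use strict;
use warnings;

# Hypothetical helper: read the last two bytes of the word as one
# big-endian 16-bit number. Left-padding with NUL bytes covers words
# shorter than two characters. Assumes single-byte (ASCII) characters.
sub last_two_bytes {
    my ($word) = @_;
    return unpack 'n', substr("\0\0" . $word, -2);
}

printf "%04X\n", last_two_bytes('1237_at');   # prints 6174 ('a' is 0x61, 't' is 0x74)
```

    Working on the bytes directly avoids building a large intermediate number, so word length is no longer a concern.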

    emc

    At that time [1909] the chief engineer was almost always the chief test pilot as well. That had the fortunate result of eliminating poor engineering early in aviation.

    —Igor Sikorsky, reported in AOPA Pilot magazine February 2003.