in reply to pack unpack charcount repetition

Seeing as all are below 128, you could fit two numbers per byte and save 50% space off the top. The first four bits would be $number[0], the next four bits would be $number[1], etc. To expand, just parse the bits and form the array. If you are looking for a more versatile compression algorithm, then check out the sources for gzip or bzip2. If you just need to compress a dataset in your program, then check out your options at compress; each type performs differently with different datasets. I think bzip2 is still a leader overall.

-Waswas

Re: Re: pack unpack charcount repetition
by BrowserUk (Patriarch) on Jun 09, 2003 at 11:44 UTC

    How can you fit 0 .. 128 into 4-bits?


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller


Re: Re: pack unpack charcount repetition
by meredith (Friar) on Jun 09, 2003 at 12:37 UTC
    Although this has been said, it hasn't been explained:
    The difference between a field that holds values below 128 and one that holds values below 256 is only one bit, not 4. For those not yet versed in binary, that is to say, for bit positions 87654321:

    bit position   place value
    1              1
    2              2
    3              4
    4              8
    5              16
    6              32
    7              64
    8              128

    Now, shortening it to 7 bits would give you a 1/8 compression at the expense of not allowing any data that uses the 8th bit. I regard this as a mistake, because it can cause all kinds of problems; see SMTP and AIX /dev/tty*.
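For anyone who wants to see the 7-bit idea in code, here is a minimal sketch. The names pack7 and unpack7 are made up for illustration, and it assumes the element count is kept somewhere else, because the zero padding in the last byte cannot be told apart from data.

```perl
use strict;
use warnings;

# Pack values 0..127 at 7 bits each; pack 'B*' zero-pads the final byte.
sub pack7 {
    my @values = @_;
    for my $v (@values) {
        die "value $v does not fit in 7 bits\n" if $v < 0 || $v > 127;
    }
    my $bits = join '', map { sprintf '%07b', $_ } @values;
    return pack('B*', $bits);
}

# The caller must supply the element count (an assumption of this sketch).
sub unpack7 {
    my ($buf, $count) = @_;
    my $bits = unpack('B*', $buf);
    return map { oct('0b' . substr($bits, $_ * 7, 7)) } 0 .. $count - 1;
}

my @data   = (0, 54, 28, 76, 126, 0, 28, 54);
my $packed = pack7(@data);
printf "%d values in %d bytes\n", scalar(@data), length($packed);  # 8 values in 7 bytes
print join(',', unpack7($packed, scalar @data)), "\n";
```

Eight values in seven bytes is exactly the 1/8 saving, and any value of 128 or more is rejected outright.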

    As an aside, this data set will probably be too short to get a decent compression because of different sorts of overhead. Maybe this is a creative use for freeze/thaw??? (that's from Storable).
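A quick way to check the overhead point is to freeze a short list and compare sizes. This is only a sketch: the list is a prefix of the numbers quoted later in the thread, and the exact frozen size depends on the Storable version, so no figure is claimed here.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

my @data   = (0, 54, 28, 76, 126, 0, 28, 54, 62, 54, 0);
my $text   = join ',', @data;
my $frozen = freeze(\@data);        # serialized image, header included
my @back   = @{ thaw($frozen) };    # thaw returns a reference

printf "plain text: %d bytes, frozen: %d bytes\n", length($text), length($frozen);
print "round trip ok\n" if "@back" eq "@data";
```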

    Update: 2 Mins later: Ah, I am aware that the OP wanted code reduction, not compression. Well, ignore the last of my ramblings.

    mhoward - at - hattmoward.org
Re: Re: pack unpack charcount repetition
by denthijs (Acolyte) on Jun 09, 2003 at 11:52 UTC
    > Seeing as all are below 128, you could fit two numbers per byte and save 50% space off the top. The first four bits would be $number[0], the next four bits would be $number[1], etc. To expand, just parse the bits and form the array.

    This sounds like a great plan, but "just parse the bits and form the array" is just sitting there staring at me. Could you show me?

      It ain't gonna happen. The maximum value you can store in four bits (a nibble) is decimal 15.

      Unless, of course, waswas-fng has access to a computer that uses a quantum bit that has more states than "off" and "on".
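What does work is two values per byte when every value is 15 or less. A minimal sketch using vec() with 4-bit elements follows; pack_nibbles and unpack_nibbles are hypothetical helper names, not anything from the original posts.

```perl
use strict;
use warnings;

# Store values 0..15 two per byte using 4-bit elements of vec().
sub pack_nibbles {
    my @values = @_;
    my $buf = '';
    for my $i (0 .. $#values) {
        die "value $values[$i] does not fit in 4 bits\n"
            if $values[$i] < 0 || $values[$i] > 15;
        vec($buf, $i, 4) = $values[$i];
    }
    return $buf;
}

# The element count is passed in, since an odd count leaves a spare nibble.
sub unpack_nibbles {
    my ($buf, $count) = @_;
    return map { vec($buf, $_, 4) } 0 .. $count - 1;
}

my @small  = (0, 3, 15, 9);
my $packed = pack_nibbles(@small);
printf "%d values in %d bytes\n", scalar(@small), length($packed);  # 4 values in 2 bytes
print join(',', unpack_nibbles($packed, scalar @small)), "\n";
```

Feeding it 28 or 126 dies with the range error, which is the objection above in executable form.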

Re: Re: pack unpack charcount repetition
by waswas-fng (Curate) on Jun 09, 2003 at 23:00 UTC
    I am sorry, I posted this at 5 am or so my time and had not had any Red Bull yet =) Here is what I meant to say: if (0,28,48,54,60,62,76,126) = (0000,1000,0100,1100,0010,1010,1110,0001), then you can encode the original array as:
    [0000,1100,1000,1110,0001,0000,1000,1100,1010,1100,0000,1000,1100,0010,0100,0000,1100,1100,1010,1100,0000]
    or as bits:
    000011001000111000010000100011001010110000001000110000100100000011001100101011000000
    Then convert to 11 bytes (the 21 codes take 84 bits; fill the last 4 bits with 1111 to tag padding). To get the data back, read 4 bits per array slot until done or you hit 1111. By comparison,
    0,54,28,76,126,0,28,54,62,54,0,28,54,60,48,0,54,54,62,54,0
    stored in a file is 58 bytes. As long as the set of unique numbers stays fixed and has at most 15 members (1111 is reserved for padding), you can save space by storing the data this way, and the larger the array, the more savings you will see.
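A rough sketch of that lookup-table scheme, for the curious. The code table is copied from the post, the sub names encode and decode are made up here, and the table itself is assumed to be known to both sides rather than stored with the data.

```perl
use strict;
use warnings;

# Code table copied from the post; 1111 is reserved as the padding tag.
my %code_for = (
    0  => '0000', 28 => '1000', 48 => '0100', 54  => '1100',
    60 => '0010', 62 => '1010', 76 => '1110', 126 => '0001',
);
my %value_for = reverse %code_for;    # 4-bit code back to value

sub encode {
    my @values = @_;
    my $bits = '';
    for my $v (@values) {
        die "no code for $v\n" unless exists $code_for{$v};
        $bits .= $code_for{$v};
    }
    $bits .= '1111' if length($bits) % 8;    # fill the last byte
    return pack('B*', $bits);
}

sub decode {
    my ($buf) = @_;
    my $bits = unpack('B*', $buf);
    my @values;
    for my $nibble ($bits =~ /(.{4})/g) {
        last if $nibble eq '1111';           # padding tag reached
        push @values, $value_for{$nibble};
    }
    return @values;
}

my @data = (0,54,28,76,126,0,28,54,62,54,0,28,54,60,48,0,54,54,62,54,0);
my $packed = encode(@data);
printf "text: %d bytes, packed: %d bytes\n",
    length(join ',', @data), length($packed);    # text: 58 bytes, packed: 11 bytes
print join(',', decode($packed)), "\n";
```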

    it was really silly anyways because you will note for such a small set the perl code to compress/decompress + the data compressed is larger than the starting data =)

    -Waswas
      My dataset really was chosen awfully, it seems...
      Still, this could be very handy if I weren't to encode yaph(248131) but the full Yet Another Perl Hacker in the future.
      Or countless other things, I'm sure ;)
      *less is more* :)