in reply to Re^2: unpacking 6-bit values
in thread unpacking 6-bit values

It matters if you can optimise the I/O from disk. The drift is that performance should rarely be addressed too specifically.

One world, one people

Replies are listed 'Best First'.
Re^4: unpacking 6-bit values
by BrowserUk (Patriarch) on Dec 11, 2010 at 15:15 UTC
    It matters if you can optimise the I/O from disk.

    Why? They are only read from or written to disk once or a few times per run. Runs last many hours, or days. They are decoded many millions of times. The two are unrelated.

    The values are compressed because there are many millions of them, and the 25% space saving is crucial both on disk and in memory. They are kept compressed whilst in memory because otherwise they would need to be paged to disk far more often.
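    The arithmetic behind that 25% is simple: six bits per value instead of eight if each value sat in its own byte. A minimal pack/unpack sketch of the idea (my own illustration, not the application's code):

```perl
use strict;
use warnings;

# Illustration only: 32 six-bit values (0..63) occupy 24 packed bytes
# instead of 32 bytes at one value per byte -- the 25% saving above.
sub pack6 {
    pack 'B*', join '', map { sprintf '%06b', $_ } @_;
}

sub unpack6 {
    my $bits = unpack 'B*', shift;
    map { oct "0b$_" } $bits =~ /(.{6})/g;
}

my @in  = map { int rand 64 } 1 .. 32;
my $buf = pack6(@in);        # 24 bytes, down from 32
my @out = unpack6($buf);     # round-trips to @in
```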

    The drift is that performance should rarely be addressed too specifically.

    Hm. So you'd optimise IO performance but not decode performance?

    This is a crock. I bet you'd be the first to complain if the p5p guys couldn't be bothered to consider the performance of the code they write.

    I know my application's requirements, and with potentially 2^47 states to explore, performance is crucial.

    If you can't be bothered to make effective use of your users' time, money and hardware by writing efficient code, why bother responding to a question clearly aimed at improving performance?


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      That really helps. You can do a sort of pre-binding of the variables and optimise for this situation. It got me a doubling in speed. Feel free to modify to your heart's content ...

      Sorry for the long posts. Here I ripped out the two slowest ones to make room for the real runner :)

      Leads to

      uic:  (6 13 22 58 19 15 63 60 26 29 13 17 31 63 14 4 46 12 22 25 9 62
             32 22 42 14 1 63 48 4 47 11 ... 55 26 57 32 6 47 51 40 26 6 50
             37 62 36 60 37 53 8 54 41 32 33 18)
      uicm: (6 13 22 58 19 15 63 60 26 29 13 17 31 63 14 4 46 12 22 25 9 62
             32 22 42 14 1 63 48 4 47 11 ... 55 26 57 32 6 47 51 40 26 6 50
             37 62 36 60 37 53 8 54 41 32 33 18)
      uicb: (6 13 22 58 19 15 63 60 26 29 13 17 31 63 14 4 46 12 22 25 9 62
             32 22 42 14 1 63 48 4 47 11 ... 55 26 57 32 6 47 51 40 26 6 50
             37 62 36 60 37 53 8 54 41 32 33 18)

               Rate  mlut    uu   asu  uicm  uic  uicb
      mlut    917/s    --   -3%   -5%  -84% -87%  -94%
      uu      949/s    3%    --   -2%  -83% -86%  -94%
      asu     964/s    5%    2%    --  -83% -86%  -94%
      uicm   5747/s  527%  506%  496%    -- -18%  -62%
      uic    6998/s  663%  638%  626%   22%   --  -54%
      uicb  15244/s 1562% 1507% 1482%  165% 118%    --
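      For anyone following along, the pre-binding idea might look roughly like this; the names and shape are my guess at the technique, not Tux's actual code:

```perl
use strict;
use warnings;

# Sketch of "pre-binding": the decoder is a closure over a shared input
# scalar and a shared output array, so the hot path passes no arguments
# and hands back no return list.  Names here are mine.
my $buf;        # set to a 24-byte packed record before each call
my @out;        # decoded values land here on every call

my $decode = sub {
    my $bits = unpack 'B192', $buf;             # 192 bits = 32 values
    @out = map { oct "0b$_" } $bits =~ /(.{6})/g;
};

$buf = pack 'B*', join '', map { sprintf '%06b', $_ } 0 .. 31;
$decode->();    # @out is now (0 .. 31)
```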

      Enjoy, Have FUN! H.Merijn

        That's a neat optimisation, but it gets its gains by pushing some of the required processing out of the benchmark.

        What I mean by that is that the sets of 32 numbers are manipulated in pairs (of sets).

        So, using uicb() I would have to expand one set into the buffer, copy the results somewhere else, then expand the second set, before I could do the manipulations. I.e., the copying still needs to be done, but it is no longer being measured.
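        A sketch of that objection, with hypothetical names (not code from the thread): when two sets must be live at once, a decoder that reuses one shared output array forces a copy that the decode benchmark never times.

```perl
use strict;
use warnings;

# Hypothetical shape: the fast decoder always expands into @shared,
# so keeping set A alive while decoding set B needs an extra copy.
my @shared;
sub decode_into_shared {
    my $bits = unpack 'B*', shift;
    @shared = map { oct "0b$_" } $bits =~ /(.{6})/g;
}

my $setA = pack 'B*', join '', map { sprintf '%06b', $_      } 0 .. 31;
my $setB = pack 'B*', join '', map { sprintf '%06b', 63 - $_ } 0 .. 31;

decode_into_shared($setA);
my @a = @shared;                   # the copy the benchmark never sees
decode_into_shared($setB);         # @shared now holds set B
my @sums = map { $a[$_] + $shared[$_] } 0 .. 31;
```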


      re. "Hm. So you'd optimise IO performance but not decode performance?"
      $pm->{TROLL}++

      One world, one people

        Fatuous f***--