
Assigning to $#array is preferable; I just didn't think of that off the top of my head.
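For anyone following along, here is a minimal sketch of what pre-extending via $#array might look like. The expected count, the sample data, and the variable names are made up for illustration; they are not from the original poster's code.

```perl
use strict;
use warnings;

# Hypothetical estimate of how many offsets will be stored.
my $expected = 1_000;

my @offsets;
$#offsets = $expected - 1;    # pre-extend so the array need not grow piecemeal

# Stand-in for the file contents; records are newline-terminated.
my $data = "alpha\nbeta\ngamma\n";

# Fill by index rather than push, since the slots already exist.
my $n   = 0;
my $pos = 0;
$offsets[$n++] = 0;
while (($pos = index($data, "\n", $pos)) >= 0) {
    $pos++;                                        # start of the next record
    $offsets[$n++] = $pos if $pos < length $data;
}

$#offsets = $n - 1;           # trim back to the entries actually used
```

If the estimate is too low, the array still grows as usual; the pre-extension only helps to the extent that the guess is close.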

The two-level index should work, though the overhead of accessing it indirectly may outweigh the benefit of avoiding reallocations.
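To make the indirection cost concrete, here is a rough sketch of one possible reading of a two-level index: a top-level array of fixed-size chunks, so that growth only ever adds a new chunk. The chunk size and the helper names store_offset and fetch_offset are hypothetical, and the scheme proposed earlier in the thread may well differ.

```perl
use strict;
use warnings;

my $CHUNK = 4;    # hypothetical chunk size, kept tiny for illustration
my @chunks;       # top level: one array ref per chunk

# Store an offset; the chunk is autovivified on first use.
sub store_offset {
    my ($i, $value) = @_;
    $chunks[ int($i / $CHUNK) ][ $i % $CHUNK ] = $value;
}

# Fetch an offset: one division, one modulus, and two array lookups.
sub fetch_offset {
    my ($i) = @_;
    return $chunks[ int($i / $CHUNK) ][ $i % $CHUNK ];
}

store_offset($_, $_ * 10) for 0 .. 9;
```

Every access pays for the extra arithmetic and the second lookup, which is the overhead that would need to be benchmarked against the reallocations saved.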

When it comes to strings, I was thinking something simpler. Use 4 bytes per offset. Pack each offset into those 4 bytes. Sure, you can save more memory, but see if the simple approach is a big enough win. (It certainly should take less code, and makes it easy to access the 432343rd offset - depending on what you do this could be a big win.)
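A minimal sketch of the 4-bytes-per-offset idea, assuming every offset fits in an unsigned 32-bit integer (so files under 4GB). The helper names add_offset and offset_at are invented for this example.

```perl
use strict;
use warnings;

my $packed = '';    # all offsets live in this one string, 4 bytes each

# Append one offset as a 32-bit big-endian unsigned integer.
sub add_offset {
    my ($offset) = @_;
    $packed .= pack 'N', $offset;
}

# Random access: the $i-th offset (counting from 0) starts at byte $i * 4.
sub offset_at {
    my ($i) = @_;
    return unpack 'N', substr($packed, $i * 4, 4);
}

add_offset($_) for 0, 6, 11, 2_000_000_000;

my $count = length($packed) / 4;    # number of offsets stored
my $third = offset_at(2);           # 11
```

Each offset costs exactly 4 bytes of string data, against a full scalar per element in an ordinary array, and finding the Nth offset is just a substr and an unpack.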

Personally I'd avoid all of these approaches unless I knew that the naive approach had serious problems for my dataset. (Yes, you've indicated why you think that it may for you. This is a reminder for anyone else who might be reading this thread.)

Re^10: Producing a list of offsets efficiently
by BrowserUk (Patriarch) on May 30, 2005 at 08:11 UTC
    Yes, you've indicated why you think that it may for you.

    Not "may". It is.

    The primary intent is to reduce the space required to manipulate the file in memory. The secondary goal is to limit the effects of trading speed for space by doing it as efficiently as Perl allows. There will always be the overhead of tie involved which forms a watermark below which I cannot dip, but within that there is scope for economies.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.