
Actually, I am doing something a little different. I have to work on bit boundaries, not byte boundaries, and I may have to throw away a bit here and there.

Re^3: pack/unpack binary editing
by BrowserUk (Patriarch) on Feb 08, 2005 at 13:57 UTC

    Hmm. Then you have a problem that will require a little more effort. How will you determine which bits to throw away?

    If you can determine the edits whilst moving through the file in a single direction, treating the file as read-only, you can record them as a set of "editing instructions" ("delete bit3/byte 700004", "insert '010' at bit6 of byte 3002" etc.) and then sort those into byte/bit sequence.
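A minimal sketch of the sorting step, written in Python for brevity. It assumes each instruction is recorded as a hypothetical (byte, bit, op, payload) tuple and that bit 0 is the most significant bit of its byte; neither convention comes from the thread. Converting each position to an absolute bit offset turns the byte/bit ordering into a single integer sort key.

```python
def to_abs_bit(byte: int, bit: int) -> int:
    """Absolute bit offset; bit 0 = most significant bit of the byte (assumption)."""
    if not 0 <= bit < 8:
        raise ValueError("bit must be 0-7")
    return byte * 8 + bit

def sort_edits(edits):
    """Sort hypothetical (byte, bit, op, payload) instructions into byte/bit order.

    Returns (abs_bit, op, payload) tuples. sorted() is stable, so edits at the
    same bit keep the order in which they were recorded.
    """
    return sorted(((to_abs_bit(byte, bit), op, payload)
                   for byte, bit, op, payload in edits),
                  key=lambda e: e[0])

# The two example instructions from the post, recorded out of order:
edits = [
    (700004, 3, "delete", ""),
    (3002, 6, "insert", "010"),
]
print(sort_edits(edits))
# [(24022, 'insert', '010'), (5600035, 'delete', '')]
```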

    You can then do the editing in a second linear pass through the file. You would keep a running buffer (0 - 7 bits) of any odd bits, prepend those to the front of each chunk as you read it in, make any modifications to that chunk of bits, and then write int( bits-in-memory / 8 ) bytes back out, retaining the leftover bits. Rinse and repeat till done.
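The second pass can be sketched as follows (Python, favouring readability over speed). The function name `apply_edits_stream` and the (abs_bit, op, payload) tuple format are hypothetical; the sketch assumes single-bit deletes, at most one edit per bit position, MSB-first bit numbering, and zero-padding of the final partial byte.

```python
import io

def apply_edits_stream(src, dst, edits, chunk_size=1 << 20):
    """Copy src to dst, applying single-bit deletes and multi-bit inserts.

    edits: (abs_bit, op, payload) tuples in ORIGINAL-file bit coordinates.
      "delete" drops the bit at abs_bit; "insert" puts payload (e.g. "010")
      in front of abs_bit. Assumes at most one edit per bit position and
      bit 0 = most significant bit of each byte.
    """
    edits = sorted(edits, key=lambda e: e[0])
    carry = ""        # 0-7 already-edited bits still waiting to be written
    chunk_start = 0   # original bit offset of the current chunk
    i = 0             # index of the first edit not yet applied
    while True:
        data = src.read(chunk_size)
        if not data:
            break
        bits = "".join(f"{b:08b}" for b in data)   # readable, not fast
        chunk_end = chunk_start + len(bits)
        j = i
        while j < len(edits) and edits[j][0] < chunk_end:
            j += 1
        # Apply this chunk's edits back to front, so each change leaves the
        # local positions of the edits still to be applied untouched.
        for pos, op, payload in reversed(edits[i:j]):
            k = pos - chunk_start
            if op == "delete":
                bits = bits[:k] + bits[k + 1:]
            else:
                bits = bits[:k] + payload + bits[k:]
        i, chunk_start = j, chunk_end
        bits = carry + bits
        whole = len(bits) // 8 * 8               # bits that fill complete bytes
        if whole:
            dst.write(int(bits[:whole], 2).to_bytes(whole // 8, "big"))
        carry = bits[whole:]                     # keep the 0-7 leftover bits
    if carry:
        # Pad the final partial byte with zero bits (an assumption).
        dst.write(int(carry.ljust(8, "0"), 2).to_bytes(1, "big"))

src, dst = io.BytesIO(b"\xff\x00"), io.BytesIO()
apply_edits_stream(src, dst, [(0, "delete", "")], chunk_size=1)
print(dst.getvalue())   # b'\xfe\x00'
```

Because positions stay in original-file coordinates and each chunk is edited back to front, this particular sketch needs no position adjustment inside the pass itself.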

    The problem with that is that when you re-order the editing instructions, you will need to account for any shifts in byte/bit positions caused by edits made earlier in the sequence. Not a hugely onerous task, but one that would need thorough testing on small files before you start screwing with the big one.
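If every instruction is expressed in original-file coordinates, one way to account for the shifts is a cumulative-offset map from an original bit position to its position in the edited output. A minimal sketch, assuming hypothetical (abs_bit, op, payload) tuples, single-bit deletes, and inserts that go in front of abs_bit:

```python
from bisect import bisect_left, bisect_right
from itertools import accumulate

def make_position_mapper(edits):
    """Map an ORIGINAL bit offset to its offset in the edited output.

    edits: (abs_bit, op, payload) tuples in original coordinates;
    "insert" puts payload before abs_bit, "delete" drops the bit at abs_bit.
    Returns a function p -> new offset, or None if bit p was deleted.
    """
    inserts = sorted((pos, len(payload)) for pos, op, payload in edits if op == "insert")
    deletes = sorted(pos for pos, op, _ in edits if op == "delete")
    ins_pos = [pos for pos, _ in inserts]
    # ins_cum[n] = total bits added by the first n inserts (prefix sums).
    ins_cum = [0] + list(accumulate(n for _, n in inserts))
    deleted = set(deletes)

    def new_offset(p):
        if p in deleted:
            return None
        added = ins_cum[bisect_right(ins_pos, p)]   # inserts at or before p
        removed = bisect_left(deletes, p)           # deletes strictly before p
        return p + added - removed

    return new_offset

m = make_position_mapper([(24022, "insert", "010"), (5600035, "delete", "")])
print(m(24022), m(5600036))   # 24025 5600038
```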

    It really depends on your answer to the question I posed first: how will the sequence of edits be determined? The answer to that will define the best strategy.


    Examine what is said, not who speaks.
    Silence betokens consent.
    Love the truth but pardon error.
      I have a frame of 480 bytes containing a sync word 10 bits long. I need to be able to find the start of my framing, which may not occur on a byte boundary. I also have to take into consideration that the file may skew occasionally, so I'd have to check frame by frame for my sync word. I've done this successfully using pack and unpack, but it takes forever. I would like to slide through the file without having to pack or unpack.
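One way to avoid unpacking the file into a bit string is to pull out each candidate 10-bit window with integer shifts and masks and, once a sync word is found, jump straight to where the next one should be. A sketch in Python (the same shift-and-mask arithmetic carries over to Perl); the function names, the MSB-first bit order, and the resync policy (rescan bit by bit from just after the last hit) are assumptions, and a pure-Python bit-by-bit scan will still be slow on a 9GB file.

```python
FRAME_BITS = 480 * 8   # 480-byte frames, as described in the post

def bits_at(data: bytes, start: int, width: int) -> int:
    """Return the width-bit value starting at absolute bit `start`.

    Bits are numbered most-significant-first within each byte (assumption).
    """
    first, last = start // 8, (start + width - 1) // 8
    chunk = int.from_bytes(data[first:last + 1], "big")
    shift = (last + 1) * 8 - (start + width)   # bits after the field in `chunk`
    return (chunk >> shift) & ((1 << width) - 1)

def frame_starts(data: bytes, sync: int, width: int = 10, frame_bits: int = FRAME_BITS):
    """Yield the bit offsets of sync words, checking frame by frame.

    While locked, jump straight to where the next sync should be; if it is
    not there (skew), resume a bit-by-bit search just after the last hit.
    """
    total = len(data) * 8
    pos, prev = 0, None
    while pos + width <= total:
        if bits_at(data, pos, width) == sync:
            yield pos
            prev, pos = pos, pos + frame_bits
        elif prev is not None and pos == prev + frame_bits:
            pos, prev = prev + 1, None   # lost lock: rescan from just after the last hit
        else:
            pos += 1

# Two frames; the second sync arrives 3 bits late (simulated skew).
sync = 0b1011001110
bits = "0" * 5 + "1011001110" + "0" * 3833 + "1011001110" + "0" * 6
data = int(bits, 2).to_bytes(len(bits) // 8, "big")
print(list(frame_starts(data, sync)))   # [5, 3848]
```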
        I am in awe of BrowserUK's (and your) willingness to tackle this as a challenge.

        May I humbly offer a simpler suggestion? Do you have control of the file-generator's output? Can you -- by accepting a somewhat larger datafile -- align your data on 512-byte boundaries with padding? Then, the frame is always recognizable and there's room for an individual frame to grow or shrink a bit.
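A minimal sketch of the padding idea, assuming no frame ever exceeds 512 bytes and zero bytes are used as filler (both assumptions; the reply does not specify a filler). Each 480-byte frame costs 32 extra bytes, about 6.7% more file, and frame k always starts at byte k × 512, so no bit-level sync search is needed.

```python
import io

RECORD = 512    # padded record size suggested in the reply
FILL = b"\x00"  # filler byte: an assumption, not from the thread

def write_padded(out, frame: bytes) -> None:
    """Write one frame, padded with FILL up to RECORD bytes."""
    if len(frame) > RECORD:
        raise ValueError(f"frame of {len(frame)} bytes exceeds {RECORD}")
    out.write(frame + FILL * (RECORD - len(frame)))

def read_record(f, k: int) -> bytes:
    """Return record k; its frame begins at byte k * RECORD."""
    f.seek(k * RECORD)
    return f.read(RECORD)

buf = io.BytesIO()
write_padded(buf, b"\xaa" * 480)   # nominal 480-byte frame
write_padded(buf, b"\xbb" * 470)   # a frame that shrank a little
print(len(buf.getvalue()))         # 1024
print(read_record(buf, 1)[:2])     # b'\xbb\xbb'
```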

        How long is your current attempt taking?

        The following code processes a 100MB file (including finding and recording 25 million hits of a 10-bit pattern) in ~20 seconds, and a 1GB file (250 million hits) in ~3 minutes 20 seconds.

        I make that roughly half an hour to process your 9GB, and probably much less, as your hits will be less frequent and you can advance the buffer pointer by 480 bytes after each hit.
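The half-hour figure is a straight linear extrapolation from the measured timings (3 minutes 20 seconds = 200 s per GB, or 20 s per 100MB):

$$
t_{9\,\mathrm{GB}} \approx 9 \times 200\ \mathrm{s} = 1800\ \mathrm{s} = 30\ \mathrm{min},
\qquad
90 \times 20\ \mathrm{s} = 1800\ \mathrm{s}.
$$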

        It uses a basic sliding buffer to process the file in 1MB chunks, with an overlap of enough bytes to ensure continuity. (You'll need to verify the math of the byte/bit offset calculations.)
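An illustrative sketch of a sliding buffer with overlap, in Python; it is not the code from the post. It assumes the sync word is given as a bit string and bits are MSB-first. The overlap is the number of bytes a pattern can extend past the byte it starts in, which is 2 for a 10-bit word, and hits starting inside the overlap are deferred to the next buffer so each is reported exactly once.

```python
import io

def scan_file(f, sync_bits: str, chunk_size: int = 1 << 20):
    """Yield absolute bit offsets at which sync_bits (e.g. "1011001110") occurs.

    Reads chunk_size bytes at a time and carries the last few bytes of each
    buffer into the next one, so a sync word straddling a chunk boundary is
    still found, exactly once. Bits are MSB-first within a byte (assumption).
    """
    overlap = (len(sync_bits) + 6) // 8   # bytes a pattern can spill past its start byte
    tail, tail_start = b"", 0             # carried bytes and their absolute byte offset
    while True:
        chunk = f.read(chunk_size)
        buf = tail + chunk
        bits = "".join(f"{b:08b}" for b in buf)   # readable, not fast
        base = tail_start * 8                     # absolute bit offset of bits[0]
        # Hits starting in the last `overlap` bytes are left for the next
        # buffer, which begins with those same bytes; the final buffer keeps all.
        limit = (len(buf) - overlap) * 8 if chunk else len(bits)
        pos = bits.find(sync_bits)
        while pos != -1 and pos < limit:
            yield base + pos
            pos = bits.find(sync_bits, pos + 1)
        if not chunk:
            return
        keep = min(overlap, len(buf))
        tail, tail_start = buf[len(buf) - keep:], tail_start + len(buf) - keep

sync = "1011001110"
data = int("0" * 20 + sync + "0" * 18, 2).to_bytes(6, "big")
print(list(scan_file(io.BytesIO(data), sync, chunk_size=3)))   # [20]
```

Each yielded value is an absolute bit offset, so the byte and bit-within-byte position is `divmod(offset, 8)`; for the example above that is byte 2, bit 4.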

        Code + some timings

