in reply to Re: Re: How do I search this binary file?
in thread How do I search this binary file?

I think I would read the file one block at a time, using the preferred block size returned by stat, as Zaxo suggested in a post below (Zaxo++), as long as that block size is larger than your max chunk size. (I can't imagine it wouldn't be.)

I'd use a regular expression to search each block for a whole chunk. If one is found, great; process it. If not, I'd keep the last 150 or so bytes of the block (one byte less than your max "chunk" size would do it) and use a four-argument read to append the next block to that leftover. Then search again, and so on.
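
Roughly, the loop I have in mind looks like the sketch below. It is untuned and makes some assumptions: find_chunks is a made-up name, $re is whatever pattern matches one whole chunk, $max_len is the longest chunk in bytes, and the pattern never matches an empty string. The block size falls back to 4096 if stat doesn't report a preferred size.

```perl
use strict;
use warnings;

# Hypothetical helper: return [offset, bytes] for every match of $re in $fh,
# assuming no match is longer than $max_len bytes and none is empty.
sub find_chunks {
    my ($fh, $re, $max_len, $block_size) = @_;
    binmode $fh;
    # Field 11 of stat is the preferred I/O block size (empty on some systems).
    $block_size ||= (stat $fh)[11] || 4096;
    $block_size = $max_len if $block_size < $max_len;

    my ($buf, $base, @found) = ('', 0);   # $base = file offset of $buf's first byte
    while (1) {
        # Four-argument read: append the next block after the leftover.
        my $got = read($fh, $buf, $block_size, length $buf);
        die "read failed: $!" unless defined $got;

        # A match starting in the last $max_len - 1 bytes might be cut off,
        # so it is deferred to the next pass unless we are at end of file.
        my $limit = $got ? length($buf) - ($max_len - 1) : length $buf;
        $limit = 0 if $limit < 0;

        my $keep_from = $limit;
        pos($buf) = 0;
        while ($buf =~ /$re/g) {
            last if $-[0] >= $limit;
            push @found, [ $base + $-[0], substr($buf, $-[0], $+[0] - $-[0]) ];
            $keep_from = $+[0] if $+[0] > $keep_from;   # don't rescan a reported match
        }
        last unless $got;                   # end of file: everything was searched

        substr($buf, 0, $keep_from) = '';   # keep only the leftover
        $base += $keep_from;
    }
    return @found;
}
```

You'd call it as something like find_chunks($fh, qr/.../s, 151), with your own pattern and max size filled in.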

I don't know how well this approach would do next to some of the other suggestions. It has the advantage of looking for the whole chunk at once and using the regex engine to do it, which will presumably be pretty quick. It has the disadvantage that you'll search some fraction of the file twice. If you search the whole file, the number of bytes you'd search twice is approximately the max chunk size times the number of blocks in the file. Given that, you might be able to improve it by increasing the size of the block you read. If you keep it a multiple of the preferred size, it shouldn't hurt anything.
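
To put rough numbers on that, here is a back-of-the-envelope sketch. The figures are illustrative assumptions, not measurements: a 10 MB file, a 4096-byte preferred block size, and a 151-byte max chunk (so 150 bytes carried over at each block boundary). rescanned_bytes is a made-up name.

```perl
use strict;
use warnings;

# Estimate how many bytes get searched twice: one leftover of
# ($max_len - 1) bytes at every boundary between consecutive blocks.
sub rescanned_bytes {
    my ($file_size, $block_size, $max_len) = @_;
    my $blocks     = int(($file_size + $block_size - 1) / $block_size);
    my $boundaries = $blocks > 0 ? $blocks - 1 : 0;
    return $boundaries * ($max_len - 1);
}

printf "%d\n", rescanned_bytes(10 * 1024 * 1024, 4096,      151);  # 383850
printf "%d\n", rescanned_bytes(10 * 1024 * 1024, 16 * 4096, 151);  # 23850
```

With those assumed sizes, reading 16 preferred blocks at a time cuts the doubly searched bytes from about 3.7% of the file to about 0.2%.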

-sauoq
"My two cents aren't worth a dime.";

Re: Re: Re: Re: How do I search this binary file?
by John M. Dlugosz (Monsignor) on Aug 21, 2002 at 14:48 UTC
    I can see the benefit of that approach, in that the logic is simple and easy to write correctly.

    It can be further optimized by overlapping only when there is a possible partial match. That is, if the begin delimiter is present towards the end of the block, copy from there through the end; otherwise, don't bother. A single regex can capture the begin marker and optionally match the remainder.
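
One way to sketch that carry-over decision is below. Note that it uses index rather than the single regex with captures described above, and it also carries a begin marker that is itself cut off at the end of the block. The names are hypothetical: carry_from, $begin (the literal begin marker), $max_len (longest chunk in bytes), and $done (offset just past the last chunk already processed).

```perl
use strict;
use warnings;

# Hypothetical helper: return the offset in $buf from which bytes must be
# carried into the next read, or length($buf) when nothing needs carrying.
sub carry_from {
    my ($buf, $begin, $max_len, $done) = @_;
    my $len   = length $buf;
    my $start = $len - ($max_len - 1);      # only the tail can hold a cut-off chunk
    $start = 0 if $start < 0;
    $start = $done if defined $done && $done > $start;

    # A full begin marker in the tail may start a chunk that is cut off.
    my $at = index($buf, $begin, $start);
    return $at if $at >= 0;

    # Otherwise the block may end with a proper prefix of the marker.
    for my $n (reverse 1 .. length($begin) - 1) {
        next if $n > $len;
        return $len - $n if substr($buf, -$n) eq substr($begin, 0, $n);
    }
    return $len;                            # nothing to carry over
}
```

When the result equals length($buf), the next block can be read into an empty buffer, so nothing is searched twice.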