in reply to Re: Possible to have regexes act on file directly (not in memory)

If that turns out to not be the case, then I suppose your best solution will be the one others have mentioned; determine what the largest possible "match" could be, set your chunk size to that size, and read a starter chunk. Then read a second chunk, concatenate them, do a pattern match, discard the first chunk, read a third, concatenate, match, repeat.

I agree with the general approach, but not with the details. There is no reason to choose a chunk size equal to the largest possible match; the chunk size can be much larger.

Suppose the max length of a possible match is 10 characters (or bytes, or whatever). You certainly don't want to read your file in chunks of 10 characters: the overhead of each read and each regex call would make that fairly inefficient.

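Depending on your system, it might be more efficient to read chunks of, say, 1 MB. The only thing you need to do is keep the last 10 characters of the previous chunk and "prepend" them to the next chunk before proceeding. Or, in other words, append the next MB of data to the last 10 characters of the previous chunk, and run your regex again on that. (Strictly speaking, 9 characters are enough, since a 10-character match lying entirely within the previous chunk has already been found. Either way, a shorter match that falls wholly inside the carried-over characters can be seen twice, so you may need to guard against duplicates.)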
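To make the carry-over idea concrete, here is a rough sketch. It is in Python rather than Perl, and the names `scan_stream`, `max_len` and `chunk_size` are made up for illustration. It assumes a text-mode stream, a pattern that never matches more than `max_len` characters, and a pattern that never matches the empty string. To avoid both duplicates and truncated matches, it defers any match that starts within the last `max_len - 1` characters of the buffer until the next chunk has been appended.

```python
import io
import re


def scan_stream(stream, pattern, max_len, chunk_size=1 << 20):
    """Yield (absolute_offset, matched_text) for each match in a text stream.

    Assumption: no match of `pattern` is longer than `max_len` characters,
    and the pattern does not match the empty string.
    """
    regex = re.compile(pattern)
    tail = ""    # characters carried over from the previous buffer
    offset = 0   # absolute position in the stream of tail[0]
    while True:
        chunk = stream.read(chunk_size)
        at_eof = not chunk
        buf = tail + chunk
        # Matches starting at or after `limit` might be cut off by the end
        # of the buffer, so they are deferred to the next round.
        limit = len(buf) if at_eof else max(0, len(buf) - (max_len - 1))
        keep_from = limit
        for m in regex.finditer(buf):
            if m.start() >= limit:
                break
            yield offset + m.start(), m.group()
            # Never carry over text that a reported match already consumed.
            keep_from = max(keep_from, m.end())
        if at_eof:
            return
        tail = buf[keep_from:]
        offset += keep_from


if __name__ == "__main__":
    sample = io.StringIO("zzabc123zzzzabc456z")
    # A tiny chunk size is used here only to force matches across boundaries.
    for pos, text in scan_stream(sample, r"[a-c]{3}\d{3}", max_len=6, chunk_size=4):
        print(pos, text)
```

With a real file you would leave `chunk_size` at something like the 1 MB default and pass an open file handle instead of the `io.StringIO` object. For binary data the same logic should work with a file opened in binary mode, a bytes pattern, and `tail = b""`.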
