in reply to Munging Streamed Data

You can add a regex for what "unfinished" means. In this case, it would be simply /"/. Then you match

/($skip)|($unfinished)|($delim)/gc

along the string. Matching $unfinished means that you need more from the stream.
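
A minimal sketch of that loop, assuming CSV-like data with a comma delimiter. Only the names $skip, $unfinished and $delim come from the discussion; scan_fields and its return values are made up for illustration. It uses the simplest "quoted" pattern, made non-capturing and with a possessive [^"]++ (my change, to avoid heavy backtracking on an unclosed quote), and it does not yet deal with a quoted field that happens to end exactly at the end of the buffer.

```perl
use strict;
use warnings;

# Hypothetical patterns for CSV-like input; only the names come from the post.
my $skip       = qr/"(?:""|[^"]++)*"/;  # a complete quoted field
my $unfinished = qr/"/;                 # a quote that $skip could not close
my $delim      = qr/,/;

# Walk the buffer once. Returns a reference to the completed fields, the
# unconsumed tail of the buffer, and a flag that is true when an unclosed
# quote means more data must be read from the stream.
sub scan_fields {
    my ($buf) = @_;
    my @fields;
    my $start = 0;
    pos($buf) = 0;
    while ( $buf =~ /($skip)|($unfinished)|($delim)/gc ) {
        if ( defined $3 ) {
            # Delimiter outside quotes: everything since $start is one field.
            push @fields, substr( $buf, $start, $-[3] - $start );
            $start = $+[3];
        }
        elsif ( defined $2 ) {
            # Unclosed quote: stop here and ask for more of the stream.
            return ( \@fields, substr( $buf, $start ), 1 );
        }
        # A $skip match needs no action; /g has already moved pos() past it.
    }
    return ( \@fields, substr( $buf, $start ), 0 );
}

my ( $fields, $rest, $need_more ) = scan_fields('a,"b,c",d,"e');
print join( '|', @$fields ), "\n";    # a|"b,c"|d
print "rest=$rest need_more=$need_more\n";
```

The caller would keep $rest, append the next chunk from the stream, and scan again.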

Note that you also have to be careful with your $skip regex. A typical CSV definition for "quoted" is /"(""|[^"]+)*"/ (update: or even /"(""|[^"]+)*"(?!")/). For your module, though, you'd need to use <update> not /"(""|[^"]+)*"(?!")/ but /"(""|[^"]+)*"(?=[^"])/ (until you reach the end of the stream), because (?!") also succeeds at the end of the buffer, where the next character to arrive from the stream might still be a quote. Otherwise, you'd have to reject matches that hit the end of your current buffer while there is still data in the stream.

And in some cases, even a match that ends close to the end of the stream needs to be disallowed. So you probably need a configurable "max bytes before end of stream" value that can default to something reasonably large (like 4kB), after which you probably won't have to worry about it again.</update>
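
To make the difference between the two lookaheads concrete, here is a small sketch run against a buffer that ends right after a closing quote. The helper match_is_safe and its parameters ($at_eof, $margin) are hypothetical names, and treating the "max bytes" value as a margin measured from the end of the buffered data is my reading of the idea, not a confirmed design.

```perl
use strict;
use warnings;

# Non-capturing, possessive variants of the two patterns discussed above.
my $quoted_not_quote = qr/"(?:""|[^"]++)*"(?!")/;     # also succeeds at end of buffer
my $quoted_then_char = qr/"(?:""|[^"]++)*"(?=[^"])/;  # needs a real non-quote character next

# The buffer ends right after a quote. The stream might continue with another
# quote (an escaped ""), so the field may not really be finished yet.
my $buf = '"abc"';

my $loose  = $buf =~ /^$quoted_not_quote/ ? 1 : 0;    # 1: matches too early
my $strict = $buf =~ /^$quoted_then_char/ ? 1 : 0;    # 0: waits for more data

# Alternative: accept a match only if it ends at least $margin bytes before
# the end of the buffer, or if the stream is exhausted.
sub match_is_safe {
    my ( $match_end, $buf_len, $at_eof, $margin ) = @_;
    return 1 if $at_eof;
    return $match_end + $margin <= $buf_len ? 1 : 0;
}

print "loose=$loose strict=$strict\n";
```

With the (?=[^"]) form, the last quoted field can never match once the stream is exhausted, which is why the parenthetical "until you reach the end of the stream" matters: at that point you'd switch to a pattern that accepts the end of the string.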

Update: Note that this approach means that you don't need to remove things that match $skip so you don't need to come up with a replacement "token" that doesn't appear in the data.

- tye