in reply to Can I (with XS) invoke the regex engine without making copies of the buffer?

The regex engine works on the assumption that after a successful match it will save a copy of the string and record the character offsets where $1 etc. start and end. If you then access $1 et al., it behaves a bit like a tied variable: its value is set on demand to the corresponding substring of the saved string.
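As a small illustration of the saved copy and the recorded offsets (a sketch only; the variable names and the sample string are made up for the example), you can overwrite the matched string and still read the captures afterwards. The offsets are available from pure Perl via @- and @+:

```perl
use strict;
use warnings;

my $str = "key=value";
my ($first, $second, @offsets);

if ($str =~ /(\w+)=(\w+)/) {
    # @- and @+ hold the start and end offsets of each capture group.
    @offsets = ($-[1], $+[1], $-[2], $+[2]);

    # Overwrite the original string before touching $1 or $2.
    $str = "overwritten";

    # The captures are still extracted from the string as it was
    # when the match ran, not from the current contents of $str.
    ($first, $second) = ($1, $2);
}

printf "first=%s second=%s offsets=%s\n", $first, $second, join(",", @offsets);
# prints: first=key second=value offsets=0,3,4,9
```

This only shows the behaviour visible from Perl code; it says nothing about whether the engine made a physical copy or a COW one on any particular perl version.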

If there is no saved string, then the regex engine isn't going to allow captures, because extracting a substring of the original string that the regex was run against could return random garbage, or even SEGV, if the original string had been modified or freed in the meantime.

The regex engine in newer perls tries to do a copy-on-write (COW) of the original string, which means that the copy and the original share the same string buffer unless/until the original string is modified or freed. At that point the copy would take full ownership of the buffer.

But doing COW in a guaranteed secure manner would be hard.

In short, Perl's regex engine isn't designed to handle this scenario, and it would be hard to be confident that the string is never leaked.

Dave.
