in reply to Re^5: Regex Parsing Style
in thread Regex Parsing Style

I removed chr very soon after posting. Also, I was wondering why you had used \Z instead of \z. Frankly, though I know they're considered the modern anchors to use, I still find $ more immediately recognizable and less confusing than \Z and \z.
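
For anyone following along, here is a minimal sketch of how the three anchors differ on a string that ends in a newline. The variable names are only illustrative, and none of the patterns use /m.

```perl
use strict;
use warnings;

my $with_newline = "abc\n";

# Without /m, $ and \Z match at the very end of the string
# or just before a single trailing newline.
my $dollar  = $with_newline =~ /abc$/  ? 1 : 0;
my $big_z   = $with_newline =~ /abc\Z/ ? 1 : 0;

# \z matches only at the very end of the string, so the
# trailing newline makes this one fail.
my $small_z = $with_newline =~ /abc\z/ ? 1 : 0;

print "\$: $dollar  \\Z: $big_z  \\z: $small_z\n";
```

So $ and \Z behave alike here, and \z is the only one of the three that refuses to ignore a final newline.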

The set of graphic characters that must be escaped is exactly { '"', '\', '^' }. The caret is the oddball. I think it's a carryover from a different context in which control codes can be specified as two characters; e.g., ^Z. Though such control code sequences never occur in the text I'm lexing (they're represented instead as UCNs; e.g., \u001a), all literal carets in the text are nonetheless escaped (needlessly).
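
A rough sketch of what that rule looks like in a \G-style lexer loop, assuming the escape character is a backslash. unescape_strict is a hypothetical name, not the code posted earlier in the thread; it leaves UCNs untouched and dies on any other escape.

```perl
use strict;
use warnings;

# Hypothetical helper: removes the backslash from \" \\ and \^,
# keeps \uXXXX sequences as they are, and dies on anything else.
sub unescape_strict {
    my ($text) = @_;
    my $out = '';
    pos($text) = 0;
    while (pos($text) < length $text) {
        if    ($text =~ /\G([^\\]+)/gc)            { $out .= $1 }  # run of plain characters
        elsif ($text =~ /\G(\\u[0-9A-Fa-f]{4})/gc) { $out .= $1 }  # UCN, kept verbatim
        elsif ($text =~ /\G\\(["\\^])/gc)          { $out .= $1 }  # one of the three escapes
        else {
            # /c keeps pos() where the failed match started.
            die sprintf "Unexpected escape at offset %d\n", pos($text);
        }
    }
    return $out;
}

print unescape_strict('say \"hi\" \^ \\\\ \u001a'), "\n";
```

Accepting a wider set would only mean loosening the character class in the third branch; the final branch is what reports input that breaks the rule.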

Thanks again.

Re^7: Regex Parsing Style
by ikegami (Patriarch) on Nov 26, 2010 at 17:15 UTC

    The set of graphic characters that must be escaped is exactly { '"', '\', '^' }.

    It's funny that you emphasised "must" because that's exactly the word that makes that sentence irrelevant. At issue is what set can be escaped.

    Either way, a fix is needed. The set must be expanded, or an error message needs to be added.

      In the input, the set of literal characters that are always escaped is { '"', '\', '^' }; all other literal characters in the input are never escaped.

      How's that?

      I modified the error message for the event that can never happen.