
/\Z/ should be /\z/ (my fault), and you shouldn't have kept the "chr".
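
For anyone following along, the difference between \Z and \z only shows up when the string ends in a newline. A minimal sketch (the variable names are made up for illustration, they are not from the code under discussion):

```perl
use strict;
use warnings;

my $text = "abc\n";

# \Z matches at the end of the string or just before a final newline.
my $Z = $text =~ /abc\Z/ ? 1 : 0;

# \z matches only at the very end of the string.
my $z = $text =~ /abc\z/ ? 1 : 0;

# Without /m, $ behaves like \Z here.
my $dollar = $text =~ /abc$/ ? 1 : 0;

print "\\Z: $Z, \\z: $z, \$: $dollar\n";   # prints: \Z: 1, \z: 0, $: 1
```

So in a lexer that must consume the entire input, \z is the anchor that actually means "nothing left".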

/\\(["^\\])/ looks buggy. Are you sure those are the only symbols that can be escaped? If it's not buggy, you'll need to adjust the error message since no case will handle '\#', for example.
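
To make the point concrete, here is a rough sketch of what I mean by handling the "other" case. It is not your lexer: unescape_literal is a hypothetical helper, and it assumes that only \", \^ and \\ are valid escapes, so any other backslash sequence gets its own error instead of falling through silently.

```perl
use strict;
use warnings;

# Hypothetical helper: decode the escapes in the body of a quoted literal.
# Assumption: only \" \^ and \\ are valid; anything else is reported.
sub unescape_literal {
    my ($body) = @_;
    my $out = '';
    for ($body) {
        while (1) {
            if    (/\G([^\\]+)/gc)   { $out .= $1 }   # run of ordinary characters
            elsif (/\G\\(["^\\])/gc) { $out .= $1 }   # one of the known escapes
            elsif (/\G\\(.)/gcs)     { die "Unexpected escape '\\$1'\n" }
            elsif (/\G\\\z/gc)       { die "Trailing backslash\n" }
            elsif (/\G\z/gc)         { last }         # consumed everything
        }
    }
    return $out;
}

print unescape_literal('say \"hi\"'), "\n";

# An escape outside the assumed set is reported rather than ignored.
eval { unescape_literal('\#') };
print "Error: $@" if $@;
```

If the input really never contains other escapes, the third branch simply never fires, but the error message it gives will at least be accurate if it ever does.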

Re^6: Regex Parsing Style
by Jim (Curate) on Nov 26, 2010 at 15:57 UTC

    I removed chr very soon after posting. Also, I was wondering why you had used \Z instead of \z. Frankly, though I know they're considered the modern anchors to use, I still find $ more immediately recognizable and less confusing than \Z and \z.

    The set of graphic characters that must be escaped is exactly { '"', '\', '^' }. The caret is the oddball. I think it's a carryover from a different context in which control codes can be specified as two characters; e.g., ^Z. Though such control code sequences never occur in the text I'm lexing (they're represented instead as UCNs; e.g., \u001a), all literal carets in the text are nonetheless escaped (needlessly).

    Thanks again.

      The set of graphic characters that must be escaped is exactly { '"', '\', '^' }.

      It's funny that you emphasised "must" because that's exactly the word that makes that sentence irrelevant. At issue is what set can be escaped.

      Either way, a fix is needed. The set must be expanded, or an error message needs to be added.

        In the input, the set of literal characters that are always escaped is { '"', '\', '^' }; all other literal characters in the input are never escaped.

        How's that?

        I modified the error message of the event that can never happen.