in reply to How are regex character classes implemented?

For Unicode (where a 64K bitmap is insufficient, since there are more than 64K characters at this point) you can use a two- or three-level tree and compress away entire unused branches. There's some discussion of this in the Unicode spec, which is available online from The Unicode Consortium.
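
To make the tree idea concrete, here is a minimal sketch of a two-level table in Python. It is not taken from the Unicode spec or from any particular engine; the names (`CharClass`, `allocated_leaves`) and the choice of 256-code-point leaf blocks are assumptions made purely for illustration. The top level has one slot per block, and a slot stays `None` when nothing in that block belongs to the class, which is the "compress away unused branches" part.

```python
# Illustrative two-level lookup table for a Unicode character class.
# Block size and all names are assumptions for this sketch.
BLOCK_BITS = 8                      # low 8 bits index into a leaf block
BLOCK_SIZE = 1 << BLOCK_BITS        # 256 code points per leaf
MAX_CODE_POINT = 0x10FFFF           # highest Unicode code point
NUM_BLOCKS = (MAX_CODE_POINT + 1) >> BLOCK_BITS   # 4352 top-level slots


class CharClass:
    def __init__(self, ranges):
        # Top level: one slot per 256-code-point block. None means the
        # whole block is unused, so no leaf bitmap is allocated for it.
        self.blocks = [None] * NUM_BLOCKS
        for lo, hi in ranges:           # inclusive code point ranges
            for cp in range(lo, hi + 1):
                self._add(cp)

    def _add(self, cp):
        hi, lo = cp >> BLOCK_BITS, cp & (BLOCK_SIZE - 1)
        leaf = self.blocks[hi]
        if leaf is None:
            # Allocate a 32-byte (256-bit) bitmap on first use.
            leaf = self.blocks[hi] = bytearray(BLOCK_SIZE // 8)
        leaf[lo >> 3] |= 1 << (lo & 7)

    def __contains__(self, ch):
        cp = ord(ch)
        leaf = self.blocks[cp >> BLOCK_BITS]
        if leaf is None:                # unused branch: quick reject
            return False
        lo = cp & (BLOCK_SIZE - 1)
        return bool(leaf[lo >> 3] & (1 << (lo & 7)))

    def allocated_leaves(self):
        return sum(1 for b in self.blocks if b is not None)


# Example class: ASCII digits, U+0370..U+03FF, and U+1D400..U+1D7FF.
cc = CharClass([(0x30, 0x39), (0x0370, 0x03FF), (0x1D400, 0x1D7FF)])
print("5" in cc, "\u03b1" in cc, "a" in cc)
print(cc.allocated_leaves(), "of", NUM_BLOCKS, "leaves allocated")
```

A real engine would likely also share identical leaves (for example one all-ones block) and might add a third level so the top-level array itself shrinks, but the lookup stays the same shape: shift, index, then test a bit.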

Or go dig up PCRE, or IBM's ICU library.

Re: Re: How are regex character classes implemented?
by John M. Dlugosz (Monsignor) on Jul 18, 2002 at 20:05 UTC
    The Unicode Technical Report #18 discusses things a Unicode regex engine needs to do, but doesn't give any ideas on how to implement one. Do you know where it discusses the tree approach?

      It's in the main Unicode book itself, though I'd have to go dig through the docs to find it. It's not in the discussion of regular expressions. IIRC it's in the discussion of character properties, but it's been a while.