The program I'm discussing was "prototyped" in Perl to transform a native file into XML. Now that feature is being integrated into the main program, which is written in C++. I don't need general parsing/matching features, just a way to tell whether a string is a legal XML identifier.
Too bad specifications like that list all the legal characters individually, rather than referring to Unicode character database properties!
Doing a good job of that is low priority, but interesting to me.
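For what it's worth, here is a rough sketch of the kind of check I have in mind on the C++ side. It assumes the string has already been decoded into code points (a `std::u32string`), and it uses the simplified character ranges from the XML 1.0 Fifth Edition `Name` production rather than the long per-character tables of the earlier editions. The names `is_xml_name`, `in_any`, `kNameStart` and `kNameExtra` are made up for illustration; nothing here comes from the actual program.

```cpp
#include <cstddef>
#include <string>

namespace {

struct Range { char32_t lo, hi; };

// NameStartChar ranges, XML 1.0 (Fifth Edition), production [4].
constexpr Range kNameStart[] = {
    {U':', U':'}, {U'A', U'Z'}, {U'_', U'_'}, {U'a', U'z'},
    {0xC0, 0xD6}, {0xD8, 0xF6}, {0xF8, 0x2FF}, {0x370, 0x37D},
    {0x37F, 0x1FFF}, {0x200C, 0x200D}, {0x2070, 0x218F},
    {0x2C00, 0x2FEF}, {0x3001, 0xD7FF}, {0xF900, 0xFDCF},
    {0xFDF0, 0xFFFD}, {0x10000, 0xEFFFF},
};

// Additional characters allowed after the first one, production [4a].
constexpr Range kNameExtra[] = {
    {U'-', U'-'}, {U'.', U'.'}, {U'0', U'9'},
    {0xB7, 0xB7}, {0x300, 0x36F}, {0x203F, 0x2040},
};

// True if c falls inside any of the given inclusive ranges.
template <std::size_t N>
bool in_any(char32_t c, const Range (&ranges)[N]) {
    for (const Range& r : ranges)
        if (c >= r.lo && c <= r.hi) return true;
    return false;
}

}  // namespace

// Hypothetical helper: is s a legal XML 1.0 Name?
// Assumes s holds Unicode code points, not UTF-8 or UTF-16 units.
bool is_xml_name(const std::u32string& s) {
    if (s.empty() || !in_any(s[0], kNameStart)) return false;
    for (std::size_t i = 1; i < s.size(); ++i)
        if (!in_any(s[i], kNameStart) && !in_any(s[i], kNameExtra))
            return false;
    return true;
}
```

Note that this accepts `:` anywhere, as the `Name` production does; a namespace-aware check would have to treat the colon separately.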
I think last time I looked at PCRE (if it's the same library I saw before), it didn't handle Unicode. The one you point to mentions screwed-up experimental UTF-8 features, so maybe it's evolved.
Thanks, as always.
—John