Well, the end goal is converting to XML. Regular expressions for the conversion aren't going to work very well, as the tags aren't properly nested as they are in XML/HTML/SGML. For example <a><b></a></b> is considered valid.
I do think the speed problem is in the tokenizer. I am doing to the scan one character at a time (from a buffer in memory, at least). I'm not sure how I could do that with regular expressions, but it's a good idea to look into.
In reply to Re^2: Parsing SGML-ish Data Files
by coolmichael
in thread Parsing SGML-ish Data Files
by coolmichael
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |