With six dashes it doesn’t occur within comments: six dashes mean one complete (empty) comment (the first four dashes) and one open comment that stretches past the angle bracket (the next two dashes). Eight dashes in a row would be valid and do what you expect, because they indicate two empty comments, both closed. So sequences of four dashes do not cause confusion. Neither do sequences of five dashes, provided that the five-dash sequence is followed by a non-dash character.
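The dash counting above can be checked with a small simulation. This is only a sketch in Python of a simplified model of SGML comment declarations, assuming that every `--` toggles comment state and that only whitespace may appear between comments; the function name `sgml_comment_declaration_end` is made up for illustration and is not taken from any parser.

```python
def sgml_comment_declaration_end(text):
    """Return the index just past the closing '>' of an SGML-style comment
    declaration at the start of text, or None if it never closes.

    Simplified model (an assumption, not a full SGML parser): after '<!',
    each '--' toggles between inside and outside a comment, and only
    whitespace may appear outside comments before the closing '>'.
    """
    if not text.startswith("<!"):
        return None
    i = 2
    in_comment = False
    while i < len(text):
        if text.startswith("--", i):
            # A dash pair opens a comment or closes the current one.
            in_comment = not in_comment
            i += 2
        elif in_comment:
            # Anything else inside a comment is content, including '>'
            # and a lone dash.
            i += 1
        elif text[i] == ">":
            return i + 1
        elif text[i].isspace():
            i += 1
        else:
            # Non-whitespace between comments is not allowed in this model.
            return None
    return None


for sample in ("<!---->", "<!------>", "<!-------->", "<!-- a ----- b -->"):
    print(repr(sample), sgml_comment_declaration_end(sample))
```

Under this model the six-dash case never closes, because the `>` lands inside the second comment, while the four-dash, eight-dash, and five-dash-plus-non-dash cases all close.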
Your regex will reject valid comments.
OK, it turns out that I was applying SGML rules, and that they have been simplified for XML. I guess I should have another read over the spec myself, sigh. That regex should work then.
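For comparison, the XML 1.0 rule is simpler: a comment is `<!--`, then any characters with no `--` anywhere in the body, then `-->`. The pattern below is a sketch in Python that transcribes that production directly. It is not the regex discussed earlier in this thread, and the names `XML_COMMENT` and `is_xml_comment` are made up for illustration.

```python
import re

# Sketch of the XML 1.0 comment production:
#   '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'
# The body is either a non-dash character or a single dash followed by
# a non-dash, so '--' can never appear inside it.
XML_COMMENT = re.compile(r"<!--(?:[^-]|-[^-])*-->")


def is_xml_comment(text):
    """Return True if text is, in its entirety, one XML-style comment."""
    return XML_COMMENT.fullmatch(text) is not None


for sample in ("<!---->", "<!-- a - b -->", "<!-- a -- b -->", "<!------>"):
    print(repr(sample), is_xml_comment(sample))
```

Note that this pattern also rejects a comment ending in `--->`, since the production does not allow the body to end with a dash.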
Makeshifts last the longest.
In reply to Re^3: Tokenizing XML by Aristotle, in thread Tokenizing XML by Skeeve