in reply to Re^2: Tokenizing XML
in thread Tokenizing XML

With six dashes it doesn’t occur within comments: six dashes mean one complete (empty) comment (the first 4 dashes) and one open comment that stretches past the angle bracket (the next 2 dashes). 8 dashes in a row would be valid and do what you expect (because they indicate 2 empty comments, both closed). So sequences of 4 dashes do not cause confusion. Neither do sequences of 5 dashes, provided that a 5-dash sequence is followed by a non-dash character.
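
To make the dash counting concrete, here is a rough sketch of the SGML-style reading described above, where every `--` pair toggles between "inside a comment" and "outside a comment". The function name `sgml_dash_run` is made up for illustration. It only models a run of dashes that starts outside a comment, and it ignores a leftover odd dash, so the 5-dash case is not covered.

```python
def sgml_dash_run(dash_count):
    """Interpret a run of dashes under the SGML-style rule sketched above.

    Each "--" pair toggles comment state, starting from "outside a comment".
    Returns (number_of_complete_empty_comments, comment_left_open).
    A leftover single dash (odd counts) is ignored in this sketch.
    """
    pairs = dash_count // 2          # whole "--" delimiters in the run
    complete = pairs // 2            # every two delimiters open and close one comment
    left_open = pairs % 2 == 1       # an unpaired delimiter opens a comment
    return complete, left_open


print(sgml_dash_run(4))  # one empty comment, nothing left open
print(sgml_dash_run(6))  # one empty comment, plus one comment left open
print(sgml_dash_run(8))  # two empty comments, both closed
```

Under that reading, six dashes leave a comment open that runs past the closing angle bracket, while four or eight dashes close cleanly.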

Your regex will reject valid comments.

Ok, turns out that I’m applying SGML rules and that they have been simplified for XML. I guess I should have another read over the spec myself, sigh. That regex should work then.
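
For reference, the simplified XML rule is that a comment starts with `<!--`, ends with `-->`, and may not contain `--` in between (so it cannot end in `--->` either). The sketch below is my own transcription of that rule into a Python pattern, not the regex discussed in this thread. The names `XML_COMMENT` and `is_xml_comment` are made up, and the pattern does not check the XML character range.

```python
import re

# Body is any run of non-dash characters, or a single dash followed by a
# non-dash character. That rules out "--" inside the comment and a body
# ending in "-". Assumption: legal-character checks are left out.
XML_COMMENT = re.compile(r"<!--(?:[^-]|-[^-])*-->")


def is_xml_comment(text):
    """Return True if text is exactly one comment under the XML rule above."""
    return XML_COMMENT.fullmatch(text) is not None


for sample in ["<!-- ok -->", "<!---->", "<!-- a - b -->",
               "<!------>", "<!-- a -- b -->", "<!-- a --->"]:
    print(repr(sample), is_xml_comment(sample))
```

The first three samples are accepted and the last three are rejected, which is where XML differs from the SGML reading: a run of six or eight dashes is simply not a legal comment in XML.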

Makeshifts last the longest.