in reply to tokenize a string
XML::TokeParser will return tokenized XML. If you need exactly the format you listed, you could write a wrapper around it.
Update: Oops, sorry, I missed your last paragraph.