I am afraid gav^ is right: you can't use XML::Parser, or any XML module for that matter, if your data is not well-formed XML.
Which gives you 3 choices:
- use a custom parser that deals with the data you actually have. Just do not call it XML: believe me, it will save you tons of problems in the long run, when you want to use the parser on real XML,
- use a 2-step process: first turn your data into well-formed XML, either by using CDATA sections or by replacing < and & with &lt; and &amp; in the content of "elements" that contain HTML. BTW you also need to convert the characters to UTF-8, or maybe add an encoding declaration: my previous code works by pure accident, and if you add a comma after the é the XML parser will complain (loudly!). Then, and only then, can you use XML tools,
- a variant would be to use a custom parser that generates SAX events. You can then use XML SAX tools to process the data, and if you later need to read real XML (or CSV, or any other format for which a SAX parser exists) you can just swap in the appropriate parser. Note that in order to generate SAX events you still need to escape & and <, and probably to pass properly encoded strings to the SAX processor. Kip Hampton wrote an excellent column about this on xml.com: Writing SAX Drivers for Non-XML Data.
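
To make the 2-step process concrete, here is a minimal sketch. It is written in Python rather than Perl only to keep it self-contained; the same two steps apply before handing data to XML::Parser. The sample data, the `desc` field name, the Latin-1 input encoding and the helper names `to_wellformed` and `parse` are all assumptions made for illustration, not taken from your data. It also assumes the HTML fields are not nested and do not already contain escaped entities (those would get escaped twice).

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical input: Latin-1 bytes, with raw HTML and a bare & inside <desc>.
RAW = "<item><name>café, bar</name><desc>fish & <b>chips</b></desc></item>".encode("latin-1")

# Assumed names of the "elements" whose content is HTML rather than XML.
HTML_FIELDS = ("desc",)


def to_wellformed(raw, encoding="latin-1"):
    """Step 1: decode the bytes, then escape markup inside the HTML fields."""
    text = raw.decode(encoding)
    for name in HTML_FIELDS:
        pattern = re.compile(r"<{0}>(.*?)</{0}>".format(name), re.DOTALL)
        # escape() turns &, < and > into &amp;, &lt; and &gt;
        text = pattern.sub(
            lambda m, n=name: "<{0}>{1}</{0}>".format(n, escape(m.group(1))),
            text,
        )
    return text


def parse(raw):
    """Step 2: only now hand the data to a real XML parser."""
    return ET.fromstring(to_wellformed(raw))


root = parse(RAW)
print(root.find("name").text)  # the é survives because we decoded explicitly
print(root.find("desc").text)  # the HTML comes back as plain text
```

Feeding `RAW` straight to `ET.fromstring` fails with a parse error, which is the "complains loudly" case: the input is neither well-formed nor in the encoding the parser assumes.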