I'm parsing XML with the XML::Parser module.
The thing works fine, except when it encounters an ampersand in non-markup data (anything that's not a tag):
<writeup node_id="980117" reputation="0" createtime="2001-03-12 17:27:54">M&M McFlurry (thing)</writeup>
According to XML::Parser, this is not well-formed XML data, because of the & in M&M.
Is there any way I can get around this? Perhaps change ampersands to their HTML entity equivalent (&amp;) in my character event handler?
Here's my code:
sub parseUserSearchXML {
    my $XMLParser = XML::Parser->new(
        Handlers => {
            Start => \&startHandler,
            End   => \&endHandler,
            Char  => \&charHandler,
        },
    );
    my $node;
    $XMLParser->parsefile($filename);
}

# event handler for XML::Parser - start tag event
sub startHandler {
    my ($expat, $tag, %attributes) = @_;
    $buffer = '';
    unless ($tag =~ /$tags_to_ignore/o) {
        %temp = %attributes;
    }
}

# event handler for XML::Parser - non-markup event
sub charHandler {
    my ($expat, $string) = @_;
    $buffer .= $string;
}

# event handler for XML::Parser - end tag event
sub endHandler {
    my ($expat, $tag) = @_;
    unless ($tag =~ /$tags_to_ignore/o) {
        # strip (person) (place) (thing) or (idea)
        $buffer =~ s/ \($crap_to_remove\)$//o;
        $nodes{$buffer} = {%temp};
    }
    $buffer = '';
}
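
To make the idea concrete, here is a rough sketch of the kind of workaround I have in mind. As far as I can tell the character handler is too late, since the parser gives up on the bare & before the text can be fixed there, so this escapes the data before it is parsed. The helper name escape_bare_ampersands is made up for illustration and is not part of XML::Parser. It assumes the only problem in the data is unescaped ampersands.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of XML::Parser): escape every '&' that does
# not already start an entity reference (&name;) or a character reference
# (&#38; or &#x26;). Existing references are left untouched.
sub escape_bare_ampersands {
    my ($xml) = @_;
    $xml =~ s/&(?!(?:[A-Za-z_][\w.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)/&amp;/g;
    return $xml;
}

# Sample data modelled on the writeup element from the question.
my $raw   = '<writeup node_id="980117">M&M McFlurry (thing)</writeup>';
my $fixed = escape_bare_ampersands($raw);
print "$fixed\n";
```

The cleaned string could then go to $XMLParser->parse($fixed) rather than parsefile. This would also rewrite ampersands inside CDATA sections, where a bare & is legal, so it is only safe if the data has none. Getting the source to emit well-formed XML in the first place is probably the better fix.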
---
donfreenut