For those interested, it can't handle:
- Numerical entities (decimal and hex).*
- External entities (e.g. HTML's &eacute;).*
- Character decoding.**
- UTF-16, UTF-32, UCS-2, UCS-4.**
- CDATA.
- Namespace prefixes. (They're included as part of the name.)***
- Comments.
- Identification of an element's namespace.***
- XML validation (i.e. it allows some malformed XML).
- (more? this wasn't a thorough analysis)
Up to you to decide if it fits your needs or not.
* — A post-processor could fix this if no entities were processed at all.
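A minimal sketch of what such a post-processor might look like, assuming the parser hands back text with every entity reference still in its raw form. The names `_decode_entities` and `%NAMED` are made up for illustration and are not part of the module under discussion. Only the five predefined XML entities are resolved by name; anything else (such as HTML's &eacute;) is left alone, since resolving it would need a DTD or a lookup table such as the one in HTML::Entities.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the parser being discussed.
# The five entities every XML processor must know by name.
my %NAMED = ( amp => '&', lt => '<', gt => '>', quot => '"', apos => "'" );

sub _decode_entities {
    my ($text) = @_;

    # A single pass, so the result of decoding "&amp;" is never re-scanned
    # ("&amp;lt;" becomes "&lt;", not "<"). Unknown names are left as-is.
    $text =~ s{&(?:#x([0-9A-Fa-f]+)|#([0-9]+)|([A-Za-z_][\w.-]*));}{
          defined $1          ? chr(hex $1)
        : defined $2          ? chr($2)
        : exists $NAMED{$3}   ? $NAMED{$3}
        :                       "&$3;"
    }ge;

    return $text;
}
```

It would be called on each text node and attribute value after parsing, e.g. `my $clean = _decode_entities($raw_text);`.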
** — A pre-processor such as the following would fix this:
    use Encode qw( decode );

    # Guess the encoding from a BOM or the XML declaration, then decode the bytes.
    sub _predecode {
        my $enc;
        if    ( $_[0] =~ /^\xEF\xBB\xBF/ ) { $enc = 'UTF-8'; }
        elsif ( $_[0] =~ /^\xFF\xFE/ )     { $enc = 'UTF-16le'; }
        elsif ( $_[0] =~ /^\xFE\xFF/ )     { $enc = 'UTF-16be'; }
        elsif ( substr($_[0], 0, 100) =~ /^[^>]* encoding="([^"]+)"/ ) { $enc = $1; }
        else  { $enc = 'UTF-8'; }    # XML's default when nothing is declared
        return decode($enc, $_[0], Encode::FB_CROAK | Encode::LEAVE_SRC);
    }
*** — A post-processor could fix this, but one wasn't supplied.
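For what it's worth, the core of such a post-processor could be sketched roughly as below. This is only an assumption about how one might do it: `_resolve_name` is a hypothetical helper, and it expects the caller to have already collected the in-scope `xmlns` declarations into a hash (prefix => URI, with the empty string as the key for the default namespace) while walking the tree. That walking step depends on the parser's output structure and is not shown.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the parser being discussed.
# Returns (namespace URI or undef, local name) for a qualified name.
sub _resolve_name {
    my ($qname, $in_scope, $is_attribute) = @_;

    # Split "prefix:local"; a name without a colon has an empty prefix.
    my ($prefix, $local) = $qname =~ /^([^:]+):(.+)$/s ? ($1, $2) : ('', $qname);

    # Unprefixed attributes are never placed in the default namespace.
    return (undef, $local) if $prefix eq '' && $is_attribute;

    die "Undeclared namespace prefix '$prefix'\n"
        if $prefix ne '' && !exists $in_scope->{$prefix};

    # For an unprefixed element this is the default namespace, or undef.
    return ($in_scope->{$prefix}, $local);
}
```

Usage would be something like `my ($uri, $local) = _resolve_name('svg:rect', \%ns);` for an element, with a true third argument when resolving an attribute name.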
Update: Added pre-processor I had previously coded.