in reply to Re^2: parsing XML fragments (xml log files) with... a regex
in thread parsing XML fragments (xml log files) with XML::Parser
For those interested, it can't handle
Up to you to decide if it fits your needs or not.
* — A post-processor could fix this if no entities were processed at all.
** — A pre-processor such as the following would fix this:
    use Encode qw( decode );

    # Pick an encoding from the BOM or the XML declaration, then decode.
    sub _predecode {
        my $enc;
        if    ( $_[0] =~ /^\xEF\xBB\xBF/ ) { $enc = 'UTF-8'; }
        elsif ( $_[0] =~ /^\xFF\xFE/ )     { $enc = 'UTF-16le'; }
        elsif ( $_[0] =~ /^\xFE\xFF/ )     { $enc = 'UTF-16be'; }
        elsif ( substr($_[0], 0, 100) =~ /^[^>]* encoding="([^"]+)"/ ) { $enc = $1; }
        else                               { $enc = 'UTF-8'; }
        return decode($enc, $_[0], Encode::FB_CROAK | Encode::LEAVE_SRC);
    }
*** — A post-processor could fix this, but one wasn't supplied.
Update: Added pre-processor I had previously coded.
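For anyone who wants to try the pre-processor, here is a minimal sketch of how it might be exercised. The sub is copied from above so the snippet runs on its own; the three sample inputs are made up for illustration and are not from the original log files. Note that, as far as I know, Encode treats a BOM as an ordinary U+FEFF character when the endianness is named explicitly, so the decoded string may still start with it.

```perl
use strict;
use warnings;
use Encode qw( decode encode );

# Copy of the pre-processor from the post, repeated so this runs standalone.
sub _predecode {
    my $enc;
    if    ( $_[0] =~ /^\xEF\xBB\xBF/ ) { $enc = 'UTF-8'; }
    elsif ( $_[0] =~ /^\xFF\xFE/ )     { $enc = 'UTF-16le'; }
    elsif ( $_[0] =~ /^\xFE\xFF/ )     { $enc = 'UTF-16be'; }
    elsif ( substr($_[0], 0, 100) =~ /^[^>]* encoding="([^"]+)"/ ) { $enc = $1; }
    else                               { $enc = 'UTF-8'; }
    return decode($enc, $_[0], Encode::FB_CROAK | Encode::LEAVE_SRC);
}

# Hypothetical inputs (byte strings), one per detection path.
# 1. No BOM, encoding named in the XML declaration.
my $latin1 = qq{<?xml version="1.0" encoding="ISO-8859-1"?><m>caf\xE9</m>};
# 2. UTF-16 little-endian with a BOM.
my $utf16  = "\xFF\xFE" . encode('UTF-16LE', '<m>ok</m>');
# 3. No BOM and no declaration: falls through to the UTF-8 default.
my $plain  = encode('UTF-8', "<m>caf\x{E9}</m>");

my $from_latin1 = _predecode($latin1);
my $from_utf16  = _predecode($utf16);
my $from_plain  = _predecode($plain);

# Print lengths in characters rather than the text itself,
# to avoid "Wide character" warnings on an unconfigured STDOUT.
printf "latin1 input: %d chars\n", length $from_latin1;
printf "utf16 input:  %d chars\n", length $from_utf16;
printf "plain input:  %d chars\n", length $from_plain;
```

Because of FB_CROAK, malformed input dies instead of being silently replaced, so wrap the call in eval if a bad log line shouldn't kill the whole run.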
Replies are listed 'Best First'.

Re^4: parsing XML fragments (xml log files) with... a regex
by tye (Sage) on Mar 18, 2011 at 18:11 UTC
by ikegami (Patriarch) on Mar 18, 2011 at 19:37 UTC
by tye (Sage) on Mar 18, 2011 at 22:10 UTC
by ikegami (Patriarch) on Mar 18, 2011 at 23:00 UTC
by ikegami (Patriarch) on Mar 18, 2011 at 22:57 UTC