ateague has asked for the wisdom of the Perl Monks concerning the following question:
UPDATE:
I went ahead and used the brute-force nuclear option and manually edited the declaration before passing it to XML::Twig.
    open my $fh, '+<:utf8', 'file.in.xml' or die $!;
    my $line = <$fh>;
    $line =~ s/<\?xml.+encoding="\Kutf-16"/utf-8" /
        or die "didn't match line: '$line'";
    seek $fh, 0, 0 or die $!;
    print $fh $line;
    close $fh;
Thank you haukex for your help!
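For anyone who would rather not modify the upstream file on disk, the same idea can be applied to an in-memory copy. The following is only a minimal sketch: `fix_declared_encoding` is a hypothetical helper (it is not part of XML::Twig or XML::Parser), and it assumes the document is really UTF-8 and that the declaration sits at the very start of the string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not an XML::Twig API): rewrite only the encoding
# pseudo-attribute in the XML declaration and leave everything else alone.
# Assumes the declaration is the very first thing in the string.
sub fix_declared_encoding {
    my ($xml, $from, $to) = @_;
    # $1 = everything up to the opening quote, $2 = the quote character used
    $xml =~ s/\A(<\?xml\b[^>]*?\bencoding\s*=\s*)(["'])\Q$from\E\g{2}/$1$2$to$2/i;
    return $xml;
}

# Small made-up sample standing in for the upstream file's contents
my $xml = qq{<?xml version="1.0" encoding="utf-16"?>\n<keys><key>1</key></keys>\n};

my $fixed = fix_declared_encoding($xml, 'utf-16', 'utf-8');
print $fixed;
```

The corrected string could then be handed to `$t->safe_parse($fixed)` in place of a filehandle, since the parse methods accept a string as well as an open handle. Unlike the in-place edit, the replacement does not have to keep the declaration the same length, because nothing is overwritten on disk.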
Good afternoon!
I am trying to use XML::Twig to strip out comments in a file provided by an upstream process. Unfortunately, the upstream process marks the encoding as "utf-16" in the XML declaration when the file is not actually UTF-16. This causes XML::Twig (and XML::Parser) to fail with the error "encoding specified in XML declaration is incorrect at line 1, column 30, byte 30".
Is there an option in XML::Twig that can be set to "relax" the parsing to ignore the incorrect encoding specified in the declaration?
Thank you for your time.
Perl info (perl -v): This is perl 5, version 24, subversion 0 (v5.24.0) built for MSWin32-x64-multi-thread
XML::Twig info: 3.52
    #!/usr/bin/perl
    use 5.024;
    use strict;
    use warnings;
    use XML::Twig;

    open (my $OFILE, '>:utf8', 'file.out.xml') or die "$!\n$^E";

    my $t = XML::Twig->new(
        twig_handlers => {
            '/keys/key' => sub { $_[0]->flush($OFILE); },
        },
        output_encoding => 'utf-8',
        pretty_print    => 'indented',
        comments        => 'drop',    # remove any comments
    );

    $t->safe_parse(\*DATA);
    if ( $@ ) {
        die "Error occurred in XML data\n\n$@";
    }
    close $OFILE;

    __DATA__
    <?xml version="1.0" encoding="utf-16"?>
    <keys>
      <!-- One hen -->
      <key>45646fa8-32e5-494c-93ff-0f00281fc2d6</key>
      <!-- Two ducks -->
      <key>b6bdc46f-3275-4312-bbbd-3e375208d05f</key>
      <!-- Three squawking geese -->
      <key>e5a37cf0-1f69-41a8-899c-23454600894a</key>
      <!-- Four limerick oysters -->
      <key>b6287f3d-f70c-498d-8360-5a2d8e863ab3</key>
      <!-- Five corpulent porpoises -->
      <key>118be380-5e69-47d4-81c6-756c34334936</key>
      <!-- Six pair of Don Alverzo's tweezers -->
      <key>46f9dd5b-d0e9-4f8f-a559-f698bea561fa</key>
      <!-- Seven thousand Macedonians in full battle array -->
      <key>9627058f-29f0-4263-8978-fc77ac2fe0a3</key>
      <!-- Eight brass monkeys from the ancient sacred crypts of Egypt -->
      <key>6038d393-ba81-423e-8429-01406779ff9e</key>
      <!-- Nine apathetic, sympathetic, diabetic old men on roller skates, with a marked propensity towards procrastination and sloth -->
      <key>5a67c3f0-ea6f-427c-bc3a-86fdb31fd117</key>
      <!-- Ten lyrical, spherical, diabolical denizens of the deep who stalk about the corners of the cove all at the same time. -->
      <key>7ac8b1d8-ff60-4b55-8fe0-ea809d9f5b02</key>
    </keys>
Replies are listed 'Best First'.

Re: XML::Twig - Parsing XML file with incorrect encoding in declaration
by holli (Abbot) on Sep 02, 2017 at 06:44 UTC

Re: XML::Twig - Parsing XML file with incorrect encoding in declaration
by haukex (Archbishop) on Sep 02, 2017 at 08:38 UTC
by ateague (Monk) on Sep 18, 2017 at 13:49 UTC

Re: XML::Twig - Parsing XML file with incorrect encoding in declaration
by NetWallah (Canon) on Sep 01, 2017 at 23:30 UTC
by ateague (Monk) on Sep 01, 2017 at 23:47 UTC
by NetWallah (Canon) on Sep 02, 2017 at 01:07 UTC
by ateague (Monk) on Sep 18, 2017 at 13:42 UTC