Wayne has asked for the wisdom of the Perl Monks concerning the following question:
Hello,
I am attempting to parse valid XML data as a string and output the result as a complex data structure (a hash of hashes and arrays) which I can then use to reference individual data elements.
I have written a small function to perform this task, which almost works. It runs without error and it does return a hash that I can reference.
The problem is that a few of the hash keys have been truncated. That is, they are missing the last character. The majority of keys, however, appear fine.
I am using valid XML and the problem appears to be intermittent, although the fields affected are always the same (company_name, fee_lines, address_2, etc.).
Here is the code I'm using:
    #!/usr/bin/perl -w
    use strict;
    use 5.24.1;
    use Data::Dumper qw(Dumper);
    use XML::Mini::Document;

    $Data::Dumper::Terse=1;

    my($xml)='<insert valid xml string here>';
    my($hash_ref)=&convert_xml_to_hash($xml);
    my $data=eval($hash_ref);
    say Dumper($data);
    exit;

    sub convert_xml_to_hash {
        my($xml)=(@_);
        my($hash)={};
        if($xml) {
            my($xml_object)=XML::Mini::Document->new();
            $xml_object->parse($xml);
            my($output)=$xml_object->toHash();
            $hash=Dumper($output);
        }
        return($hash);
    }
Here is a sample of the output:
    ...
    'billing_address' => {
        'province' => 'New Brunswick',
        'city' => 'Moncton',
        'company_nam' => '',    # Should be company_name
        'phone' => '5065551212',
        'address_' => '',       # Should be address_2
        'country_code' => 'CA',
        'first_name' => 'Joe',
        'address_1' => '123 Somestreet',
        'email' => 'someone@somewhere.com',
        'country' => 'Canada',
        'postal_code' => 'E4E 4E4',
        'last_name' => 'Blow',
        'province_code' => 'NB'
    },
    'fee_line' => '',           # should be fee_lines
    'shipping_tax' => '0.00',
    ...
At first I thought that they were just some careless typos, but after some investigation I realized that was not the case.
Any insight into this problem or a nudge in the right direction would be greatly appreciated.
Thanks!
|
Re: XML to Hash Truncating Keys Problem
by roboticus (Chancellor) on Mar 28, 2019 at 14:45 UTC | |
It would've been easier had you actually included a bit of XML. I went ahead and installed XML::Mini::Document and ran it, but as you can see, I'm not experiencing the problem:
Perhaps you have either (a) some wonky data, or (b) a bit of code you've not shown us is frobnicating it somewhere.
Edit: I forgot to mention that I didn't build the input by hand. Since you at least provided some output, I edited the output to make it what you wanted. I then used XML::Mini::Document to rebuild the input (code in the readmore tags below).
|
Re: XML to Hash Truncating Keys Problem
by Lotus1 (Vicar) on Mar 28, 2019 at 16:13 UTC | |
It worked for me after I removed the '%' character from my test XML. With that character included in a text node, as shown below, the parse function would hang. Perhaps you have some other special character that the XML::Mini::Document module can't handle.
Here is the output:
|
Re: XML to Hash Truncating Keys Problem
by tangent (Parson) on Mar 29, 2019 at 00:21 UTC | |
It requires that you know a good bit about your incoming data structure, but I think the results will be quite useful for further processing.
OUTPUT:
|
Re: XML to Hash Truncating Keys Problem
by hdb (Monsignor) on Mar 28, 2019 at 15:22 UTC | |
I am a bit confused by your logic here. First you apply the toHash function, which returns a hash reference. But then you use Dumper to stringify it, and this string is the output of your convert_xml_to_hash function. Upon return from your function, you eval it, which gives you (hopefully) a hash reference again. Then you use Dumper again to print it. Have you tried to just use the output of the toHash function directly? With so much back and forth between hash and string, it is no wonder that Perl loses a few characters...
by haukex (Archbishop) on Mar 28, 2019 at 19:52 UTC | |
"With so much back and forth between hash and string, it is no wonder that Perl loses a few characters..."
I agree with all of your post except for the last sentence. Data::Dumper wouldn't just lose characters like that....
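The claim that Data::Dumper does not drop characters can be checked in isolation. Below is a minimal sketch, using only core modules, that sends a hash containing the affected key names through a Dumper/eval round trip like the one in the original script and then compares the keys. The helper name `round_trip` and the sample data are made up for illustration, and the leading `+` is an addition here to make sure Perl parses the dumped text as a hash constructor rather than a block.

```perl
use strict;
use warnings;
use Data::Dumper qw(Dumper);

# Hypothetical helper: stringify a structure with Data::Dumper and
# rebuild it with string eval, as the original script does.
sub round_trip {
    my ($data) = @_;
    local $Data::Dumper::Terse = 1;        # bare expression, no "$VAR1 ="
    # The leading "+" forces "{" to be read as an anonymous hash.
    my $copy = eval '+' . Dumper($data);
    die "eval failed: $@" if $@;
    return $copy;
}

# Made-up sample data using the key names reported as truncated.
my $original = {
    billing_address => { company_name => '', address_2 => '', city => 'Moncton' },
    fee_lines       => '',
};

my $copy = round_trip($original);

# Compare the sorted key lists before and after the round trip.
my @before = sort keys %{ $original->{billing_address} };
my @after  = sort keys %{ $copy->{billing_address} };
print "@before" eq "@after" ? "keys preserved\n" : "keys differ\n";
```

If the keys survive this round trip, the truncation has to come from somewhere before Dumper is called, which points at the parsing step.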
|
Re: XML to Hash Truncating Keys Problem
by hdb (Monsignor) on Mar 28, 2019 at 15:24 UTC | |
While the usage of XML::Simple is discouraged even by its author, it should do exactly what you want here.
by Jenda (Abbot) on Apr 02, 2019 at 22:26 UTC | |
It would. The catch is that the data structure would be inconsistent and fragile. See Simpler than XML::Simple.
Jenda
|
Re: XML to Hash Truncating Keys Problem
by Wayne (Novice) on Mar 29, 2019 at 13:20 UTC | |
Update: The solution that I have found is much simpler and more elegant, and it doesn't appear to produce any errors in the output. It uses the XML::Fast library:
Cheers!
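The code for this update is not shown above. As a rough sketch of what an XML::Fast version might look like, assuming the module is installed from CPAN and using its `xml2hash` function: the wrapper name `xml_to_hash` and the sample XML are hypothetical and are not Wayne's actual code.

```perl
use strict;
use warnings;
use Data::Dumper qw(Dumper);

# Hypothetical wrapper: parse an XML string and return the hash reference
# from XML::Fast directly, with no Dumper/eval round trip in between.
sub xml_to_hash {
    my ($xml) = @_;
    return {} unless defined $xml && length $xml;
    require XML::Fast;                     # CPAN module, loaded on first use
    return XML::Fast::xml2hash($xml);
}

# Made-up sample using the self-closing tags that were being truncated.
my $xml = '<order><billing_address><company_name/><address_2/>'
        . '<city>Moncton</city></billing_address><fee_lines/></order>';

my $data = eval { xml_to_hash($xml) };
if ($data) {
    local $Data::Dumper::Terse = 1;
    print Dumper($data);
}
else {
    warn "Could not parse (is XML::Fast installed?): $@";
}
```

Returning the hash reference as-is also addresses hdb's suggestion to use the parser output directly instead of converting it to a string and back.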
|
Re: XML to Hash Truncating Keys Problem
by Wayne (Novice) on Mar 28, 2019 at 21:01 UTC | |
Hello, and thank you all for the many prompt replies! :)
roboticus - "Perhaps you have either (a) some wonky data, or (b) a bit of code you've not shown us is frobnicating it somewhere."
Yes! Perhaps both! I have a PHP function that generates XML data from a WooCommerce (WordPress) database. I validated the output using several online XML validation tools and they all reported valid XML output. Thus I assumed that the PHP function was working properly.
hdb - "Have you tried to just use the output of the toHash function directly? With so much back and forth between hash and string, it is no wonder that Perl loses a few characters..."
No, not yet, but thanks for the suggestion!
"While the usage of XML::Simple is discouraged even by its author, it should do exactly what you want here."
I initially wrote it using XML::Simple, but upon discovering that it wasn't recommended anymore I decided to look for a more robust alternative.
"I am a bit confused by your logic here....Then you use Dumper again to print it."
Sometimes my logic can be a little fuzzy but no worries - it's as clear as mud.... And using Dumper to print again was just for debugging. There are actually container variables storing hash values, but they weren't relevant to the problem so I didn't include them.
haukex - "Data::Dumper wouldn't just lose characters like that...."
I agree. I believe that the problem most likely exists between the keyboard and the chair (yours truly!) and not the module itself.
Anonymous - "valid XML data - show it please"
Yes, I should have posted the XML in the OP. Since it was validating elsewhere and was long and ugly I didn't bother. I've posted it below. The only changes are to some PII - IP address, domain name and email address. I also posted the output, below the XML.
Thanks again!
by roboticus (Chancellor) on Mar 28, 2019 at 21:39 UTC | |
It looks like it may be a problem in XML::Mini::Document. When I changed one of my items in my XML document from <address_2></address_2> to <address_2/> as shown in yours, it promptly lost the "2" from the end of the name.
Edit: I had some slashes in the wrong places.
...roboticus
When your only tool is a hammer, all problems look like your thumb.
by tangent (Parson) on Mar 28, 2019 at 22:00 UTC | |
https://rt.cpan.org/Public/Bug/Display.html?id=50171 Fri Oct 02 14:07:24 2009
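Since the truncation was reported for <address_2/> but not for <address_2></address_2>, one possible workaround for anyone who has to stay on XML::Mini::Document is to expand self-closing tags before parsing. The following is a minimal sketch of that idea using only core Perl. The function name `expand_empty_tags` and the sample XML are made up for illustration. It is a plain regex rewrite, so it assumes that no "/>" appears inside attribute values, comments or CDATA sections.

```perl
use strict;
use warnings;

# Hypothetical pre-processing step: rewrite empty-element tags such as
# <address_2/> into an explicit pair, <address_2></address_2>.
# Group 1 is the tag name, group 2 the attributes (possibly empty).
sub expand_empty_tags {
    my ($xml) = @_;
    $xml =~ s{<([A-Za-z_][\w.:-]*)((?:\s+[^<>\s][^<>]*?)?)\s*/>}{<$1$2></$1>}g;
    return $xml;
}

# Made-up sample with two self-closing tags, one written with a space.
my $xml = '<billing_address><company_name/><address_2 />'
        . '<city>Moncton</city></billing_address>';

print expand_empty_tags($xml), "\n";
# prints: <billing_address><company_name></company_name><address_2></address_2><city>Moncton</city></billing_address>
```

The result would then be passed to the parser in place of the original string. Whether this fully avoids the bug in the RT ticket above has not been verified here.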
by Wayne (Novice) on Mar 29, 2019 at 11:55 UTC | |
by Anonymous Monk on Mar 28, 2019 at 23:10 UTC | |
|
Re: XML to Hash Truncating Keys Problem
by Anonymous Monk on Mar 28, 2019 at 14:39 UTC | |
| [reply] |
|
Re: XML to Hash Truncating Keys Problem
by Wayne (Novice) on Mar 28, 2019 at 14:23 UTC | |
Correction - The problem is not intermittent. It happens every time.