All the above scripts are NOT properly converting the child "Emphasis"
That's not quite true. As far as the parser is concerned, "Emphasis" is a valid XML element, so it becomes a child node of <Title> rather than part of the title's text. You will have to do a bit of manual labour to get your desired output.
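To see why, here is a minimal sketch that uses only the <Title> element from the question as a standalone document. It walks the element's child nodes with childNodes and records each node's nodeName. The parser splits the title into three children: a text node, the Emphasis element, and another text node. For contrast, textContent returns all of the descendant text joined together and drops the Emphasis markup entirely.

```perl
use strict;
use warnings;
use XML::LibXML;

# Only the <Title> element from the question, parsed as a standalone document.
my $xml = '<Title Language="En">Is Light Blue (<Emphasis Type="Italic">azzurro</Emphasis>)'
        . ' Color Name Universal in the Italian Language?</Title>';

my $title = XML::LibXML->load_xml(string => $xml)->documentElement;

# Each child is a separate node: text, then the Emphasis element, then text.
my @kinds;
for my $child ($title->childNodes) {
    push @kinds, $child->nodeName;          # '#text' for text nodes
    printf "%-10s %s\n", $child->nodeName, $child->toString;
}

# textContent flattens all descendant text and drops the markup.
my $text = $title->textContent;
print "textContent: $text\n";
```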

I couldn't find a way to get the inner content of a node without also getting the node's own tags, so I used a regular expression to remove them. Hopefully this will get you on your way:

use strict;
use warnings;
use Data::Dumper;
use XML::LibXML;

my $xml = q|
<Publisher>
  <UniqueDOI>978-3-642-123456</UniqueDOI>
  <ChapterInfo ChapterType="OriginalPaper">
    <Title Language="En">Is Light Blue (<Emphasis Type="Italic">azzurro</Emphasis>) Color Name Universal in the Italian Language?</Title>
  </ChapterInfo>
</Publisher>
|;

my $doc = XML::LibXML->load_xml(string => $xml);
my @Publishers = $doc->findnodes('//Publisher');

for my $Publisher ( @Publishers ) {
    my ($ChapterInfo) = $Publisher->findnodes('ChapterInfo');
    my ($Title)       = $ChapterInfo->findnodes('Title');

    # get the Title node as literal XML
    my $content = $Title->toString();
    print "Title content:\n$content\n";

    # remove first and last XML tags (/s lets . match newlines in the title too)
    $content =~ s/^<[^>]*>(.*)<[^>]*>$/$1/s;

    # construct the hash reference
    my $hash = {
        UniqueDOI   => $Publisher->findvalue('UniqueDOI'),
        ChapterInfo => {
            ChapterType => $ChapterInfo->getAttribute('ChapterType'),
            Title       => {
                Language => $Title->getAttribute('Language'),
                content  => $content,
            },
        },
    };
    print Dumper($hash);
}
See the XML::LibXML::Node documentation for an explanation of these methods.
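As an alternative to the regular expression, the inner content can be built by serializing each child node with toString and joining the results; childNodes and toString are both described in XML::LibXML::Node. This is a minimal sketch, not tested against the full data in the thread, and the helper name inner_xml is hypothetical.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: serialize every child node of an element and join
# the results, giving the element's inner XML without its own start/end tags.
sub inner_xml {
    my ($element) = @_;
    return join '', map { $_->toString } $element->childNodes;
}

my $xml = '<Title Language="En">Is Light Blue (<Emphasis Type="Italic">azzurro</Emphasis>)'
        . ' Color Name Universal in the Italian Language?</Title>';
my $title = XML::LibXML->load_xml(string => $xml)->documentElement;

my $inner = inner_xml($title);
print "$inner\n";
```

Inside the loop of the script above, my $content = inner_xml($Title); could then stand in for the toString call and the substitution.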

Output:

Title content:
<Title Language="En">Is Light Blue (<Emphasis Type="Italic">azzurro</Emphasis>) Color Name Universal in the Italian Language?</Title>
$VAR1 = {
          'UniqueDOI' => '978-3-642-123456',
          'ChapterInfo' => {
                             'ChapterType' => 'OriginalPaper',
                             'Title' => {
                                          'Language' => 'En',
                                          'content' => 'Is Light Blue (<Emphasis Type="Italic">azzurro</Emphasis>) Color Name Universal in the Italian Language?'
                                        }
                           }
        };
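The tag-stripping substitution can also be checked on its own with core Perl only. This is a minimal sketch using made-up inputs and a hypothetical strip_outer helper. Without the /s modifier, . does not match a newline. If the Title text in the source XML spans more than one line (toString keeps that newline), the substitution silently fails to match and the outer tags stay in.

```perl
use strict;
use warnings;

# Hypothetical helper: strip the first and last tag from a serialized element.
# $dotall chooses whether the /s modifier is used, so both behaviours can be compared.
sub strip_outer {
    my ($s, $dotall) = @_;
    if ($dotall) {
        $s =~ s/^<[^>]*>(.*)<[^>]*>$/$1/s;   # '.' also matches "\n"
    }
    else {
        $s =~ s/^<[^>]*>(.*)<[^>]*>$/$1/;    # '.' stops at "\n"
    }
    return $s;
}

# Made-up inputs: the same kind of element on one line and across two lines.
my $one_line = '<Title Language="En">Light Blue (<Emphasis Type="Italic">azzurro</Emphasis>)</Title>';
my $two_line = qq{<Title Language="En">Light Blue\n(<Emphasis Type="Italic">azzurro</Emphasis>)</Title>};

print strip_outer($one_line, 0), "\n";   # outer tags removed
print strip_outer($two_line, 0), "\n";   # no match: printed unchanged
print strip_outer($two_line, 1), "\n";   # outer tags removed, newline kept
```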

In reply to Re: XML to HashRef and then to JSON by tangent
in thread XML to HashRef and then to JSON by dominic01
