Hi PerlMonks

I have the following bilingual file and would like to extract the source and target nodes, preserving any XML elements they contain as well as the line breaks. Could you please help me with that? I have no experience with XML parsing in Perl.

Here is a sample of my file:

<trans-unit id="1" maxbytes="14">
<source xml:lang="en-US">Hello <x id=1/> world!
How are you?</source>
<target xml:lang="ja-JP">Ciao<x id=1/> mondo!
Come stai?</target>
</trans-unit>

The expected result should be:

Hello <x id=1/> world! <lb/> How are you? || Ciao<x id=1/> mondo! <lb/> Come stai?

Thank you for your time!
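
One possible starting point, offered as a rough sketch rather than a definitive answer: the sample uses an unquoted attribute value (<x id=1/>), which is not well-formed XML, so a strict parser would likely reject it as posted. The sketch below therefore falls back on plain regular expressions from core Perl. It assumes that the line breaks inside each segment fall where the expected result shows <lb/>, that each trans-unit holds exactly one source and one target, and that these elements are never nested. The name extract_pairs is hypothetical, chosen only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns one "source || target" string per <trans-unit>.
# Inline tags such as <x id=1/> are kept because only the inner text of
# <source> and <target> is captured, never parsed.
sub extract_pairs {
    my ($xml) = @_;
    my @pairs;

    while ($xml =~ m{<trans-unit\b[^>]*>(.*?)</trans-unit>}sg) {
        my $unit = $1;    # copy before the inner matches overwrite $1
        my ($source) = $unit =~ m{<source\b[^>]*>(.*?)</source>}s;
        my ($target) = $unit =~ m{<target\b[^>]*>(.*?)</target>}s;
        next unless defined $source && defined $target;

        for ($source, $target) {
            s{\r?\n}{ <lb/> }g;    # mark every line break with <lb/>
            s{[ \t]+}{ }g;         # collapse runs of spaces and tabs
            s{^\s+|\s+$}{}g;       # trim leading and trailing whitespace
        }
        push @pairs, "$source || $target";
    }
    return @pairs;
}

# Sample data; the position of the line breaks is an assumption.
my $sample = <<'XML';
<trans-unit id="1" maxbytes="14">
<source xml:lang="en-US">Hello <x id=1/> world!
How are you?</source>
<target xml:lang="ja-JP">Ciao<x id=1/> mondo!
Come stai?</target>
</trans-unit>
XML

print "$_\n" for extract_pairs($sample);
```

With the sample laid out as above, this should print the expected line from the post. If the real file is well-formed XML (quoted attribute values throughout), a proper XML parser would be the more robust choice, since regular expressions break easily on nested elements, comments, or CDATA sections.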


In reply to Parsing XML file and keeping the formatting tags by corfuitl
