Hello all, and thanks for your thoughts. Perhaps I should explain differently ...

I am GETting xml docs from an IBM tool using LWP and using LibXML to parse them. This keeps failing due to unparsable characters such as e acute (x'e9') so I need to substitute those characters with parsable ones. My idea was to GET the xml doc then call a subroutine to replace x'e9' with x'65', x'a0' with x'20' and so on before parsing the doc with LibXML.

The subroutine would write to a temp file then delete the original and rename the temp file. The subroutine would call another whose job it is to replace in a string all instances of one hex value with another.
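The temp-file step described above might look roughly like this. It is only a sketch: `clean_file` and its hash-of-hex-pairs argument are hypothetical names, not anything from LWP or LibXML, and it assumes the offending values are single bytes. The replacement is inlined so the example stands alone.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use File::Basename qw(dirname);

# Hypothetical helper: rewrite $path in place, mapping bytes per %$map.
# Keys and values are hex strings, e.g. { e9 => '65', a0 => '20' }.
sub clean_file {
    my ($path, $map) = @_;

    # Create the temp file next to the original so rename() stays on one filesystem.
    my ($out, $tmp) = tempfile(DIR => dirname($path));
    binmode $out;
    open my $in, '<:raw', $path or die "Cannot read $path: $!";

    while (my $line = <$in>) {
        for my $from (keys %$map) {
            my $from_chr = chr hex $from;
            my $to_chr   = chr hex $map->{$from};
            $line =~ s/\Q$from_chr\E/$to_chr/g;
        }
        print {$out} $line;
    }
    close $in;
    close $out or die "Cannot write $tmp: $!";

    # rename() replaces the original on POSIX systems, so no separate delete is needed.
    rename $tmp, $path or die "Cannot rename $tmp to $path: $!";
    return;
}
```

Called as `clean_file($file, { e9 => '65', a0 => '20' })`. The mappings are applied in no particular order, so this assumes no 'to' value is also a 'from' value.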

So, another way to describe my problem is that I have not been able to write a subroutine that accepts a string, a 'from' hex value and a 'to' hex value and returns a modified string.
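A minimal sketch of such a subroutine, assuming the 'from' and 'to' values arrive as hex strings like 'e9' and '65' and each stands for a single byte (`replace_hex` is a made-up name):

```perl
use strict;
use warnings;

# Hypothetical helper: replace every occurrence of one byte with another.
# $from and $to are hex strings such as 'e9' or '65'.
sub replace_hex {
    my ($string, $from, $to) = @_;
    my $from_chr = chr hex $from;   # 'e9' -> the byte 0xE9
    my $to_chr   = chr hex $to;     # '65' -> 'e'
    # \Q...\E keeps a byte that happens to be a regex metacharacter literal.
    $string =~ s/\Q$from_chr\E/$to_chr/g;
    return $string;
}

my $raw   = "caf\xe9\xa0au lait";
my $fixed = replace_hex($raw, 'e9', '65');
$fixed    = replace_hex($fixed, 'a0', '20');
print "$fixed\n";   # prints "cafe au lait"
```

The key step is `chr hex $from`, which turns the hex text into the actual byte before it goes into the pattern. Interpolating the string 'e9' directly would match the two letters "e9" rather than the byte x'e9'.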

The xml snip I showed as test data is real data snipped from an xml doc retrieved from the tool, and the two unparsable chars I've encountered so far are x'a0' and x'e9' (just e9 in the snip)... there are likely to be others so a generalised 'replacer' seems a good way to go.

What seemed like a straightforward thing to do has proven otherwise, hence asking the question here. I apologise if what I'm trying to achieve wasn't sufficiently clear. Any help with what ought to be a simple subroutine will be warmly welcomed.


In reply to Re^2: Hex-matching Regex pattern in scalar by CliffG
in thread Hex-matching Regex pattern in scalar by CliffG
