Hello all, and thanks for your thoughts. Perhaps I should explain differently ...
I am GETting xml docs from an IBM tool using LWP and parsing them with LibXML. Parsing keeps failing on unparsable characters such as e acute (x'e9'), so I need to substitute those characters with parsable ones. My idea was to GET the xml doc, then call a subroutine to replace x'e9' with x'65', x'a0' with x'20' and so on, before parsing the doc with LibXML.
The subroutine would write to a temp file, then delete the original and rename the temp file. It would call a second subroutine whose job is to replace all instances of one hex value in a string with another.
So, another way to describe my problem is that I have not been able to write a subroutine that accepts a string, a 'from' hex value and a 'to' hex value and returns a modified string.
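A minimal sketch of such a subroutine might look like the following. It assumes the document is held as a raw byte string and that the 'from' and 'to' values are passed as two-digit hex strings such as 'e9'. The name `replace_hex` and that calling convention are illustrative assumptions, not anything the tool or LibXML requires.

```perl
use strict;
use warnings;

# Hypothetical helper (name and argument format are assumptions):
# replace every occurrence of one byte value with another.
# $from_hex and $to_hex are hex strings such as 'e9' or '65'.
sub replace_hex {
    my ($string, $from_hex, $to_hex) = @_;

    # Convert the hex strings to the single characters they denote.
    my $from = chr hex $from_hex;
    my $to   = chr hex $to_hex;

    # \Q...\E quotes the 'from' character so it is matched literally.
    $string =~ s/\Q$from\E/$to/g;

    return $string;
}

# Example: e-acute (x'e9') becomes 'e', no-break space (x'a0') becomes a space.
my $raw   = "caf\xe9\xa0au lait";
my $fixed = replace_hex($raw, 'e9', '65');
$fixed    = replace_hex($fixed, 'a0', '20');
print "$fixed\n";   # prints "cafe au lait"
```

Because the subroutine returns a modified copy and leaves its argument alone, calls can be chained for each byte that turns out to be unparsable.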
The xml snip I showed as test data is real data snipped from an xml doc retrieved from the tool, and the two unparsable chars I've encountered so far are x'a0' and x'e9' (just e9 in the snip)... there are likely to be others so a generalised 'replacer' seems a good way to go.
What seemed like a straightforward thing to do has proven otherwise, hence asking the question here. I apologise if what I'm trying to achieve wasn't sufficiently clear. Any help with what ought to be a simple subroutine will be warmly welcomed.