Selvakumar has asked for the wisdom of the Perl Monks concerning the following question:

Hi All,
I want to convert a Word document to XML, and the important thing is that I need to convert all special characters to Unicode characters.
Please suggest how I can achieve this.

Input: Sample data & and text for some <conversion> "text".
Output: Sample data &#x00026; and text for some &#x0003C;conversion&#x0003E; &#x0201C;text&#x0201D;.

Re: Convert word special characters
by marto (Cardinal) on Mar 11, 2010 at 12:38 UTC
Re: Convert word special characters
by almut (Canon) on Mar 11, 2010 at 15:47 UTC

    In case it should turn out that Word can't encode the XML itself, you could use

        use HTML::Entities qw(encode_entities_numeric);

        # unicode/character string
        my $in = qq(Sample data & and text for some <conversion> \x{201C}text\x{201D}.);

        print encode_entities_numeric($in);

        __END__
        Sample data &#x26; and text for some &#x3C;conversion&#x3E; &#x201C;text&#x201D;.

    (that is, if you can get away without the (IMHO superfluous) leading zeros in the hex values...)
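
If the leading zeros are in fact required, one possible approach is to do the substitution by hand with sprintf. The following is only a minimal sketch, not part of HTML::Entities: the helper name encode_padded is hypothetical, and it assumes that the characters to encode are &, <, >, " and everything outside ASCII, and that the input is already a decoded character string.

```perl
use strict;
use warnings;

# Hypothetical helper (not provided by HTML::Entities).
# Assumption: "special" means &, <, >, " and any non-ASCII character.
# Each one is replaced by a hex character reference padded to five digits.
sub encode_padded {
    my ($text) = @_;
    $text =~ s/([&<>"]|[^\x00-\x7F])/sprintf('&#x%05X;', ord($1))/ge;
    return $text;
}

# Same sample string as in the question, with curly quotes around "text".
my $in = qq(Sample data & and text for some <conversion> \x{201C}text\x{201D}.);
print encode_padded($in), "\n";
```

With that input the sketch should print the zero-padded form asked for in the question, for example &#x00026; for the ampersand and &#x0201C; for the opening curly quote. Extend the character class if apostrophes or control characters also need encoding.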

Re: Convert word special characters
by rovf (Priest) on Mar 11, 2010 at 14:48 UTC