
Hi,

I filed a bug regarding this specific issue; you can read it here: https://rt.cpan.org/Public/Bug/Display.html?id=109706

As noted within this ticket, I have a workaround for your specific problem.

This module contains an internally defined list of typelookup handlers for each supported primitive. It looks like this:

typelookup => {
  base64 => [10, sub {$_[0] =~ /[^\x09\x0a\x0d\x20-\x7f]/}, 'as_base64'],
  int => [20, sub {$_[0] =~ /^[+-]?\d+$/}, 'as_int'],
  double => [30, sub {$_[0] =~ /^(-?(?:\d+(?:\.\d*)?|\.\d+)|([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?)$/}, 'as_double'],
  dateTime => [35, sub {$_[0] =~ /^\d{8}T\d\d:\d\d:\d\d$/}, 'as_dateTime'],
  string => [40, sub {1}, 'as_string'],
},

You'll notice that base64 is first, and has a precedence value of 10. That means every value is duck typed against the base64 test BEFORE any other comparison is made.

You'll also notice that the duck typing test here basically accepts "anything but ASCII". That is the reason your UTF-8 strings are being base64 encoded: they contain non-ASCII characters, and therefore meet this definition.
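To see the effect in isolation, here is a minimal sketch of a precedence-ordered lookup. It is a simplified model, not the module's actual code: the trimmed-down %typelookup table and the guess_type() helper are hypothetical names of mine, and only the regexes are copied from the table above.

```perl
use strict;
use warnings;

# Simplified stand-in for the typelookup table: name => [precedence, test].
# Only the regexes come from the module; the rest is illustrative.
my %typelookup = (
    base64 => [10, sub { $_[0] =~ /[^\x09\x0a\x0d\x20-\x7f]/ }],
    int    => [20, sub { $_[0] =~ /^[+-]?\d+$/ }],
    string => [40, sub { 1 }],
);

# Hypothetical helper: run the tests in ascending precedence order
# and return the name of the first one that matches.
sub guess_type {
    my ($value) = @_;
    for my $name (sort { $typelookup{$a}[0] <=> $typelookup{$b}[0] } keys %typelookup) {
        return $name if $typelookup{$name}[1]->($value);
    }
    return;
}

print guess_type("42"), "\n";          # int
print guess_type("plain text"), "\n";  # string
print guess_type("na\x{ef}ve"), "\n";  # base64, because of the one non-ASCII character
```

A single character outside the ASCII range is enough for the base64 test to win, since it runs before int and string get a chance.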

Fixing this is relatively trivial, fortunately.

What I ended up doing was overriding the initialize() method provided by XMLRPC::Lite. Within this override I invoke the original initialize(), but then I stomp over the value of the base64 type handler. Instead of matching anything non-ASCII, I match only non-ASCII data that doesn't have the utf8 flag set.

The result ends up looking like this:

sub initialize {
  my $self = shift;

  my $config = {XMLRPC::Server::initialize(@_)};
  my $typelookup = $$config{serializer}->typelookup();

  # adjust the definition for base64 data, skip over any scalars with the utf8 property set
  $typelookup->{base64} = [10, sub {$_[0] =~ /[^\x09\x0a\x0d\x20-\x7f]/ && !utf8::is_utf8($_[0])}, 'as_base64'];

  return %{$config};
}
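As a quick sanity check of the adjusted test on its own, here is a small sketch that runs it against a few sample values outside of any server. The $patched_base64_test name and the sample strings are mine, and I'm assuming the data was decoded with Encode, which is what turns the utf8 flag on.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Same condition as the overridden base64 handler, normalized to 1/0.
my $patched_base64_test = sub {
    ($_[0] =~ /[^\x09\x0a\x0d\x20-\x7f]/ && !utf8::is_utf8($_[0])) ? 1 : 0;
};

my $ascii   = "hello";
my $decoded = decode('UTF-8', "caf\xc3\xa9");  # character string, utf8 flag on
my $octets  = "caf\xc3\xa9";                   # raw UTF-8 bytes, flag off
my $binary  = "\x00\xff\x10";                  # arbitrary binary data

printf "%-8s => %d\n", 'ascii',   $patched_base64_test->($ascii);    # 0
printf "%-8s => %d\n", 'decoded', $patched_base64_test->($decoded);  # 0
printf "%-8s => %d\n", 'octets',  $patched_base64_test->($octets);   # 1
printf "%-8s => %d\n", 'binary',  $patched_base64_test->($binary);   # 1
```

Note that the flag describes Perl's internal storage rather than the meaning of the data, so undecoded UTF-8 bytes, or a Latin-1 string that was never upgraded, would still be sent as base64.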

Once implemented, this seems to work as expected, although I welcome any critiques that might prove otherwise.

Moving forward, it appears that the SOAP::Lite project is either dead or moving very slowly. The above definition for base64 should probably be merged right into the project, as it's a substantial improvement over what's being distributed right now, but it appears there is nobody to do that...

In reply to Re: How to convince SOAP::Lite to return UTF-8 data in responses as UTF-8? by gaimrox
in thread How to convince SOAP::Lite to return UTF-8 data in responses as UTF-8? by mithaldu
