Re: postgres reg expression quoting (again)

Just like you were treating plain text as a regexp pattern, you are treating binary data (encoded text) as text. Decode it first. You can use utf8::decode for UTF-8 or the more general Encode::decode.
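A minimal sketch of the general Encode::decode route. It assumes the incoming bytes are UTF-8. The variable names and sample data are purely illustrative. The point is that the data is decoded into a character string before any text operation, such as quoting it for a pattern, is applied.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Illustrative raw bytes, as they might arrive from a file, socket or
# database driver (assumed here to be UTF-8: "caf" followed by C3 A9 = e-acute).
my $bytes = "caf\x{C3}\x{A9}";

# Decode into a Perl character string before treating it as text.
my $text = decode('UTF-8', $bytes);

printf "%d bytes -> %d characters\n", length($bytes), length($text);

# Only now quote it for use inside a pattern.
my $pattern = qr/\Q$text\E/;
print "match\n" if "Le caf\x{E9}" =~ $pattern;
```

Once decoded, the e-acute is a single character, so the quoted pattern matches it as one character rather than as two unrelated bytes.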


Update: For example,

use Data::Dumper;

$Data::Dumper::Useqq = 1;
$Data::Dumper::Terse = 1;
$Data::Dumper::Indent = 0;

my $s = "\x{E2}\x{99}\x{A0}";        # These are assumed to be
print(Dumper(quotemeta($s)), "\n");  # iso-latin-1 by Perl

utf8::decode($s);                    # ...until you decode them
print(Dumper(quotemeta($s)), "\n");

"\\\342\\\231\\\240"
"\x{2660}"

Re^2: postgres reg expression quoting (again) by zdzieblo (Acolyte) on Apr 06, 2009 at 23:45 UTC
Right, in that case it looks like the best bet is to use Encode::decode_utf8() rather than utf8::decode(), since it will also put U+FFFD (the replacement character) in place of malformed characters, which removes the peril of being fed random data. Thanks!
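A short sketch of the difference described above. It uses a made-up malformed input: a lone 0x80 byte, which is a continuation byte that cannot start a UTF-8 sequence. utf8::decode reports failure and leaves the string untouched. Encode::decode_utf8, with its default CHECK, substitutes U+FFFD for the bad byte.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# Hypothetical input: a lone continuation byte is never valid UTF-8.
my $bytes = "ok\x{80}ok";

# utf8::decode works in place and returns false on malformed input,
# leaving the string as the original bytes.
my $copy = $bytes;
my $ok   = utf8::decode($copy);
printf "utf8::decode: %s, length %d\n", ($ok ? 'ok' : 'failed'), length($copy);

# Encode::decode_utf8 (default CHECK) replaces the bad byte with U+FFFD.
my $text = decode_utf8($bytes);
printf "decode_utf8: U+%04X at position 2\n", ord(substr($text, 2, 1));
```

If you would rather reject bad input than repair it, Encode also accepts a CHECK argument (for example Encode::FB_CROAK) that makes decoding die on malformed data.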