in reply to postgres reg expression quoting (again)

Just like you were treating plain text as a regexp pattern, you are treating binary data (encoded text) as text. Decode it first. You can use utf8::decode for UTF-8 or the more general Encode::decode.
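
As a rough sketch of the more general Encode::decode route (the variable names and sample data here are made up for illustration, not taken from the original question), you could decode the bytes once and only then quote the result for use in a pattern:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical input: the UTF-8 bytes for U+2660 (BLACK SPADE SUIT),
# as they might arrive from a file, socket, or database driver.
my $bytes = "\x{E2}\x{99}\x{A0}";

# Turn the encoded bytes into a string of characters.
my $text = decode('UTF-8', $bytes);

# Three bytes before decoding, one character after.
printf "%d bytes, %d character(s)\n", length($bytes), length($text);

# \Q...\E applies quotemeta to the interpolated (decoded) text,
# so it is matched literally against other decoded text.
my $haystack = "suit: \x{2660}";
my $matched  = $haystack =~ /\Q$text\E/ ? 1 : 0;
print "matched\n" if $matched;
```

The point of decoding first is that quotemeta and the regex engine then see one character instead of three unrelated bytes.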

Update: For example,

use Data::Dumper;
$Data::Dumper::Useqq  = 1;
$Data::Dumper::Terse  = 1;
$Data::Dumper::Indent = 0;

my $s = "\x{E2}\x{99}\x{A0}";          # These are assumed to be
print(Dumper(quotemeta($s)), "\n");    #    iso-latin-1 by Perl
utf8::decode($s);                      # ...until you decode them
print(Dumper(quotemeta($s)), "\n");

"\\\342\\\231\\\240"
"\x{2660}"

Re^2: postgres reg expression quoting (again)
by zdzieblo (Acolyte) on Apr 06, 2009 at 23:45 UTC

    Right, in that case it looks like the best bet is to use Encode::decode_utf8() rather than utf8::decode(), since it also substitutes U+FFFD for malformed characters, which removes the risk of being fed random data.

    thanks!
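
A minimal sketch of the difference described above, assuming a made-up input string containing a stray continuation byte (0x80), which is not valid UTF-8 on its own. The variable names are hypothetical, and decode_utf8 is called with its default CHECK behaviour:

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# Hypothetical malformed input: "a", a lone continuation byte, "b".
my $bad = "a\x{80}b";

# utf8::decode works in place and reports failure through its return
# value; on malformed input it leaves the string as it was.
my $in_place = $bad;
my $ok = utf8::decode($in_place);
print "utf8::decode succeeded: ", ($ok ? "yes" : "no"), "\n";

# decode_utf8 returns a new string; with the default CHECK value it
# substitutes U+FFFD (REPLACEMENT CHARACTER) for the malformed data.
my $copy = $bad;
my $text = decode_utf8($copy);
my $has_replacement = $text =~ /\x{FFFD}/ ? 1 : 0;

# Print code points rather than the raw string, to avoid
# "Wide character" warnings on an unconfigured filehandle.
print join(" ", map { sprintf "U+%04X", ord } split //, $text), "\n";
```

Which behaviour is preferable depends on the caller: checking the return value of utf8::decode lets you reject bad input outright, while the U+FFFD substitution lets processing continue with the damage clearly marked.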