I'm facing a problem storing UTF-8 keys with Perl's BerkeleyDB module. On the command line, the code works as expected:
    $ perl -Mstrict -Mutf8 -MBerkeleyDB -MEncode -MData::Dumper -le '
        unlink "xx.db";
        tie my %h, "BerkeleyDB::Btree", -Filename=>"xx.db", -Flags=>DB_CREATE;
        my $db=tied %h;
        $Data::Dumper::Useqq=1;
        $db->filter_fetch_key( sub {
            warn ">>fetch: ".Dumper($_);
            $_=decode("utf8", $_);
            warn "<<fetch: ".Dumper($_);
        });
        $db->filter_store_key( sub {
            warn ">>store: ".Dumper($_);
            $_=encode("utf8", $_);
            warn "<<store: ".Dumper($_);
        });
        $h{"ä"}=1;
        my @l=keys %h'
    >>store: $VAR1 = "\x{e4}";
    <<store: $VAR1 = "\303\244";
    >>fetch: $VAR1 = "\303\244";
    <<fetch: $VAR1 = "\x{e4}";

The store filter receives a character string and encodes it into a UTF-8 byte string. The fetch filter receives that byte string and decodes it back into a character string.
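The filter round trip itself can be checked without a database. This is a minimal sketch that assumes only the core Encode module: the two subs copy the filter bodies from the one-liner (without the Dumper tracing) and are driven by hand through $_, which is how DBM filters receive their argument. The names $store_key, $fetch_key, $on_disk and $fetched are illustrative only.

```perl
use strict;
use warnings;
use utf8;                        # so the literal "ä" below is one character
use Encode qw(encode decode);

# Same bodies as the filters in the one-liner, minus the Dumper tracing.
# A DBM filter gets the key in $_ and the value left in $_ is what is used.
my $store_key = sub { $_ = encode("utf8", $_) };   # characters -> UTF-8 bytes
my $fetch_key = sub { $_ = decode("utf8", $_) };   # UTF-8 bytes -> characters

my $key = "ä";                   # one character, U+00E4

# Simulate the store filter: this is what would be written to the database.
my $on_disk = do { local $_ = $key; $store_key->(); $_ };

# Simulate the fetch filter on the bytes read back from the database.
my $fetched = do { local $_ = $on_disk; $fetch_key->(); $_ };

# prints: stored 2 bytes, fetched 1 character(s), round trip ok
printf "stored %d bytes, fetched %d character(s), round trip %s\n",
    length $on_disk, length $fetched, ($fetched eq $key ? "ok" : "broken");
```

As long as the fetch filter really receives the bytes the store filter produced, the key comes back unchanged.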
When I use the same filters in my program I get this output:
While filling the database (two keys, "äü" and "ää", are stored):
    >>store_key: $VAR1 = "\x{e4}\x{fc}";
    <<store_key: $VAR1 = "\303\244\303\274";
    >>store_key: $VAR1 = "\x{e4}\x{e4}";
    <<store_key: $VAR1 = "\303\244\303\244";

But reading back fails:
    >>store_key: $VAR1 = "\x{e4}";
    <<store_key: $VAR1 = "\303\244";
    >>fetch_key: $VAR1 = "\x{e4}\x{e4}";
    <<fetch_key: $VAR1 = "\x{fffd}\x{fffd}";

The Perl snippet that produces this output looks like this:
    my $check=qr/\A\Q$prefix\E(.)?/;
    $k=$prefix;
    # DB_SET_RANGE: position the cursor on the first key >= $k
    if( ($rc=$cursor->c_get($k, $v, DB_SET_RANGE))==0 and $k=~$check ) {
      do {
        if( defined $1 ) {
          ...
        }
      # DB_NEXT: step forward while the keys still start with $prefix
      } while( ($rc=$cursor->c_get($k, $v, DB_NEXT))==0 and $k=~$check );
    }

$prefix is initially "ä", so the store filter called from the first c_get sees this character string and correctly converts it into a byte string.
Next, the fetch filter should be passed the byte string "\303\244\303\244", but it gets the character string "\x{e4}\x{e4}" instead.
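The \x{fffd}\x{fffd} at the end of the failing trace is consistent with that wrong input. A minimal sketch, using only the core Encode module and no BerkeleyDB (the variable names are illustrative), feeds both strings to the same decode("utf8", ...) call the fetch filter makes. For characters below U+0100, as here, decode treats each character as one octet, so "\x{e4}\x{e4}" is read as the octets E4 E4, which are not valid UTF-8 and get replaced with U+FFFD under the default CHECK setting.

```perl
use strict;
use warnings;
use Encode qw(decode);

# What the fetch filter should receive: the UTF-8 bytes of "ää".
my $bytes = "\303\244\303\244";
# What it actually receives in the program: the two characters "ää".
my $chars = "\x{e4}\x{e4}";

# Same call as in the fetch filter.
my $from_bytes = decode("utf8", $bytes);   # decodes to "\x{e4}\x{e4}"
my $from_chars = decode("utf8", $chars);   # octets E4 E4 are malformed UTF-8,
                                           # so U+FFFD replaces them

# Show the resulting code points of each decoded string.
for my $pair (["from bytes", $from_bytes], ["from chars", $from_chars]) {
    my ($label, $str) = @$pair;
    printf "%s: %s\n", $label,
        join ' ', map { sprintf 'U+%04X', ord } split //, $str;
}
```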
So, what is wrong here?
Why do I read a byte string from the database in one case (command line) and a character string in the other?
Thanks,
Torsten
In reply to BerkeleyDB + UTF8 by tfoertsch