PerlMonks

DBM modules and unicode keys

by ph0enix (Friar)
on Nov 22, 2002 at 13:09 UTC

ph0enix has asked for the wisdom of the Perl Monks concerning the following question:

I need to use tied hashes to save RAM. I tried the MLDBM module because I use nested data structures, but there is one problem: the keys contain Unicode characters. When I get the list of stored keys with @mykeys = keys %data; the returned key names seem to lose their Unicode (UTF-8) flag somewhere along the way. Here is sample code:

#!/usr/bin/perl_parallel -w
# For Emacs: -*- mode:cperl; mode:folding; coding:utf-8; -*-
use strict;
use utf8;
use DB_File;
use MLDBM qw(DB_File Storable);
use Fcntl;

binmode STDOUT, ':utf8';    # print character strings as UTF-8

my $dbfile = 'database.utf';
my %data = ();

tie(%data, 'MLDBM', $dbfile, O_CREAT | O_RDWR, 0666, $DB_BTREE) || die $!;

# read "key:value" lines from a UTF-8 input file
open my $in, '<:utf8', 'input.utf8' or die $!;
while (<$in>) {
    chomp;
    my ($key, $value) = split(':', $_, 2);
    $data{$key} = $value;
}
close $in;

# the keys printed here come back as octets, not as character strings
print join(', ', keys %data), "\n";
exit 0;

My idea is to add use Encode; and replace keys %data with map { encode('utf8', $_) } keys %data. Is that the right way, or am I missing something? Any suggestions?
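A small sketch of the two directions involved, using only Encode and assuming a key reaches the file as its UTF-8 octets (which is what an encode-on-store step arranges). Decoding those octets gives the character string back; encoding them a second time, as in the map { encode(...) } idea, treats each octet as a separate character and double-encodes it. The key "k\x{010D}" is an arbitrary example.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Example key as a Perl character string: "k" followed by U+010D (c with caron).
my $key = "k\x{010D}";

# The octets a byte-oriented DBM file would hold if the key is stored as UTF-8.
my $octets = encode('utf8', $key);

# Decoding the octets restores the original character string ...
my $decoded = decode('utf8', $octets);

# ... while encoding them again double-encodes every non-ASCII octet.
my $double = encode('utf8', $octets);

printf "characters: %d, octets: %d, double-encoded: %d\n",
    length($key), length($octets), length($double);
print $decoded eq $key ? "decode() round-trips\n" : "decode() mismatch\n";
```

This prints "characters: 2, octets: 3, double-encoded: 5" and "decode() round-trips", so decode, not encode, is the direction that restores characters when reading keys back.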

Re: DBM modules and unicode keys
by demerphq (Chancellor) on Nov 22, 2002 at 13:32 UTC
    I think the easiest way to resolve this would be to use DB_File's filter_fetch_key() and filter_store_key() methods. Use them to ensure that whatever gets put in gets pulled out the same way, even if it isn't stored internally in the DB the way one might think.

    --- demerphq
    my friends call me, usually because I'm late....
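A minimal sketch of the filter mechanism described above, assuming a plain tied hash without MLDBM. SDBM_File is swapped in only because it is built with every perl; DB_File's tied object provides the same filter_store_key and filter_fetch_key methods. The file name and sample keys are hypothetical.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT);
use SDBM_File;
use Encode qw(encode decode);
use File::Temp qw(tempdir);
use File::Spec;

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical scratch file; SDBM_File adds .dir and .pag suffixes.
my $file = File::Spec->catfile(tempdir(CLEANUP => 1), 'keys');

my %h;
my $db = tie(%h, 'SDBM_File', $file, O_RDWR | O_CREAT, 0666)
    or die "Cannot tie $file: $!";

# Filters work on $_ in place; the return value of the sub is discarded.
$db->filter_store_key(sub { $_ = encode('utf8', $_) });   # characters -> octets
$db->filter_fetch_key(sub { $_ = decode('utf8', $_) });   # octets -> characters

$h{"\x{017E}ivot"} = 'life';      # "zivot" with z-caron
$h{"caf\x{00E9}"}  = 'coffee';    # "cafe" with e-acute

my @keys   = sort keys %h;           # each key passes through filter_fetch_key
my $coffee = $h{"caf\x{00E9}"};      # the lookup key passes through filter_store_key
print join(', ', @keys), "\n";

undef $db;                           # drop the object reference before untie
untie %h;
```

Installing both filters right after tie and before any store means every key is encoded the same way on the way in and decoded the same way on the way out.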

      I tried the following code without success. It looks like it returns only octets instead of a character string...

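      # filter_fetch_key subs must modify $_ in place; their return value
      # is discarded, so this decode() result never reaches the caller
      (tied %data)->filter_fetch_key( sub { decode('utf8', $_); } );
      print join(', ', keys %data), "\n";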

      But this code produces the correct output:

      print join(', ', map { decode('utf8', $_) } keys %data), "\n";

      Another hint?
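A sketch that isolates why the first attempt above still returned octets, assuming keys are stored as UTF-8 octets (here via a store filter): a DBM filter sub is called for its side effect on $_, and whatever it returns is thrown away. The helper fetched_key and the file names are hypothetical, and SDBM_File stands in for DB_File.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT);
use SDBM_File;
use Encode qw(encode decode);
use File::Temp qw(tempdir);
use File::Spec;

my $dir = tempdir(CLEANUP => 1);

# Hypothetical helper: tie a fresh SDBM file, store one key as UTF-8 octets,
# install the given fetch filter and return the key as keys() reports it.
sub fetched_key {
    my ($name, $fetch_filter) = @_;
    my %h;
    my $db = tie(%h, 'SDBM_File', File::Spec->catfile($dir, $name),
                 O_RDWR | O_CREAT, 0666)
        or die "Cannot tie $name: $!";
    $db->filter_store_key(sub { $_ = encode('utf8', $_) });
    $db->filter_fetch_key($fetch_filter);
    $h{"\x{017E}ivot"} = 1;
    my ($key) = keys %h;
    undef $db;
    untie %h;
    return $key;
}

# Returns the decoded string but leaves $_ alone: the result is discarded.
my $ignored = fetched_key('ignored', sub { decode('utf8', $_) });

# Assigns the decoded string to $_, which is what the filter hands back.
my $assigned = fetched_key('assigned', sub { $_ = decode('utf8', $_) });

printf "return value only: %d chars; assigned to \$_: %d chars\n",
    length($ignored), length($assigned);
```

The first filter leaves the 6 octets of the stored key untouched, while the second yields the 5-character string, which is the difference between the two fetch filters in this thread.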

        Well, I would guess that you need to encode the data as you put it into the DB and then decode it when you take it out. The code you have posted suggests that you are trying to read from a DB created without an appropriate filter_store_key(). I would delete the DB, set up the appropriate store and fetch filters, and then try rebuilding it.

        I'm sorry I can't help more than that; I'm not too familiar with the ins and outs of UTF-8, but this is definitely where I would start to try to solve the problem.

        --- demerphq
        my friends call me, usually because I'm late....
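A sketch of why deleting and rebuilding the file matters, assuming a key was written before any filter was installed. The single-octet form of "caf\x{00E9}" stored first no longer matches the UTF-8 form the store filter produces, so the old record is not found and a second record appears. SDBM_File stands in for DB_File, and the file name is hypothetical.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT);
use SDBM_File;
use Encode qw(encode decode);
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical scratch file standing in for an existing database.
my $file = File::Spec->catfile(tempdir(CLEANUP => 1), 'mixed');
my %h;
my $db = tie(%h, 'SDBM_File', $file, O_RDWR | O_CREAT, 0666)
    or die "Cannot tie $file: $!";

my $key = "caf\x{00E9}";    # "cafe" with e-acute

# Written before any filter is installed: stored with the single octet 0xE9.
$h{$key} = 'old';

# Filters installed afterwards, as when reopening a file built without them.
$db->filter_store_key(sub { $_ = encode('utf8', $_) });
$db->filter_fetch_key(sub { $_ = decode('utf8', $_) });

# Lookups are now encoded as 0xC3 0xA9, so the old record is missed ...
my $found_old = exists $h{$key} ? 1 : 0;

# ... and storing the "same" key adds a second, differently encoded record.
$h{$key} = 'new';
my $count = scalar(keys %h);

print "old record found: $found_old; records in file: $count\n";

undef $db;
untie %h;
```

Starting from an empty file, as the updated code below does with unlink, avoids mixing the two encodings.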

Re: DBM modules and unicode keys
by ph0enix (Friar) on Nov 23, 2002 at 09:03 UTC

    The code with the suggested modifications does not seem to work as I expected. Any other suggestions or advice?

    Update: fixed the code so that it works correctly with Unicode keys.

    #!/usr/bin/perl_parallel -w
    # For Emacs: -*- mode:cperl; mode:folding; coding:utf-8; -*-
    use strict;
    use utf8;
    use Encode;                 # provides encode() and decode()
    use DB_File;
    use MLDBM qw(DB_File Storable);
    use Fcntl;

    binmode STDOUT, ':utf8';    # print character strings as UTF-8

    my $dbfile = 'database.utf';
    my %data = ();

    # remove the old file to be sure we have only new data
    unlink $dbfile if -f $dbfile;

    tie(%data, 'MLDBM', $dbfile, O_CREAT | O_RDWR, 0666, $DB_BTREE) || die $!;

    # handle Unicode by transforming keys to octets on store and back on fetch;
    # DBM filters work on $_ in place, so the result must be assigned to $_
    (tied %data)->filter_store_key( sub { $_ = encode('utf8', $_); } );
    (tied %data)->filter_fetch_key( sub { $_ = decode('utf8', $_); } );

    open my $in, '<:utf8', 'input.utf8' or die $!;
    while (<$in>) {
        chomp;
        my ($key, $value) = split(':', $_, 2);
        $data{$key} = $value;
    }
    close $in;

    # now keys come back as Unicode strings, not just octets
    print join(', ', keys %data), "\n";

    untie %data;
    exit 0;
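Only keys are filtered in the code above. A plausible explanation, stated as an assumption rather than something checked against MLDBM's source, is that MLDBM passes each value through the serializer named on the use MLDBM line (Storable here) while keys go to DB_File unchanged, and Storable records whether a string holds characters. The sketch below checks only the Storable half, with a hypothetical nested value; it does not exercise MLDBM itself.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical nested value with non-ASCII text, like an MLDBM record.
my $value = { name => "\x{017E}ivot", tags => ["caf\x{00E9}", 'plain'] };

# freeze() turns the structure into a byte string suitable for a DBM value;
# thaw() rebuilds it, including the character strings inside.
my $frozen = freeze($value);
my $copy   = thaw($frozen);

print "$copy->{name}: @{ $copy->{tags} }\n";
```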

Node Type: perlquestion [id://215083]
Approved by BrowserUk