If GDBM expects bytes, you'll need to serialise your text into bytes. That specific type of serialisation is called encoding. The following encodes the text using UTF-8.
Based on what Khen1950fx posted,
#!/usr/bin/env perl

use v5.8.5; # Why?
use strict;
use warnings;
use utf8;
use open ':std', ':utf8';
use English;
use GDBM_File qw( GDBM_WRCREAT );

# Return a UTF-8 encoded copy of the given text.
sub _e { my $s = shift; utf8::encode($s); $s }

my $file = "FAST.gdbm";

# The class name must be quoted; a bareword fails under "use strict".
tie(my %keys, 'GDBM_File', $file, GDBM_WRCREAT, 0644)
   or die("Could not open \"$file\": $!\n");

my $count = 0;
while (my $line = <STDIN>) {
   chomp $line;
   my ($key, $heading) = split(/\t/, $line);

   # Encode both key and value so that only bytes reach GDBM.
   if (!eval { $keys{_e($key)} = _e($heading); 1 }) {
      warn "Can't store record \"$line\" $@";
      next;
   }

   if (0 == (++$count % 10000)) {
      print "$count loaded\n";
   }
}

print "$count loaded\n";
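Reading the records back needs the inverse step: what comes out of the database is UTF-8 bytes, so it has to be decoded before you treat it as text. Here is a minimal sketch of the round trip. The `_d` helper is hypothetical (it is not in the code above), and a plain hash stands in for the tied GDBM hash so the example runs without a database file.

```perl
use strict;
use warnings;

# _e is the helper from the code above; _d is a hypothetical inverse.
sub _e { my $s = shift; utf8::encode($s); return $s }
sub _d { my $s = shift; utf8::decode($s) or die "Invalid UTF-8\n"; return $s }

# Assumption: a plain hash stands in for the tied GDBM hash.
my %store;

my $key     = "caf\x{E9}";        # 4 characters ("cafe" with e-acute)
my $heading = "\x{263A} smile";   # contains a character above 0xFF

# Only bytes go into the store.
$store{ _e($key) } = _e($heading);

# What comes back out is bytes too, so decode before using it as text.
my ($raw_key)     = keys %store;
my $round_key     = _d($raw_key);
my $round_heading = _d($store{$raw_key});

printf "stored key is %d bytes, decoded key is %d characters\n",
    length($raw_key), length($round_key);
```

The stored key is one byte longer than the decoded key because the accented character takes two bytes in UTF-8.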
In reply to "Re: utf8 and GDBM" by ikegami, in thread "utf8 and GDBM" by musterion