in reply to utf8 and GDBM
If GDBM expects bytes, you'll need to serialise your text into bytes. That specific type of serialisation is called encoding. The following encodes the text using UTF-8.
Based on what Khen1950fx posted,
    #!/usr/bin/env perl
    use v5.8.5; # Why?
    use strict;
    use warnings;
    use utf8;
    use open ':std', ':utf8';
    use English;
    use GDBM_File qw( GDBM_WRCREAT );

    # Return a copy of the text encoded as UTF-8 bytes.
    sub _e { my $s = shift; utf8::encode($s); $s }

    my $file = "FAST.gdbm";
    tie(my %keys, 'GDBM_File', $file, GDBM_WRCREAT, 0644)
        or die("Could not open \"$file\": $!\n");

    my $count = 0;
    while (my $line = <STDIN>) {
        chomp $line;
        my ($key, $heading) = split(/\t/, $line);
        # GDBM stores bytes, so encode both the key and the value.
        if (!eval { $keys{_e($key)} = _e($heading); 1 }) {
            warn "Can't store record \"$line\": $@";
            next;
        }
        if (0 == (++$count % 10000)) {
            print "$count loaded\n";
        }
    }
    print "$count loaded\n";
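Reading the records back is the mirror image: whatever comes out of the database is UTF-8 bytes and has to be decoded before you treat it as text. Here is a minimal sketch of the round trip. The `_d` helper is a hypothetical counterpart to `_e` (it is not in the script above), and a plain hash stands in for the tied GDBM hash so the snippet runs without a database file.

```perl
use strict;
use warnings;

# Return a copy of the text encoded as UTF-8 bytes (same idea as _e above).
sub _e { my $s = shift; utf8::encode($s); $s }

# Hypothetical counterpart: return a copy of the UTF-8 bytes decoded to text.
sub _d { my $s = shift; utf8::decode($s); $s }

# Assumption: a plain hash stands in for the tied GDBM hash.
my %store;

# Sample text written with escapes so the snippet does not depend on
# the encoding of the source file.
my $key     = "caf\x{e9}";
my $heading = "na\x{ef}ve r\x{e9}sum\x{e9}";

# Store: encode both key and value.
$store{ _e($key) } = _e($heading);

# Fetch: encode the key for the lookup, decode the value that comes back.
my $bytes = $store{ _e($key) };
my $text  = _d($bytes);

printf "%d bytes stored, %d characters after decoding\n",
    length($bytes), length($text);
```

The byte count is larger than the character count because each accented letter takes two bytes in UTF-8. If you forget the decode step, `length`, `substr` and regex matches will operate on bytes rather than characters.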