elef has asked for the wisdom of the Perl Monks concerning the following question:

I'm using DBD::SQLite to create multilingual FTS databases. I got the unicode61 tokenizer working, but I'm getting an error message when I add remove_diacritics=0 in order to stop the tokenizer from folding ú, ü and ű all into u and so on.
The documentation says:
    By default, "unicode61" also removes all diacritics from Latin script characters. This behaviour can be overridden by adding the tokenizer argument "remove_diacritics=0". For example:
    CREATE VIRTUAL TABLE txt3 USING fts4(tokenize=unicode61 "remove_diacritics=0");
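The default folding described in the documentation can be reproduced with a short sketch. It assumes that the SQLite bundled with DBD::SQLite has FTS4 and the unicode61 tokenizer (SQLite 3.7.13 or later). The table name demo and the three one-letter rows are made up for illustration.

```perl
use strict;
use warnings;
use utf8;
use DBI;

# In-memory database; sqlite_unicode => 1 passes Perl character strings as UTF-8.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1, sqlite_unicode => 1 });

# Default unicode61 tokenizer: diacritics on Latin letters are removed.
$dbh->do('CREATE VIRTUAL TABLE demo USING fts4(body, tokenize=unicode61)');

# Index three one-letter "words" (hypothetical data) that differ only in their accent.
my $ins = $dbh->prepare('INSERT INTO demo (body) VALUES (?)');
$ins->execute($_) for qw(ú ü ű);

# An unaccented query term matches all three rows, because both the indexed
# text and the query term are folded to plain "u".
my ($hits) = $dbh->selectrow_array(
    'SELECT count(*) FROM demo WHERE demo MATCH ?', undef, 'u');
print "rows matching 'u': $hits\n";
```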

Well, that's what I'm doing and I get an error:
DBD::SQLite::db do failed: unrecognized parameter: remove_diacritics=0

Relevant code:
$dbh->do("CREATE VIRTUAL TABLE tmdata USING fts4 ($collist, tokenize=unicode61)"); # works but has the undesired side effect of treating all accented letters the same as the 'base letter'
# $dbh->do("CREATE VIRTUAL TABLE tmdata USING fts4 ($collist, tokenize=unicode61, \"remove_diacritics=0\")") # fails with unrecognized parameter error
# $dbh->do("CREATE VIRTUAL TABLE tmdata USING fts4 ($collist, tokenize=unicode61, remove_diacritics=0)") # fails the same way
return;
}

I realize that this is probably a rarely used feature. If nobody has a solution, I might email the maintainer (Kenichi Ishigaki).

Re: DBD::SQLite: remove_diacritics throws error
by Anonymous Monk on Jan 14, 2015 at 21:03 UTC
    CREATE VIRTUAL TABLE txt3 USING fts4(tokenize=unicode61 "remove_diacritics=0")
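    There is no comma between unicode61 and "remove_diacritics=0". The tokenizer argument follows the tokenizer name, separated only by a space; with a comma, FTS4 reads "remove_diacritics=0" as a separate table option, which it doesn't recognize, hence the "unrecognized parameter" error.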
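Applied to the code from the question, the fix might look like the sketch below. The column names source and target, the sample row and the in-memory database are hypothetical stand-ins for the original $collist and $dbh; it again assumes the bundled SQLite supports FTS4 with unicode61. The MATCH queries check that an unaccented u no longer finds the accented letters.

```perl
use strict;
use warnings;
use utf8;
use DBI;

my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1, PrintError => 0, sqlite_unicode => 1 });

# Hypothetical column list standing in for the question's $collist.
my $collist = join ', ', qw(source target);

# The tokenizer argument follows "unicode61" after a space, with no comma.
# A comma would end the tokenize= option and make FTS4 treat
# "remove_diacritics=0" as an unknown table parameter.
$dbh->do(qq{CREATE VIRTUAL TABLE tmdata USING fts4($collist, tokenize=unicode61 "remove_diacritics=0")});

# Hypothetical sample row with accented letters in both columns.
$dbh->do('INSERT INTO tmdata (source, target) VALUES (?, ?)', undef, 'ű', 'ü');

# Count rows in which any column contains the given term.
sub hits {
    my ($term) = @_;
    my ($n) = $dbh->selectrow_array(
        'SELECT count(*) FROM tmdata WHERE tmdata MATCH ?', undef, $term);
    return $n;
}

printf "u: %d, ű: %d, ü: %d\n", hits('u'), hits('ű'), hits('ü');
```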
      You're right, thanks. I can't believe I didn't spot that when I looked through the code 5 times...