fanticla has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

I changed my script to handle UTF characters (I just used ANSI) and to query a sqlite database. Now the following query does not properly work anymore:

$all = $dbh->selectall_arrayref("SELECT ID,col_1, col_2 FROM table WHE +RE LOWER(col_1) LIKE '$searching_string%'");

By characters in $searching_string such as äüö, the query does not match anymore.

If I remove LOWER, the query matches (lower matches lower, upper case matches upper case), but it does not match äÄ üÜ öÖ

What I am doing wrong? Any suggestions?

Thanks Cla

Update: I just noted that the query works perfectly (lower upper case) even without LOWER. This is true for Western europe characters (!!). The problem is still there for äÄ üÜ öÖ etc.

Replies are listed 'Best First'.
Re: sqlite utf lower
by ikegami (Patriarch) on Jun 07, 2010 at 21:26 UTC
    For starters, you should be using a placeholder.
    my $all = $dbh->selectall_arrayref( "SELECT ID, col_1, col_2 FROM table WHERE LOWER(col_1) LIKE ?", undef, "$searching_string%", );

    Secondly, did you make sure that LOWER() works with non-ascii characters? The documentation for the function reads

    "The lower(X) function returns a copy of string X with all ASCII characters converted to lower case. The default built-in lower() function works for ASCII characters only. To do case conversions on non-ASCII characters, load the ICU extension."

    Did you load the ICU extension?


    Finally, you provided very little information as to what $searching_string contains. Is it decoded text or is encoded text? If it's encoded, which encoding is used? In case there's a bug in DBD::SQLite, it would also be useful to know which how is the string stored internally, UTF8=0 or UTF8=1? A good starting point would be to give the output of

    use Devel::Peek; Dump($searching_string);

      Thank you for your suggestions. Something about $searching_string

      my $searching_string=Encode::decode_utf8( $searching_string_widget );# +$searching_string_widget is variable in Entry widget (tk)

      Output from

      use Devel::Peek; Dump($searching_string);

      SV = NULL(0x0) at 0x3b51f8c
      REFCNT =
      FLAGS = ()
      PV = 0

      Yes, I confirm that LOWER works with non ASCII characters. Strangly enought, the above script produces what I want (lower/upper case matches) even without LOWER (non ASCII). It just doesn't work with non western charachters (UTF8).

      I did'n load the ICU extension. I must admit I have no idea how it goes. I'll try to get some infos about it.