fanticla has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I have a simple SQLite database. I read the content of the table and print it out (in a TK table). When printing it out, I need to sort the entries in an alphabetical order. Easy for English, but I have a small set of non-ASCII characters, such as german ä/ü/ö. I came up with:

order by column collate NOCASE

Result (orderd): a g x ä
Right order: a ä g x

My script:

$dbh = DBI->connect( "dbi:SQLite:files/database/data.db" ) || die "Can +not connect: $DBI::errstr"; $selected = $dbh->selectall_arrayref("SELECT ID,column1,column2 FROM +table order by column1 collate NOCASE"); foreach my $row_db (@$selected ) { my ($ID, $column1, $column2) = @$row_db; print "$column1\n"; }

One day of manuals and goole gave me no clue! (Mea culpa)!

I am not familiar with creating new collate in perl. Unfortunatelly I didn't find any example out there. I thought even about the posibility to substitute ä->a before performing SELECT. Or... maybe doing the sorting in perl at the stage of 'foreach' (@$selected), maybe not the best solution, but probably the easiest one.

Can any of you suggest a - possible easy - way to solve the problem?

THANXS Cla

Replies are listed 'Best First'.
Re: SQLite (DBD) Sorting
by choroba (Cardinal) on Apr 29, 2010 at 15:41 UTC
    That's why SQLite is just lite :) But googling for locale sqlite I found several answers to your question (e.g., this one uses ICU).
Re: SQLite (DBD) Sorting
by thundergnat (Deacon) on Apr 29, 2010 at 17:00 UTC

    You could sort using a routine to decompose the accented letter to their base characters. Something like this perhaps.

    sub deaccent { my $phrase = shift; return $phrase unless ( $phrase =~ y/\xC0-\xFF// ); #short circuit + if no upper chars # translterate what we can (for speed) $phrase =~ tr/ÀÁÂÃÄÅàáâãäåÇçÈÉÊËèéêëÌÍÎÏìíîïÒÓÔÕÖØòóôõöøÑñÙÚÛÜùúûü +Ýÿý/AAAAAAaaaaaaCcEEEEeeeeIIIIiiiiOOOOOOooooooNnUUUUuuuuYyy/; # and substitute the rest my %trans = qw(Æ AE æ ae Þ TH þ th Ð TH ð th ß ss); $phrase =~ s/([ÆæÞþÐðß])/$trans{$1}/g; return $phrase; }

    used as:

    foreach my $row_db ( sort { deaccent($a) cmp deaccent($b) } @$selected + ) {
    foreach my $row_db ( sort { deaccent($a->[1]) cmp deaccent($b->[1]) } +@$selected ) { my ($ID, $column1, $column2) = @$row_db; print "$column1\n"; }

    Or as a Schwartzian Transform. (probably not worth it unless @$selected is large)

    foreach my $row_db (map { $_->[0] } sort { $a->[1] cmp $b->[1] } map { + [$_, deaccent($_)] } @$selected ) {
    foreach my $row_db (map { $_->[0] } sort { $a->[1] cmp $b->[1] } map { + [$_, deaccent($_->[1])] } @$selected ) { my ($ID, $column1, $column2) = @$row_db; print "$column1\n"; }

    Note: this only covers ISO-Latin-1, if you need other code pages, modify deaccent() to suit (or possibly use Text::Unidecode)

    UPDATE: fixed errors.

      Thank you for your reply. What I don't get is how you are going to limit the sorting to only one specific database column (in the example column1) when using your first solution:

      foreach my $row_db ( sort { deaccent($a) cmp deaccent($b) } @$selected + ) { my ($ID, $column1, $column2) = @$row_db; print "$column1\n"; }

      I implemented your approach, but I get a "no ordered" output, probably due to the missing reference to the column I want to sort.

      I'd really hope to get your answer!

        sort { deaccent($a) cmp deaccent($b) } @$selected
        should be
        sort { deaccent($a->[1]) cmp deaccent($b->[1]) } @$selected
        and
        map $_->[0], sort $a->[1] cmp $b->[1], map [ $_, deaccent($_) ], @$selected
        should be
        map $_->[0], sort $a->[1] cmp $b->[1], map [ $_, deaccent($_->[1]) ], @$selected

        Ah. My bad. You need to access the correct column in the row array.

        foreach my $row_db ( sort { deaccent($a->[1]) cmp deaccent($b->[1]) } +@$selected ) { my ($ID, $column1, $column2) = @$row_db; print "$column1\n"; }