BerntB has asked for the wisdom of the Perl Monks concerning the following question:

I want to sort data in a PostgreSQL table in the same way as in large Perl data structures. (To cache large data structures transparently in the db)

I read about i18n settings (initdb for P-sql, etc) and it looked easy. show all; in psql shows lc_collate eq en_US.UTF-8.

It doesn't even work for 7-bit ASCII strings! PostgreSQL ignores case and spaces (in strings -- not only postfix?!). Spaces inside strings are needed in some cases, so I can't just repack everything.

I saw that the "C" locale should sort on spaces, but then it won't sort i18n char sets correctly.

I use the standard Ubuntu P-SQL and Perl for this simple test.

A common sorting problem like this should either be very simple or discussed in lots of places. Since I can't seem to find those discussions, I'm missing something obvious? :-)

Update: Fixed language after I slept. :-)

Update 2: Ported to MySQL 5 and Collation works well, even supports case sensitivity (if I want it). My tests also run noticeably faster (with InnoDB). PostgreSQL seems behind MySQL in sql standard support in this area.

Replies are listed 'Best First'.
Re: Postgresql and space characters
by cdarke (Prior) on Sep 17, 2006 at 09:03 UTC
    Perl ignores locales unless you use locale; see perllocale
      Thanks. Still same as in the Camel book but more collating detail, right? (I never got so far as to need this, sigh.)
Re: Postgresql and space characters
by gellyfish (Monsignor) on Sep 17, 2006 at 08:51 UTC
      Thanks, but I already rtfm half a dozen times, for different p-gsql versions. (I even skimmed the relevant parts of the sql-99 standard and searched the relevant mailing lists.) Embarassingly, I still don't understand what to do.

      Yes, it isn't really a Perl question, except that lots of/most people use db:s with Perl. Everyone else can sort on national character sets with sorting on spaces inside char strings? I must be doing some stupid mistake.

      Too much grief; I'll check the MySQL docs re collation (I verified all sql features with the MySQL docs when writing the code).

Re: Postgresql and space characters
by graff (Chancellor) on Sep 17, 2006 at 22:14 UTC
    I use the standard Ubuntu P-SQL and Perl for this simple test.

    Um, what test? You didn't show us any test -- no sample input data, no output, no code, and no clear explanation or example of what you mean when you say "it doesn't work".

    Maybe you are missing something obvious, but you haven't given us anything to work with yet. As indicated above, it's not even clear that this is a perl question.

    Update: You said:

    I want to sort data in a PostgreSQL table in the same way as in large Perl data structures. (To cache large data structures transparently in the db)

    But, there are lots of ways of sorting large perl data structures -- and lots of different kinds of structures that involve different methods for implementing various ways of sorting -- and there may even be ways to "cache large data structures transparently in the db" that do not require any kind of sorting.

    Your posts in this thread so far have all been very vague and unclear. If this really is (or could be) a perl question, give us a better idea of what you are talking about. (Maybe you just need to add an extra column to your table(s), to specify the desired ordering for the data records.)

      Um, what test?
      I was discussing the code I wrote. Yes, yes.. you can rightly complain about my language abilities. I am not even that much better in my native tounge. Here is an sql example of my problem:
      create table s ( a varchar(255) ); # ... div inserts ... select a from s order by a; a ---------- fooabba foo bar foo gaa foo gznk fool foonoo foo zazza
      Spaces are ignored. That is not possible for my data. (I can live with case not being significant.)
      But, there are lots of ways of sorting large perl data structures
      I want to use the one built into Perl with maximum speed and support for internationalization. My belief was that Perl's "native" sort would work the same as PostgreSQL, since both seemed to more or less use the native OS implem. Sorry if I was unclear.
      Maybe you just need to add an extra column to your table(s), to specify the desired ordering for the data records.
      That is in fact the data I am trying to sort. At an absolute minimum, I need a char that I can replace space with -- which will be less than all alphanumeric chars in P-SQL.

      MySQL has better sql-support than PostgreSQL(!) here, with different collations on columns, etc. I'll look into that. Pity, I had hoped to support both databases when releasing the code (-: if I ever get so far :-).

        I want to use the one built into Perl with maximum speed and support for internationalization... At an absolute minimum, I need a char that I can replace space with -- which will be less than all alphanumeric chars in P-SQL.

        Now that you've shown an example, it does seem very strange that P-SQL's "order by" behaves like that. If you are using a perl script for both storing to and fetching from P-SQL, you might try replacing spaces with a control character -- e.g. do  tr/ /\x1a/ on the strings before loading the data into the table, and then  tr/\x1a/ / on the strings after you pull them back out from the table.

        But I don't have a P-SQL system available to test this on... maybe it would ignore control characters the same way it ignores spaces. If so, you have to figure out whether you can use something like "/" or "*" or "#" or "%" as a replacement for spaces -- whatever will "stick" during the P-SQL sorting, but is not already present in the original data.

        Is there too much of this data to be held in memory at one time? If the perl script that loads the table has all the data in one array or whatever, you could create your table like this:

        create table s ( a varchar(255), o int ); # insert the string and its numeric sort order as set by perl's sort select a from s order by o;