in reply to ISO 8859-1 characters and \w \b etc.

This depends on the language, and apparently can conflict with Unicode, but take a look at perldoc perllocale (specifically, the section on LC_CTYPE).

I haven't used locales at all, so I can't really help you any further than that.

Edit: fixed url.

Re^2: ISO 8859-1 characters and \w \b etc.
by dpavlin (Friar) on Jun 27, 2004 at 22:20 UTC
    I know that this node somewhat duplicates things said above, but as an ISO-8859-2 user, I would like to emphasize that you will need to set up LC_CTYPE and use locale. You should also consider setting up LC_COLLATE so that sort also uses the locale.

    You will need to have the locale installed on your system. Try setting the environment variables and running perl -v to see if perl picks up the locale (it will complain if it doesn't).

    Having said that, locale setup is done per language and country (that's why the locale for Croatia is hr_HR and for the USA en_US). You might also use locale aliases (defined in /usr/lib/X11/locale/locale.alias).
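
    Since locale names are spelled differently from system to system, one way to find out which spelling works is to try a few and check what setlocale returns. This is only a rough sketch: the candidate names and the first_available_locale helper are my own assumptions for illustration, not anything standard.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(locale_h);

# Hypothetical candidate names; the exact spelling varies between systems.
my @candidates = ('hr_HR.ISO-8859-2', 'hr_HR.ISO8859-2', 'hr_HR');

# Hypothetical helper: try each name for LC_CTYPE and return the first
# one the C library accepts, or nothing if none of them is accepted.
sub first_available_locale {
    my @names = @_;
    for my $name (@names) {
        # setlocale returns the locale name on success and undef on failure
        my $got = setlocale(LC_CTYPE, $name);
        return $got if defined $got;
    }
    return;
}

my $locale = first_available_locale(@candidates);
print defined($locale)
    ? "Using $locale\n"
    : "None of the candidate locales is available\n";
```

    On most systems an unknown name makes setlocale return undef, so the loop falls through to the next candidate. Some C libraries accept any name, so treat a successful return as a hint rather than proof.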

    It might be enough just to add use locale; to your code. If you need to set the locale explicitly, an example follows (for Croatia with its funny accented characters; we use ISO-8859-2, but the principle is the same).

    #!/usr/bin/perl -w
    use strict;
    use locale;
    use POSIX qw(locale_h);

    # character classes (\w, \W, ...) and sort order follow the Croatian locale
    setlocale(LC_CTYPE, 'hr_HR');
    setlocale(LC_COLLATE, 'hr_HR');

    my $text = "foo čevapčić bar";
    print join(", ", sort split(/\W/, $text)), "\n";

    If you don't mind changing the system-wide locale, you can also set up your /etc/profile and Apache's httpd.conf with the environment variables and drop setlocale from the code.
    2share!2flame...
Re^2: ISO 8859-1 characters and \w \b etc.
by Melroch (Acolyte) on Jun 27, 2004 at 17:57 UTC

    Thanks. Truth be told, I have looked at perldoc perllocale several times and not got any wiser, I'm afraid.

    I guess what I'm really looking for is a plain English description of how to get and set locales. The workaround of using numerals instead of letters only gets you so far...

    /Melroch

      See the ENVIRONMENT section in perllocale, and maybe your local manpage for "locale". You might have to install the extra locales you want to use (my system only has the "C" and "POSIX" locales, apparently). Basically you can set a couple of environment variables, and that will determine the locale your perl program will run under. Which locales are supported is system-dependent; I can list mine using "locale -a".
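
      To see which locale a program actually ends up with, something like the following might do. It is a minimal sketch, assuming the locale is chosen through the usual environment variables (LC_ALL, LC_CTYPE, LC_COLLATE, LANG); the current_locales helper is a made-up name, not part of any module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(locale_h);

# An empty string asks the C library to take the locale from the
# environment instead of naming it in the code.
my $adopted = setlocale(LC_ALL, '');
warn "The environment names a locale this system does not support\n"
    unless defined $adopted;

# Hypothetical helper: calling setlocale with only a category queries
# the current setting without changing it.
sub current_locales {
    return (
        LC_CTYPE   => setlocale(LC_CTYPE),
        LC_COLLATE => setlocale(LC_COLLATE),
    );
}

my %current = current_locales();
printf "%-10s %s\n", $_, $current{$_} // '(unknown)' for sort keys %current;
```

      Run it once as-is and once with the variable set on the command line (for example LC_ALL set to one of the names "locale -a" prints) to compare what it reports.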

      Hope this helps,
      Joost.