in reply to Encodings problem

I don't understand why you find this to be such an annoyance. Do you have the ability to tell Windows which character encoding it should use when storing file names in directories? (That isn't a Perl question, and I'm not a Windows user, so I don't know.)

If you can make your Windows system use utf8 for the file names, do that, so that the character encoding of the file names matches the character encoding of your web/cgi data.

If Windows will only use iso-8859 for file names in Greek, then your choices are limited to:

  1. Do all your web/cgi data in iso-8859, to match the encoding used in file names, or else
  2. Keep the web/cgi content in utf8, and just transliterate file name strings from one encoding to the other when you have to.

It's really not that big a deal either way, but personally, if I had a lot of web content in utf8 already and couldn't get Windows to use utf8 for file names, I think using Encode as you're doing now would be a lot cheaper, easier and quicker than changing all the web content.
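
To make option 2 concrete, here is a minimal sketch of converting a file name between the two encodings with the core Encode module. The helper names utf8_to_fs and fs_to_utf8 are made up for illustration, and the sketch assumes the web side hands you UTF-8 encoded bytes and that file names on disk are iso-8859-7 bytes; adjust the encoding name if your system uses something else.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumption: file names on disk are stored as iso-8859-7 bytes.
my $FS_ENCODING = 'iso-8859-7';

# Hypothetical helper: UTF-8 bytes (web/cgi side) -> bytes for the file system.
sub utf8_to_fs {
    my ($utf8_bytes) = @_;
    my $chars = decode( 'UTF-8', $utf8_bytes );    # bytes -> Perl characters
    return encode( $FS_ENCODING, $chars );         # characters -> iso-8859-7 bytes
}

# Hypothetical helper: bytes from the file system -> UTF-8 bytes (web/cgi side).
sub fs_to_utf8 {
    my ($fs_bytes) = @_;
    my $chars = decode( $FS_ENCODING, $fs_bytes );
    return encode( 'UTF-8', $chars );
}

# Greek alpha, beta, gamma followed by ".txt", written as UTF-8 bytes.
my $name_utf8 = "\xCE\xB1\xCE\xB2\xCE\xB3.txt";
my $name_fs   = utf8_to_fs($name_utf8);
```

You would pass $name_fs to open, opendir and friends, and run anything that comes back from readdir or glob through fs_to_utf8 before putting it into a web page.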

You could just set up a module of your own that implements "utf8-to-iso" versions of "open", "opendir", "readdir" and maybe "glob" -- you could give them names like "gr_open" or whatever, and your cgi scripts then just need to use that module and call those functions instead of the "standard" ones.

Each function in the module would handle the encoding conversions internally, taking utf8 strings as args and giving back utf8 strings as return values. That way, you don't have to keep rewriting the same encoding conversion code over and over again.
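
As a rough illustration of that idea, a wrapper package might look something like the sketch below. It is only a sketch under stated assumptions: the function bodies and the private helpers _to_fs and _from_fs are hypothetical, the file system encoding is assumed to be cp1253, the "utf8 strings" are assumed to be UTF-8 encoded bytes, and the package is kept in the same file as the demo (so the functions are called with their full GreekFile:: names rather than imported).

```perl
use strict;
use warnings;

{
    package GreekFile;    # hypothetical sketch of the wrapper module

    use Encode qw(decode encode);

    my $FS_ENC = 'cp1253';    # assumption: encoding used for file names on disk

    # Private helpers: UTF-8 bytes <-> file system bytes.
    sub _to_fs   { my ($s) = @_; return encode( $FS_ENC, decode( 'UTF-8', $s ) ) }
    sub _from_fs { my ($s) = @_; return encode( 'UTF-8', decode( $FS_ENC, $s ) ) }

    # Like three-arg open(), but the file name is given as UTF-8 bytes.
    sub gr_open {
        my ( $fh, $mode, $name ) = @_;
        return open( $fh, $mode, _to_fs($name) );
    }

    # Like opendir(), but the directory name is given as UTF-8 bytes.
    sub gr_opendir {
        my ( $dh, $name ) = @_;
        return opendir( $dh, _to_fs($name) );
    }

    # Like readdir(), but entries come back as UTF-8 bytes.
    sub gr_readdir {
        my ($dh) = @_;
        if (wantarray) {
            return map { _from_fs($_) } readdir($dh);
        }
        my $entry = readdir($dh);
        return defined $entry ? _from_fs($entry) : undef;
    }

    # Like glob(), with UTF-8 bytes in the pattern and in the results.
    sub gr_glob {
        my ($pattern) = @_;
        return map { _from_fs($_) } glob( _to_fs($pattern) );
    }
}

# Demo in a temporary directory; handles are passed as glob references.
use File::Temp qw(tempdir);

my $dir  = tempdir( CLEANUP => 1 );
my $name = "\xCE\xB1\xCE\xB2\xCE\xB3.txt";    # alpha beta gamma, as UTF-8 bytes

GreekFile::gr_open( \*OUT, '>', "$dir/$name" ) or die "$name: $!";
print OUT "test\n";
close(OUT) or die "$name: $!";

GreekFile::gr_opendir( \*DIR, $dir ) or die "$dir: $!";
my @entries = grep { !/^\.\.?$/ } GreekFile::gr_readdir( \*DIR );
closedir(DIR);

my @matches = GreekFile::gr_glob("$dir/*.txt");
```

The calling script only ever sees UTF-8 names; the conversion to and from the on-disk encoding happens in one place, so changing the encoding later means editing a single line.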

Re^2: Encodings problem
by Nik (Initiate) on Oct 08, 2006 at 00:12 UTC
    Thank you very much.
    It would all be easier if we could just make bloody Windows use "utf8".
    If it's possible and someone knows a way to actually implement this, please let me know; otherwise I will leave it as it is.
      If it's possible and someone knows a way to actually implement this, please let me know

      Here's the sort of thing I had in mind -- it's limited but simple, and will trap the most likely problems (but you'll need to figure out what to do in your cgi application when those problems come up). I haven't tested it, except to confirm that it compiles, and to make sure that this sort of operation works as hoped for (at least, it did on macosx):

      my_open( FH, ">", "foo.bar" ) or die "foo.bar: $!";
      #...
      sub my_open {
          my ( $fh, $mode, $name ) = @_;
          open( $fh, $mode, $name );
      }
      Unfortunately, if the caller tries to pass a lexically scoped scalar as the filehandle arg, that doesn't work. There's a way around that, but I haven't tried to look it up. (Maybe other monks know how off the top of their heads.) Since the OP code appears to be using the old UPPERCASE style file handles, the module as provided should do okay.
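
For what it's worth, one commonly used workaround for the lexical filehandle case is to have the wrapper call open on $_[0] directly instead of on a copy. The elements of @_ are aliases to the caller's arguments, so when open autovivifies a handle into $_[0] it lands in the caller's variable. The sketch below is a hypothetical variant of my_open, not the code from the module, and it writes to a temporary directory purely for demonstration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical variant of my_open: the first argument is deliberately not copied.
# $_[0] aliases the caller's scalar, so open() can store the new handle there.
sub my_open {
    my ( undef, $mode, $name ) = @_;
    return open( $_[0], $mode, $name );
}

my $dir  = tempdir( CLEANUP => 1 );
my $path = "$dir/foo.bar";

my_open( my $fh, '>', $path ) or die "$path: $!";
print {$fh} "hello\n";
close($fh) or die "$path: $!";
```

With the copying version, the handle would be created in the sub's private $fh and the caller's scalar would stay undefined.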

      To work this into your cgi apps, store the code as "GreekFile.pm" in one of the @INC paths, and edit your cgi scripts that do file i/o so they include:

      use GreekFile qw/gr_open gr_opendir gr_readdir gr_glob/; # or just the relevant subset of these functions
      Then, wherever you have open( FH, "<$filename" ), simply change that to gr_open( FH, "<", $filename ), assuming that $filename is a utf8 string. Similarly for opendir, readdir and glob calls. Just use utf8 strings in your app -- all the conversion to and from CP1253 for file names is handled inside this module.

        Thank you, graff, but this stuff is a little complicated for me, because I have never used a Perl module, or made calls to one, in the past.

        Apart from that, I don't like the idea of us programmers doing extra work to tell the WinXP OS how to treat our file names and contents.

        What I have in mind is to find an OS option (maybe a registry option) that will tell stupid Windows to actually treat the file names in the same manner as it treats the file contents.
      Well, I have been using Linux for a while now, but it seemed to me that last time I booted it, it used utf8 as default. I may be mistaken, but I used Japanese, Chinese and French filenames without any problem.

      You may yet find this page interesting.