in reply to Re^2: Encodings problem
in thread Encodings problem

If it's possible and someone knows a way to actually implement this, please let me know

Here's the sort of thing I had in mind -- it's limited but simple, and will trap the most likely problems (but you'll need to figure out what to do in your cgi application when those problems come up). I haven't tested it, except to confirm that it compiles, and to make sure that this sort of operation works as hoped for (at least, it did on macosx):

my_open( FH, ">", "foo.bar" ) or die "foo.bar: $!";
#...
sub my_open {
    my ( $fh, $mode, $name ) = @_;
    open( $fh, $mode, $name );
}

Unfortunately, if the caller tries to pass a lexically scoped scalar as the filehandle arg, that doesn't work. There's a way around that, but I haven't tried to look it up. (Maybe other monks know how off the top of their heads.) Since the OP code appears to be using the old UPPERCASE style file handles, the module as provided should do okay.
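
For what it's worth, one commonly used way around the lexical-handle problem is to open through $_[0] instead of a copy, because the elements of @_ are aliases for the caller's arguments. The following is only a sketch of that idea: the name my_open_lex is made up for illustration, and the scratch file exists purely so the example can run on its own.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical variant of my_open (the name is invented for this sketch).
# $_[0] is an alias for the caller's first argument, so opening through it
# fills in the caller's own variable instead of a private copy.
sub my_open_lex {
    my ( undef, $mode, $name ) = @_;
    return open( $_[0], $mode, $name );
}

# Scratch directory so the example does not leave files behind.
my $file = tempdir( CLEANUP => 1 ) . "/foo.bar";

# Write one line through a lexical handle ...
my_open_lex( my $out, ">", $file ) or die "$file: $!";
print {$out} "hello\n";
close($out) or die "$file: $!";

# ... and read it back through another one.
my_open_lex( my $in, "<", $file ) or die "$file: $!";
my $line = <$in>;
close($in);
```

The copy made by `my ( $fh, ... ) = @_` is what breaks the lexical case: open fills in the copy, and the caller's variable stays undefined.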

To work this into your cgi apps, store the code as "GreekFile.pm" in one of the @INC paths, and edit your cgi scripts that do file i/o so they include:

use GreekFile qw/gr_open gr_opendir gr_readdir gr_glob/; # or just the relevant subset of these functions
Then, wherever you have  open( FH, "<$filename" ) simply change that to  gr_open( FH, "<", $filename ) assuming that $filename is a utf8 string. Similarly for opendir, readdir and glob calls. Just use utf8 strings in your app -- all the conversion to and from CP1253 for file names is handled inside this module.
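
To make the conversion concrete, here is a small sketch of what happens to a name on its way into and out of the file system calls. The Greek file name is only an example, and the variable names are not part of the module.

```perl
use strict;
use warnings;
use Encode qw(from_to);

# "αρχείο.txt" written out as utf8 bytes (two bytes per Greek letter).
my $utf8_name = "\xCE\xB1\xCF\x81\xCF\x87\xCE\xB5\xCE\xAF\xCE\xBF.txt";

# Going in: utf8 -> CP1253, one byte per Greek letter.
my $cp1253_name = $utf8_name;
from_to( $cp1253_name, "utf8", "cp1253", Encode::FB_CROAK );

# Coming back (as readdir or glob results would): CP1253 -> utf8.
my $round_trip = $cp1253_name;
from_to( $round_trip, "cp1253", "utf8" );

printf "utf8: %d bytes, cp1253: %d bytes, round trip %s\n",
    length($utf8_name), length($cp1253_name),
    $round_trip eq $utf8_name ? "matches" : "differs";
```

Note that from_to converts its first argument in place, which is why the sketch works on copies.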

package GreekFile;

=head1 NAME

GreekFile -- for transliterating Greek file names in MS-Windows

=head1 SYNOPSIS

  gr_open( FILEHANDLE, $mode, $utf8name );
  gr_opendir( DIRHANDLE, $utf8name );
  $utf8_name  = gr_readdir( DIRHANDLE );
  @utf8_names = gr_readdir( DIRHANDLE );
  @utf8_names = gr_glob( $utf8glob );

=head1 DESCRIPTION

On a Windows system that uses single-byte CP1253 Greek characters
(similar to ISO-8859-7) for naming files and directories, the functions
provided by this module will allow a utf8-based application to work
smoothly, by automatically converting file name strings between these
two encodings as needed.

This is presented as a "trial" or "proof-of-concept" version; it is
limited in many ways, and does not support a lot of the flexibility of
Perl's "open" and "opendir" functions. For example, it does not support
the use of lexically-scoped scalar variables as file handles. The
limitations could be fixed with some looking up in manuals...

The gr_open and gr_opendir functions return the same success or failure
values that the normal "open" and "opendir" functions would return.
Likewise, gr_readdir behaves like normal readdir: it will return either
a single file name or a list of file names, depending on whether it is
called in a scalar or list context.

The functions that take utf8 strings as input parameters (gr_open,
gr_opendir and gr_glob) will do the conversion to CP1253 inside an eval
block. If the conversion fails (either because the input string was not
valid utf8, or because it contained valid characters that fall outside
the CP1253 character set), they will return undef with $! set to EINVAL
("Invalid argument"), and $@ will still hold the error message from the
failed eval.

Error checking is not done on the file names that are read via readdir
and glob. At worst, if a file name on disk contains single-byte
characters that are not defined in the CP1253 character map, the
conversion to utf8 will include "\x{FFFD}" for each such character.

=cut

use strict;
use warnings;

use Exporter;
use Encode qw(from_to);
use Errno qw(EINVAL);
use Symbol qw(qualify_to_ref);

our @ISA       = qw(Exporter);
our @EXPORT_OK = qw(gr_open gr_opendir gr_readdir gr_glob);

sub gr_open {
    my ( $fh, $mode, $name ) = @_;
    eval { from_to( $name, "utf8", "cp1253", Encode::FB_CROAK ) };
    # $! only holds an errno value, so the message itself stays in $@
    if ( $@ ) { $! = EINVAL; return; }
    # look up a bareword handle in the caller's package, not in GreekFile
    $fh = qualify_to_ref( $fh, scalar caller );
    open( $fh, $mode, $name );
}

sub gr_opendir {
    my ( $dh, $name ) = @_;
    eval { from_to( $name, "utf8", "cp1253", Encode::FB_CROAK ) };
    if ( $@ ) { $! = EINVAL; return; }
    $dh = qualify_to_ref( $dh, scalar caller );
    opendir( $dh, $name );
}

sub gr_readdir {
    my ( $dh ) = @_;
    $dh = qualify_to_ref( $dh, scalar caller );
    if ( wantarray ) {
        my @names = readdir( $dh );
        from_to( $_, "cp1253", "utf8" ) for ( @names );
        return @names;
    }
    else {
        my $name = readdir( $dh );
        # readdir returns undef at the end of the directory
        from_to( $name, "cp1253", "utf8" ) if defined $name;
        return $name;
    }
}

sub gr_glob {
    my ( $glb ) = @_;
    eval { from_to( $glb, "utf8", "cp1253", Encode::FB_CROAK ) };
    if ( $@ ) { $! = EINVAL; return; }
    my @names = glob( $glb );
    from_to( $_, "cp1253", "utf8" ) for ( @names );
    return @names;
}

1;
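
As a rough illustration of the error trapping described in the module's documentation, the sketch below runs the same eval-plus-FB_CROAK check on three sample names. The helper name to_cp1253 and the sample strings are made up for the example; only Encode's from_to and FB_CROAK are taken from the module.

```perl
use strict;
use warnings;
use Encode qw(from_to);

# Hypothetical helper (not part of GreekFile) that mirrors the module's check:
# returns the CP1253 bytes, or undef with the reason left in $@.
sub to_cp1253 {
    my ($name) = @_;
    my $ok = eval { from_to( $name, "utf8", "cp1253", Encode::FB_CROAK ); 1 };
    return $ok ? $name : undef;
}

my $greek    = "\xCE\xB1.txt";    # Greek alpha: representable in CP1253
my $cyrillic = "\xD0\xB6.txt";    # Cyrillic zhe: valid utf8, but not in CP1253
my $broken   = "\xFF.txt";        # not valid utf8 at all

for my $name ( $greek, $cyrillic, $broken ) {
    my $converted = to_cp1253($name);
    print defined $converted ? "converted\n" : "rejected: $@";
}
```

What the cgi application should do with a rejected name (report it, skip it, fall back to something else) is still up to the application.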

Re^4: Encodings problem
by Nik (Initiate) on Oct 09, 2006 at 07:48 UTC
    Thank you graff, but this stuff is a little complicated for me, because I have never used a Perl module in the past, or made calls to one.

    Apart from that, I don't like the idea of us programmers doing extra work to tell the WinXP OS how to treat our filenames and contents.

    What I have in mind is to find an OS option (maybe a registry option) that will tell stupid Windows to actually treat the filenames in the same manner as it treats the file contents.
      I have never used a Perl module in the past, or made calls to one

      Yes you have, and it's not that complicated, and get used to it. It's called "learning how to use a programming language to get things done".

      I don't like the idea of us programmers doing extra work to tell the WinXP OS how to treat our filenames and contents.

      Yeah, it would be swell if us programmers didn't have to do any extra work... we could spend more time outdoors. Oh well, get used to it.

      What I have in mind is to find an OS option (maybe a registry option) that will tell stupid Windows to actually treat the filenames in the same manner as it treats the file contents.

      Sounds nice, good luck with that.

        I must thank you for your time and effort to help me, though. I appreciate it, but I just refuse to do all those extra things you describe (and I am sure they will work) to do something the OS *should do* by default. I hope someone here knows how to change this option/registry setting or whatever.