Here's the sort of thing I had in mind -- it's limited but simple, and will trap the most likely problems (but you'll need to figure out what to do in your CGI application when those problems come up). I haven't tested it, except to confirm that it compiles, and to make sure that this sort of operation works as hoped for (at least, it did on Mac OS X):

    my_open( FH, ">", "foo.bar" ) or die "foo.bar: $!";
    #...
    sub my_open {
        my ( $fh, $mode, $name ) = @_;
        open( $fh, $mode, $name );
    }

Unfortunately, if the caller tries to pass a lexically scoped scalar as the filehandle arg, that doesn't work. There's a way around that, but I haven't tried to look it up. (Maybe other monks know how off the top of their heads.) Since the OP code appears to be using the old UPPERCASE style file handles, the module as provided should do okay.
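One possible way around the lexical-filehandle limitation, offered as a sketch and not as part of the module: the elements of @_ are aliases to the caller's arguments, so calling open on $_[0] itself (instead of on a copy made with "my") lets Perl autovivify the handle straight into the caller's variable. The name my_open_lex is made up for this illustration, and the scratch file name is just the one from the example above.

```perl
use strict;
use warnings;

# Hypothetical variant of my_open (the name is an assumption for this sketch).
# It works on $_[0] directly: @_ aliases the caller's arguments, so open()
# stores the new handle in the caller's own scalar rather than in a copy.
sub my_open_lex {
    my ( undef, $mode, $name ) = @_;    # skip the handle; do not copy it
    return open( $_[0], $mode, $name );
}

my $file = "foo.bar";                   # scratch file for the demonstration

my_open_lex( my $out, ">", $file ) or die "$file: $!";
print {$out} "hello\n";
close( $out ) or die "$file: $!";

my_open_lex( my $in, "<", $file ) or die "$file: $!";
my $line = <$in>;
close( $in );
unlink $file;
```

The same sub should also accept a glob reference such as \*FH, since open is happy with either kind of handle in its first argument.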
To work this into your CGI apps, store the code as "GreekFile.pm" in one of the @INC paths, and edit your CGI scripts that do file I/O so they include:

    use GreekFile qw/gr_open gr_opendir gr_readdir gr_glob/;
    # or just the relevant subset of these functions

Then, wherever you have open( FH, "<$filename" ) simply change that to gr_open( FH, "<", $filename ), assuming that $filename is a utf8 string. Similarly for opendir, readdir and glob calls. Just use utf8 strings in your app -- all the conversion to and from CP1253 for file names is handled inside this module.
    package GreekFile;

    =head1 NAME

    GreekFile -- for transliterating Greek file names in MS-Windows

    =head1 SYNOPSIS

      gr_open( FILEHANDLE, $mode, $utf8name );
      gr_opendir( DIRHANDLE, $utf8name );
      $utf8_name  = gr_readdir( DIRHANDLE );
      @utf8_names = gr_readdir( DIRHANDLE );
      @utf8_names = gr_glob( $utf8glob );

    =head1 DESCRIPTION

    On a Windows system that uses single-byte CP1253 Greek characters
    (similar to ISO-8859-7) for naming files and directories, the functions
    provided by this module will allow a utf8-based application to work
    smoothly, by automatically converting file name strings between these
    two encodings as needed.

    This is presented as a "trial" or "proof-of-concept" version; it is
    limited in many ways, and does not support a lot of the flexibility of
    Perl's "open" and "opendir" functions. For example, it does not support
    the use of lexically-scoped scalar variables as file handles. The
    limitations could be fixed with some looking up in manuals...
    Callers running under "use strict" should pass the handle as a glob
    reference (e.g. \*FH), since a bare FH is rejected by strict subs.

    The gr_open and gr_opendir functions return the same success or failure
    values that the normal "open" and "opendir" functions would return.
    Likewise, gr_readdir behaves like normal readdir: it will return either
    a single file name or a list of file names, depending on whether it is
    called in a scalar or list context.

    The functions that take utf8 strings as input parameters (gr_open,
    gr_opendir and gr_glob) do the conversion to CP1253 inside an eval
    block. If the conversion fails (either because the input string was not
    valid utf8, or because it contained valid characters that fall outside
    the CP1253 character set), they return undef (an empty list in list
    context), and $@ contains the error message from the failed eval.
    ($! cannot carry that message: assigning a string to $! only sets a
    numeric errno.)

    Error checking is not done on the file names that are read via readdir
    and glob. At worst, if a file name on disk contains single-byte
    characters that are not defined in the CP1253 character map, the
    conversion to utf8 will include "\x{FFFD}" for each such character.

    =cut

    use strict;
    use warnings;
    use Exporter;
    use Encode qw(from_to);
    use Symbol qw(qualify_to_ref);

    our @ISA       = qw(Exporter);
    our @EXPORT_OK = qw(gr_open gr_opendir gr_readdir gr_glob);

    sub gr_open {
        my ( $fh, $mode, $name ) = @_;
        # resolve a bareword handle name in the caller's package, not this one
        $fh = qualify_to_ref( $fh, scalar caller );
        eval { from_to( $name, "utf8", "cp1253", Encode::FB_CROAK ); 1 }
            or return;    # $@ holds the conversion error
        return open( $fh, $mode, $name );
    }

    sub gr_opendir {
        my ( $dh, $name ) = @_;
        $dh = qualify_to_ref( $dh, scalar caller );
        eval { from_to( $name, "utf8", "cp1253", Encode::FB_CROAK ); 1 }
            or return;    # same failure handling as gr_open
        return opendir( $dh, $name );
    }

    sub gr_readdir {
        my $dh = qualify_to_ref( shift, scalar caller );
        if ( wantarray ) {
            my @names = readdir( $dh );
            from_to( $_, "cp1253", "utf8" ) for ( @names );
            return @names;
        }
        my $name = readdir( $dh );
        # readdir returns undef at the end of the directory
        from_to( $name, "cp1253", "utf8" ) if defined $name;
        return $name;
    }

    sub gr_glob {
        my ( $glb ) = @_;
        eval { from_to( $glb, "utf8", "cp1253", Encode::FB_CROAK ); 1 }
            or return;
        my @names = glob( $glb );
        from_to( $_, "cp1253", "utf8" ) for ( @names );
        return @names;
    }

    1;    # a module must end with a true value
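The conversion behaviour that the DESCRIPTION relies on can be checked in isolation, without touching the file system. The following is only a sketch using the core Encode module: the sample names are made up, and the byte strings are written out as hex escapes so the script does not depend on the encoding of its own source file.

```perl
use strict;
use warnings;
use Encode qw(from_to);

# Greek "alpha beta gamma" + ".txt", as raw UTF-8 bytes (made-up sample name)
my $utf8_name = "\xCE\xB1\xCE\xB2\xCE\xB3.txt";

# utf8 -> cp1253: from_to converts in place and returns the new byte length
my $cp_name = $utf8_name;
my $cp_len  = from_to( $cp_name, "utf8", "cp1253", Encode::FB_CROAK );

# cp1253 -> utf8: the round trip should give back the original bytes
my $round_trip = $cp_name;
from_to( $round_trip, "cp1253", "utf8" );

# A valid utf8 name with a character outside CP1253 (U+4E2D):
# with FB_CROAK the conversion dies, so the eval traps it and sets $@
my $bad_name = "\xE4\xB8\xAD.txt";
my $converted = eval {
    from_to( $bad_name, "utf8", "cp1253", Encode::FB_CROAK );
    1;
};
my $error = $converted ? "" : $@;

printf "cp1253 length: %d bytes\n", $cp_len;
print "round trip ok\n" if $round_trip eq $utf8_name;
print "conversion refused: $error" unless $converted;
```

Each Greek letter takes two bytes in utf8 and one byte in CP1253, which is why the converted name is shorter than the original.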