in reply to Search and replace in Unicode files

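I've never seen a UTF-16 file myself. IMHO a better way to do the replacement might be to keep all your conversions in a hash, as you already do, but keyed by the actual byte sequences rather than by hex dumps. If I'm not wrong, each character in UTF-16 is a 2-byte sequence. (Strictly, that holds only for characters in the Basic Multilingual Plane, U+0000 to U+FFFF; characters above U+FFFF are stored as 4-byte surrogate pairs.)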
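To check which byte sequence a given character actually becomes, here is a small sketch using the core Encode module. It assumes big-endian UTF-16 (UTF-16BE), which is the byte order in which '=' is "\x00\x3d". The helper name utf16be_hex is hypothetical, for illustration only.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: return the UTF-16BE encoding of $char
# as space-separated hex byte pairs.
sub utf16be_hex {
    my ($char) = @_;
    my $bytes = encode('UTF-16BE', $char);
    return join ' ', map { sprintf('%02x', ord($_)) } split //, $bytes;
}

# U+003D and U+00E9 lie in the BMP (2 bytes each);
# U+1F600 lies above U+FFFF and needs a surrogate pair (4 bytes).
for my $char ('=', "\x{00E9}", "\x{1F600}") {
    printf "U+%04X -> %s\n", ord($char), utf16be_hex($char);
}
```

This prints `00 3d` for '=', `00 e9` for U+00E9, and `d8 3d de 00` for U+1F600, so the 2-byte rule holds for the first two but not the third.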

Example:

my %conv = ( "\x00\x3d" => 'key1' );   # '=' (U+003D) as big-endian UTF-16 bytes
The replacement might then be done by:
my $search = join '|', map quotemeta, keys %conv;
while (<>) {
    s/($search)/$conv{$1}/go;
    print;
}
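One caveat worth checking with the byte-level approach: the regex sees raw bytes, so a two-byte key can match at an odd offset, straddling two characters. For example, U+3000 followed by U+3D41 is "\x30\x00\x3d\x41" in UTF-16BE, which contains "\x00\x3d" even though the text has no '='. Also, substituting the ASCII string 'key1' into a UTF-16 stream leaves the output in mixed encodings. The sketch below compares the byte-level substitution with one that decodes first using the core Encode module. It assumes big-endian UTF-16 without a BOM; %conv, replace_bytes and replace_chars are hypothetical names for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical mapping, keyed by characters rather than byte sequences.
my %conv = ('=' => 'key1');
# Longer keys first, since alternation tries alternatives left to right.
my $search = join '|', map { quotemeta } sort { length($b) <=> length($a) } keys %conv;

# Byte-level version, as in the post: keys are UTF-16BE byte sequences.
my %byte_conv;
$byte_conv{ encode('UTF-16BE', $_) } = $conv{$_} for keys %conv;
my $byte_search = join '|', map { quotemeta } sort { length($b) <=> length($a) } keys %byte_conv;

sub replace_bytes {
    my ($bytes) = @_;
    $bytes =~ s/($byte_search)/$byte_conv{$1}/g;
    return $bytes;
}

# Character-level version: decode, substitute, then re-encode,
# so the replacement text is written back as UTF-16BE too.
sub replace_chars {
    my ($bytes) = @_;
    my $text = decode('UTF-16BE', $bytes);
    $text =~ s/($search)/$conv{$1}/g;
    return encode('UTF-16BE', $text);
}

# U+3000 U+3D41: no '=' in the text, but "\x00\x3d" occurs at an odd offset.
my $tricky = "\x30\x00\x3d\x41";
print replace_bytes($tricky) eq $tricky ? "bytes: unchanged\n" : "bytes: false match\n";
print replace_chars($tricky) eq $tricky ? "chars: unchanged\n" : "chars: false match\n";
```

This prints "bytes: false match" and then "chars: unchanged". For a real file, the same decode, substitute, re-encode idea can be applied with PerlIO layers, e.g. opening the input with `<:encoding(UTF-16BE)` (or `UTF-16` if the file starts with a BOM) and the output with a matching layer.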
If anyone sees that I'm wrong, please correct me.

Re: Re: Search and replace in Unicode files
by bm (Hermit) on Jun 16, 2003 at 16:11 UTC
    I'm pretty sure the problem is with how I'm outputting my results, not with the conversion method (see my follow-up post below).

    Having said that, your method looks better (and certainly quicker) than pattern matching against a list of hex values, which is what my code above does.

    Appreciate your response