in reply to Regex for Multibyte Characters
Whatever the character set turns out to be, it might be easiest to save a plain-text file containing just the "CHT string" (or both the preceding and the following "CHT string", if they differ), encoded in the same character set as the original data. Then something like this should do:
(not tested, of course, but nothing much to it, really)

    use strict;
    use warnings;
    use Encode;

    open( my $in, '<', 'string.txt' ) or die "string.txt: $!";
    my $string = <$in>;
    chomp $string;
    # or: my ($pre, $fol) = split ' ', $string;  # if the file has preceding and following strings
    close $in;

    my $pattern = decode( 'big5-eten', $string );
    # you might need a different character-set name (if so, fix it in three places)
    # also, if you are using $pre and $fol, you need to decode each one separately
    # (e.g. into $pat1 and $pat2)

    my $newversion = "2.0";  # or whatever...

    open( my $data, '<:encoding(big5-eten)', 'big_data.txt' ) or die "big_data.txt: $!";
    binmode( STDOUT, ':encoding(big5-eten)' );
    while ( <$data> ) {
        # \Q...\E keeps any regex metacharacters in the string from being interpreted
        s/(\Q$pattern\E).*?(\Q$pattern\E)/$1$newversion$2/;
        # or: s/(\Q$pat1\E).*?(\Q$pat2\E)/$1$newversion$2/;
        # maybe you need the "g" modifier as well?
        print;
    }
    close $data;
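If you want to check the idea before touching the real files, here is a small self-contained sketch that does the same substitution entirely in memory. The sample line and the marker "版本" are made up for illustration (they stand in for the real "CHT string"), and the round trip through encode()/decode() only simulates reading Big5 data. The point it shows is that the match happens on decoded characters, not raw bytes: in Big5 the second byte of a character can fall in the ASCII range, so a byte-level regex could match across character boundaries.

```perl
use strict;
use warnings;
use utf8;                      # the literals below are written in UTF-8 source
use Encode qw(encode decode);

# Hypothetical sample data: a marker string and a line that uses it twice.
my $marker = "版本";           # stands in for the "CHT string"
my $line   = "程式版本1.0版本說明";

# Simulate reading Big5 data: encode to Big5 bytes, then decode back to characters.
my $big5_bytes = encode( 'big5-eten', $line );
my $text       = decode( 'big5-eten', $big5_bytes );
my $pattern    = decode( 'big5-eten', encode( 'big5-eten', $marker ) );

my $newversion = "2.0";
# Replace whatever lies between the two markers (shortest match).
( my $result = $text ) =~ s/(\Q$pattern\E).*?(\Q$pattern\E)/$1$newversion$2/;

binmode( STDOUT, ':encoding(UTF-8)' );
print "$result\n";             # prints 程式版本2.0版本說明
```

With real data you would read the marker from the file and the text through the `:encoding(big5-eten)` layer instead of the encode/decode round trip.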