pax77 has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I need to compare two strings (one in Korean, one in hex format). I have a string in a config file (Korean):  player=리작 and a hex value in a data file:  'm_name': '\xeb\xa6\xac\xec\x9e\x91', If I compare them manually, they are identical:

my $a = $cfg{'default.player'};
my $b = "\xeb\xa6\xac\xec\x9e\x91";
if ($a eq $b) {
    print "indahouse\n";
}

But if I read the hex value in from the data file, they are different :(

open(ST, "<", $details) or die "Could not read $details: $!\n";
while (<ST>) {
    if (/m_name/) {
        if (/([\\\w]*)',$/) {
            $name = $1;
            print Dumper($name);
            if ($name eq $cfg{'default.player'}) {
                print "indahouse\n";
            }
        }
    }
}

The Dumper output for $name is:  $VAR1 = '\\xeb\\xa6\\xac\\xec\\x9e\\x91'; so it is not recognized as a hex value. I can't find how to convert this string :( Any help appreciated! Regards, pax77

Replies are listed 'Best First'.
Re: Comparing string hex / korean
by choroba (Cardinal) on Jan 04, 2019 at 14:37 UTC
    I saved a UTF-8 encoded file with the following contents:
    리작
    \xeb\xa6\xac\xec\x9e\x91
    

    You can clearly see the lines are different. You can easily replace the codes with their corresponding bytes:

    #!/usr/bin/perl
    use warnings;
    use strict;
    use feature qw{ say };

    chomp( my $korean = <> );
    chomp( my $hex = <> );
    $hex =~ s/\\x(..)/chr hex $1/ge;
    say $korean eq $hex ? 'Same' : 'Different';

    Or go the other way and replace the encoded Korean characters with their hex equivalents (this works byte by byte, so $korean must still hold the undecoded bytes):

    $korean =~ s/(.)/'\\x' . sprintf '%x', ord $1/ge;

    If you need to work with UTF-8 encoded files, you should open them with the :encoding(UTF-8) layer. In such a case, you need to encode the string before further processing it:

    utf8::encode($korean);
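
A minimal sketch of that last point, not a definitive implementation. It assumes the config line has the `player=...` form from the question, and it uses an in-memory filehandle (a hypothetical stand-in for the real config file) so that it runs on its own:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the config file: the UTF-8 bytes of "player=리작\n".
my $file_bytes = "player=\xeb\xa6\xac\xec\x9e\x91\n";

# Reading through the :encoding(UTF-8) layer yields decoded characters.
open my $in, '<:encoding(UTF-8)', \$file_bytes or die "Cannot open: $!";
chomp( my $line = <$in> );
close $in;
my ($korean) = $line =~ /^player=(.*)$/;

# The escapes from the data file, turned into raw bytes.
my $hex = '\xeb\xa6\xac\xec\x9e\x91';
$hex =~ s/\\x([0-9a-fA-F]{2})/chr hex $1/ge;

my $chars_before = length $korean;    # counted in characters
utf8::encode($korean);                # in place: characters -> UTF-8 bytes
my $bytes_after = length $korean;     # now counted in bytes

print $korean eq $hex ? "Same\n" : "Different\n";
```

Without the utf8::encode call, the comparison would be between two decoded characters and six raw bytes, and eq would report them as different.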

Re: Comparing string hex / korean (updated)
by haukex (Archbishop) on Jan 04, 2019 at 14:38 UTC

    Assuming the hex string represents UTF-8, you could do this:

    use warnings;
    use strict;
    use open qw/:std :utf8/;
    use HTML::Entities qw/decode_entities/;

    my $x = '&#47532;&#51089;';
    my $xd = decode_entities($x);

    my $y = '\\xeb\\xa6\\xac\\xec\\x9e\\x91';
    (my $yd = $y) =~ s/\\x([0-9a-f]{2})/chr hex $1/eig;
    utf8::decode($yd);

    print "$x =>$xd<=\n";
    print "$y =>$yd<=\n";
    if ( $xd eq $yd ) { print "Match!\n" }
    else { die "Mismatch!\n" }

    Output:

    &#47532;&#51089; =>리작<=
    \xeb\xa6\xac\xec\x9e\x91 =>리작<=
    Match!
    

    Update: I realized it's unclear to me whether your original string was player=리작 or player=&#47532;&#51089;. The latter happens on PerlMonks when you put Unicode characters into <code> tags. You'd have to use <pre> or <tt> tags instead, but then characters that have special meanings (either as HTML or to the PerlMonks engine) have to be escaped, such as < as &lt;, & as &amp;, [ as &#91;, etc. If your original string was player=리작, then you don't need the decode_entities step, but the input string in $cfg{'default.player'} must have been properly decoded first (you can check this with, e.g., Devel::Peek).
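
As a lighter-weight check than Devel::Peek, comparing length before and after decoding shows whether a value is still raw bytes. This is only a sketch; $raw is a hypothetical stand-in for an undecoded value of $cfg{'default.player'}:

```perl
use strict;
use warnings;

# Hypothetical config value as raw, undecoded UTF-8 bytes.
my $raw = "\xeb\xa6\xac\xec\x9e\x91";
my $raw_len = length $raw;            # Perl counts bytes here

# utf8::decode works in place and returns false if the bytes are not
# well-formed UTF-8.
my $decoded = $raw;
my $ok = utf8::decode($decoded);
my $decoded_len = length $decoded;    # now counts characters

# Print only ASCII so no output layer is needed.
my @code_points = map { sprintf 'U+%04X', ord($_) } split //, $decoded;
print "raw=$raw_len decoded=$decoded_len code points=@code_points\n";
```

If the length of the config value already equals the number of visible characters, it has been decoded and should not be decoded a second time.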

Re: Comparing string hex / korean
by BillKSmith (Monsignor) on Jan 04, 2019 at 15:53 UTC
    The code below implements the conversion you requested. However, I would still recommend the true UTF-8 solution suggested by the other monks.
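    use strict;
    use warnings;
    my %cfg = (
        'default.player', "\xeb\xa6\xac\xec\x9e\x91",
    );
    my $name = '\\xeb\\xa6\\xac\\xec\\x9e\\x91';
    my $a = $cfg{'default.player'};
    # String eval interpolates the \x escapes, but it also runs any code
    # embedded in $name, so only use this on trusted input.
    my $b = eval q(") . $name . q(");
    if ($a eq $b) {
        print "indahouse\n";
    }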
    Bill
Re: Comparing string hex / korean
by daxim (Curate) on Jan 04, 2019 at 16:16 UTC
    I can't find how to convert this string :(
    use String::Unescape qw();
    String::Unescape->unescape('\\xeb\\xa6\\xac\\xec\\x9e\\x91')
    # expression returns "\xeb\xa6\xac\xec\x9e\x91"
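
If a CPAN dependency is not wanted, the same unescaping can be sketched with core Perl only. The helper name unescape_utf8 and the exact line pattern are assumptions for illustration, based on the data line shown in the question, and the config value is assumed to be already decoded:

```perl
use strict;
use warnings;

# Hypothetical helper: turn literal \xHH escapes into bytes, then decode
# those bytes as UTF-8 into characters.
sub unescape_utf8 {
    my ($escaped) = @_;
    (my $bytes = $escaped) =~ s/\\x([0-9a-fA-F]{2})/chr hex $1/ge;
    utf8::decode($bytes) or die "Not valid UTF-8: $escaped\n";
    return $bytes;
}

# A line in the format shown in the question.
my $line = q{'m_name': '\xeb\xa6\xac\xec\x9e\x91',};

my $name;
if ( $line =~ /'m_name':\s*'((?:\\x[0-9a-fA-F]{2})+)',\s*$/ ) {
    $name = unescape_utf8($1);
}

# Stand-in for a decoded $cfg{'default.player'}: the two characters of 리작.
my $player = "\x{B9AC}\x{C791}";
my $result = ( defined $name && $name eq $player ) ? "indahouse" : "no match";
print "$result\n";
```

Unlike the string-eval approach, the substitution only ever interprets \xHH pairs, so nothing else in the data file can be executed.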