convert several two digit hex characters to ascii

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: convert several two digit hex characters to ascii by kyle (Abbot) on Dec 01, 2008 at 22:09 UTC
You should find `hex` and `chr` useful, as well as the `/e` option applied to `s///`.
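A minimal sketch of how those three pieces could fit together, assuming the hex data is marked by a `0x` prefix and is read two digits at a time. The helper name `hex_to_text` is hypothetical, and the sample string is borrowed from another reply in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a run of hex digit pairs into the characters
# they encode, one pair at a time.
sub hex_to_text {
    my ($hex) = @_;
    return join '', map { chr hex $_ } $hex =~ /../g;
}

my $line = "blah0x4445434C41524520405420blah blah";

# /e evaluates the replacement as Perl code; /g repeats it for every 0x run.
(my $out = $line) =~ s/0x((?:[0-9a-fA-F]{2})+)/hex_to_text($1)/ge;

print "$out\n";
```

Because the pattern only consumes complete pairs of hex digits, the `b` of the following `blah` is left alone: it is followed by `l`, which is not a hex digit.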
Re: convert several two digit hex characters to ascii by ikegami (Patriarch) on Dec 01, 2008 at 22:28 UTC
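`perl -pe"s/0x((?:[0-9a-fA-F]{2})+)/pack 'H*', $1/ge" file`

(As quoted, this suits a Windows shell. In a Unix shell the double quotes would let the shell expand `$1`, so swap the quote styles: `perl -pe 's/0x((?:[0-9a-fA-F]{2})+)/pack "H*", $1/ge' file`.)

The above results in code equivalent to

```perl
while (<>) {
    s/0x((?:[0-9a-fA-F]{2})+)/pack 'H*', $1/ge;
    print;
}
```

Update: Removed redundant and/or incorrect `unpack 'A*'`.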
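The template passed to `pack` matters here. As a small sketch (the sample string `414243` is made up for illustration), `H*` consumes every hex digit, whereas a bare `H` has an implicit count of one and packs only the first digit:

```perl
use strict;
use warnings;

my $hex = '414243';

# 'H*' consumes every hex digit, two per output byte.
my $all = pack 'H*', $hex;

# A bare 'H' has an implicit count of 1: only the first digit ('4') is
# packed, as the high nibble of a single byte.
my $one = pack 'H', $hex;

printf "H*: %s (%d bytes)\n", $all, length($all);
printf "H : \\x%02X (%d byte)\n", ord($one), length($one);
```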
Re^2: convert several two digit hex characters to ascii by gone2015 (Deacon) on Dec 02, 2008 at 16:46 UTC
Isn't the `unpack 'A*'` redundant? (But wouldn't `'a*'` be better?)

I can see the logic that what `pack` produces should really be `unpack`ed before being used. Indeed, it occurred to me that `unpack 'a*', ...` might do something bright with UTF-8. Which set me on a small quest to discover how to convert UTF-8 in hex characters to utf8 characters....

The following:

```perl
use strict;
use warnings;

use Encode qw(_utf8_on);

for my $r ("\xC2\xAB \x61\x68\x61 \xC2\xBB",
           "\xC2\x7E \x61\x68\x61 \x80\xC0") {
    for my $utf (0..1) {
        _utf8_on($r) if $utf;
        printf "'%s', %d/%d %s\n", raw(unpack('a*', $r));
    }
}

# Returns: printable form of the bytes, length in characters,
# length in bytes, and whether the utf8 flag is set.
sub raw {
    my ($s) = @_;
    my ($b, $q);
    {
        use bytes;
        $b = length($s);
        $q = join '', map { ($_ >= 0x20) && ($_ <= 0x7E) ? chr($_) : sprintf('\\x%02X', $_) }
                      unpack('C*', $s);
    }
    return ($q, length($s), $b, utf8::is_utf8($s) ? 'utf8' : 'not utf8');
}
```

gives:

```
'\xC2\xAB aha \xC2\xBB', 9/9 not utf8
'\xC2\xAB aha \xC2\xBB', 7/9 utf8
'\xC2~ aha \x80\xC0', 9/9 not utf8
Malformed UTF-8 string in unpack at ...
```

showing that if the string being unpacked is utf8, the result is utf8 (or an error, if it is not valid utf8).

I found, however, that `pack 'H*', ...` returns a byte (not utf8) string, no matter what the input(s). This seems, on the whole, reasonable.

I tried a number of things to try to get `unpack('a*', pack('H*', $foo))` to return utf8, ...

```perl
my $s = "C2AB2061686120C2BB";
_utf8_on($s);
for my $unp ('a*', 'U0a*', 'C0a*') {
    # raw() returns characters before bytes, so $l is characters, $b is bytes
    my ($q, $l, $b, $u) = raw(unpack($unp, pack('H*', $s)));
    print "unpack('$unp', pack('H*', \$s)) -> '$q', $l/$b $u\n";
}
```

but to no avail:

```
unpack('a*', pack('H*', $s)) -> '\xC2\xAB aha \xC2\xBB', 9/9 not utf8
unpack('U0a*', pack('H*', $s)) -> '\xC3\x82\xC2\xAB aha \xC3\x82\xC2\xBB', 13/13 not utf8
unpack('C0a*', pack('H*', $s)) -> '\xC2\xAB aha \xC2\xBB', 9/9 not utf8
```

but note that `unpack 'U0a*'` is "upgrading" (as in `utf8::upgrade()`) the bytes to UTF-8.
I found that the trick is to tell `pack` to return utf8, thus:

```perl
my $s = "C2AB2061686120C2BB";
for my $unp ('a*', 'U0a*', 'C0a*') {
    printf "unpack('$unp', pack('U0H*', $s)) -> '%s', %d/%d %s\n",
           raw(unpack($unp, pack('U0H*', $s)));
}
```

giving:

```
unpack('a*', pack('U0H*', C2AB2061686120C2BB)) -> '\xC2\xAB aha \xC2\xBB', 7/9 utf8
unpack('U0a*', pack('U0H*', C2AB2061686120C2BB)) -> '\xC2\xAB aha \xC2\xBB', 9/9 not utf8
unpack('C0a*', pack('U0H*', C2AB2061686120C2BB)) -> '\xC2\xAB aha \xC2\xBB', 7/9 utf8
```

noting that `unpack 'U0a*'` is treating its input as bytes.

The `unpack` is still optional, though invalid UTF-8 is treated differently if it's left out, thus:

```perl
for my $s ("C2AB2041686120C2BB", "C27E204168612080C0") {
    printf "pack('U0H*', $s) -> '%s', %d/%d %s\n",
           raw(pack('U0H*', $s));
    printf "unpack('a*', pack('U0H*', $s)) -> '%s', %d/%d %s\n",
           raw(unpack('a*', pack('U0H*', $s)));
}
```

gives:

```
pack('U0H*', C2AB2041686120C2BB) -> '\xC2\xAB Aha \xC2\xBB', 7/9 utf8
unpack('a*', pack('U0H*', C2AB2041686120C2BB)) -> '\xC2\xAB Aha \xC2\xBB', 7/9 utf8
Malformed UTF-8 character (unexpected end of string) in length at ../hex-utf.pl line 23.
pack('U0H*', C27E204168612080C0) -> '\xC2~ Aha \x80\xC0', 7/9 utf8
Malformed UTF-8 string in unpack at ../hex-utf.pl line 48.
```

so `pack` is not checking for valid UTF-8, leaving it as a puzzle for others -- and in this case `length()` is throwing a warning. On the other hand, `unpack` is deeply unhappy about invalid UTF-8, and throws an error.

None of this was entirely obvious to me. Hopefully somebody can benefit from my little quest.

Returning to the topic of the OP, if I wanted to decode the hex as UTF-8, I think what I would do is:

```perl
sub dehex {
    my ($s) = @_;
    $s =~ s/0[xX]((?:[0-9A-Fa-f]{2})+)/pack('U0H*', $1)/eg;
    return $s if utf8::valid($s);
    # ... worry ...
    return undef;    # ??
}
```
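One way the "... worry ..." branch could be filled in, as a sketch only. It swaps `pack 'U0H*'` for plain `pack 'H*'` followed by `utf8::decode`, which reports malformed UTF-8 through its return value instead of warning later. The name `dehex_utf8` and the choice to return `undef` on any invalid run are assumptions, not something settled in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: replace each 0x-prefixed run of hex digit pairs with
# the characters obtained by decoding its bytes as UTF-8. Returns undef if
# any run is not well-formed UTF-8.
sub dehex_utf8 {
    my ($s) = @_;
    my $bad = 0;

    my $decode = sub {
        my $chunk = pack 'H*', $_[0];       # hex digits -> raw bytes
        utf8::decode($chunk) or $bad++;     # bytes -> characters, in place
        return $chunk;
    };

    $s =~ s/0[xX]((?:[0-9A-Fa-f]{2})+)/$decode->($1)/ge;
    return $bad ? undef : $s;
}

my $good = dehex_utf8("x0xC2ABy");    # C2 AB is the UTF-8 encoding of U+00AB
my $fail = dehex_utf8("0xC27E");      # C2 must be followed by a continuation byte

printf "good: %d characters\n", length($good);
printf "bad:  %s\n", defined $fail ? 'decoded' : 'rejected';
```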
Re^3: convert several two digit hex characters to ascii by ikegami (Patriarch) on Dec 02, 2008 at 19:50 UTC
> I found that the trick is to tell pack to return utf8, thus:

Depending on what you want, the following tools are probably more appropriate:

- `utf8::decode` will decode UTF-8 into characters.
- `utf8::upgrade` will convert the internal representation to UTF-8. The behaviour of a few tools (such as `uc` and `/\w/`) varies based on the internal encoding.

Both are documented in utf8, but it's not necessary to do `use utf8;`. In fact, that means something different: it declares that the source code itself is encoded in UTF-8.
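A short sketch of the difference between the two, using the same nine bytes as the examples above. The variable names are illustrative only.

```perl
use strict;
use warnings;

# The UTF-8 encoding of U+00AB, " aha ", U+00BB: 9 bytes, 7 characters.
my $bytes = pack 'H*', 'C2AB2061686120C2BB';

# utf8::decode interprets the bytes as UTF-8 and replaces them, in place,
# with the characters they encode. It returns false on malformed input.
my $decoded = $bytes;
my $ok      = utf8::decode($decoded);

# utf8::upgrade only changes how the same 9 characters are stored
# internally; the string still compares equal to the original.
my $upgraded = $bytes;
utf8::upgrade($upgraded);

printf "decoded:  ok=%d length=%d\n", ($ok ? 1 : 0), length($decoded);
printf "upgraded: length=%d same=%d\n", length($upgraded), ($upgraded eq $bytes ? 1 : 0);
```

Neither call needs `use utf8;`, since the `utf8::` functions are always available.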
Re: convert several two digit hex characters to ascii by Illuminatus (Curate) on Dec 01, 2008 at 22:12 UTC
How do you tell when the hex-data ends? In your example, the 'b' in 'blah' would be valid as a hex nibble.
Re: convert several two digit hex characters to ascii by toolic (Bishop) on Dec 01, 2008 at 22:20 UTC
Here's some code that gets you close to what you want. It'll get you started...

```perl
use strict;
use warnings;

while (<DATA>) {
    if (/0x([[:xdigit:]]+)/) {
        my $hexstr = $1;
        my $word;
        while ($hexstr =~ /(..)/g) { $word .= chr hex $1 }
        s/0x$hexstr/$word/;
    }
    print;
}

__DATA__
blah0x4445434C41524520405420blah blah
foo zoo 75858
```

prints out:

```
blahDECLARE @T lah blah
foo zoo 75858
```

You'll need to figure out how to deal with the 2nd "blah", whose 1st letter is a valid hex digit.
Re^2: convert several two digit hex characters to ascii by ikegami (Patriarch) on Dec 01, 2008 at 22:33 UTC
> You'll need to figure out how to deal with the 2nd "blah", whose 1st letter is a valid hex digit.

He already has. He specified the nibbles are to be taken "two at a time". He still has the problem of distinguishing "foo(0x1234)bar" from "foo(0x1234ba)r", though.

By the way, I think your code handles "blah 0x4142 blah 0x4142 blah" oddly (only the first `0x` run on a line is converted), but I'm pleased with its handling of "x0x0x kisses and hugs x0x0x".
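These edge cases are easy to probe with a small harness. This is a sketch built on the pairs-only substitution with `pack 'H*'`; the wrapper name `decode_hex_runs` is hypothetical, and the third input is an added example of the "ba" ambiguity.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the substitution, for trying out edge cases.
sub decode_hex_runs {
    my ($s) = @_;
    $s =~ s/0x((?:[0-9a-fA-F]{2})+)/pack 'H*', $1/ge;
    return $s;
}

for my $in ("blah 0x4142 blah 0x4142 blah",
            "x0x0x kisses and hugs x0x0x",
            "0x4142bar") {
    my $out = decode_hex_runs($in);

    # Show anything outside printable ASCII as \xNN so the result is visible.
    $out =~ s{([^\x20-\x7E])}{ sprintf('\\x%02X', ord($1)) }ge;

    print "$in -> $out\n";
}
```

With `/g` both `0x4142` runs are converted, and the "kisses and hugs" line is left untouched because no `0x` there is followed by two hex digits. The last input shows the remaining ambiguity: the `ba` of `bar` is a valid pair, so it is consumed as byte `0xBA`.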
Re: convert several two digit hex characters to ascii by eye (Chaplain) on Dec 01, 2008 at 22:46 UTC
Let's not forget that pack/unpack are another good way to do this sort of string mangling:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $target = "blah0x4445434C41524520405420blah blah";
# Replace the 0x-prefixed run of hex digit pairs with the bytes it encodes.
$target =~ s/0x(([0-9a-f][0-9a-f])+)/pack('H*', $1)/ie;
print $target, "\n";
```