bulrush has asked for the wisdom of the Perl Monks concerning the following question:

I have some text that may contain accented characters with code values above 127, i.e. outside the ASCII range. Instead of displaying each odd character, I'd like to replace it with a string that represents its hex value.

Let's say the % character has a decimal value > 127.

Input: 'Odd character %'

Output: 'Odd character \x82'

Are there a couple of lines of code I can use to do this? Thank you.

Re: How to replace extended ascii ctrs with \xnn strings?
by toolic (Bishop) on Dec 14, 2015 at 17:46 UTC
    I think if you replace 122 below with 127, it will work. I just used 122 to demonstrate with printable characters:
    use warnings;
    use strict;

    my $s = 'foo bar {}{}';
    $s =~ s/(.)/(ord($1) > 122) ? sprintf '\\x%x', ord($1) : $1/ge;
    print "$s\n";

    __END__
    foo bar \x7b\x7d\x7b\x7d
      Wow, it works great. There were all kinds of hidden extended characters that I wasn't removing, no wonder it looked like shite! :) I'm filing this in my regex notes.
      Thank you! I will try the 2 examples I saw in a bit. Yes there could be multiple extended ascii characters in a given string.
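For reuse, the substitution in toolic's reply can be wrapped in a small sub. This is only a minimal sketch: the sub name `escape_high_bytes` is hypothetical (it does not appear in the thread), the threshold is the 127 that toolic suggests, and `%02x` is used so that single-byte characters always get two hex digits. Characters above 255 would still come out with more than two digits.

```perl
use strict;
use warnings;

# Hypothetical helper (name not from the thread): replace every character
# whose code value is above 127 with a literal \xNN escape.
sub escape_high_bytes {
    my ($text) = @_;
    # /s lets . match newlines too; /e evaluates the replacement as Perl code.
    $text =~ s/(.)/ord($1) > 127 ? sprintf('\\x%02x', ord($1)) : $1/gse;
    return $text;
}

# "\x82" in a double-quoted string is the single character with value 0x82.
print escape_high_bytes("Odd character \x82"), "\n";
```

Plain ASCII text passes through unchanged, so the sub should be safe to call on every string before display.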
Re: How to replace extended ascii ctrs with \xnn strings?
by BrowserUk (Patriarch) on Dec 14, 2015 at 17:48 UTC

    Like this?

    $s = join '', map chr(), 128 .. 255;;  ## Build a string of hi-bit chars.
    print $s;;                             ## display it
    ÇüéâäàåçêëèïîìÄÅÉæÆôöòûùÿÖÜø£Ø×ƒáíóúñѪº¿®¬½¼¡«»░▒▓│┤ÁÂÀ©╣║╗╝¢¥┐└┴┬├─┼ãÃ╚╔╩╦╠═╬¤ðÐÊËÈıÍÎÏ┘┌█▄¦Ì▀ÓßÔÒõÕµþÞÚÛÙýݯ´­±‗¾¶§÷¸°¨·¹³²■
    $s =~ s[([\x80-\xff])][ sprintf '\\x%02x', ord( $1 ) ]ge;;
    print $s;;
    \x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff

Re: How to replace extended ascii ctrs with \xnn strings?
by Anonymous Monk on Dec 14, 2015 at 21:03 UTC
    use Encode;
    my $s = join '', map chr(), 128 .. 255;
    print Encode::decode( 'ascii', $s, Encode::PERLQQ | Encode::LEAVE_SRC );
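The same Encode approach applied to a short mixed string may make the effect easier to see. A rough sketch, assuming the input is a byte string: the wrapper name `escape_with_encode` is hypothetical, while `Encode::decode`, `Encode::PERLQQ` and `Encode::LEAVE_SRC` are the calls used in the reply above. Decoding as 'ascii' treats every byte above 127 as malformed, and `PERLQQ` asks Encode to substitute a `\xHH` escape instead of failing. Encode picks the exact escape format itself, so the hex digits may differ in case from the sprintf versions.

```perl
use strict;
use warnings;
use Encode;

# Hypothetical wrapper (name not from the thread) around the Encode call.
sub escape_with_encode {
    my ($bytes) = @_;
    # LEAVE_SRC keeps decode() from modifying its input in place.
    return Encode::decode( 'ascii', $bytes,
        Encode::PERLQQ() | Encode::LEAVE_SRC() );
}

print escape_with_encode("caf\xe9 au lait"), "\n";
```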