in reply to Getting rid of \x junk

I posted this two days ago:

"I ran across this string in a document: \xE2\x80\x94. I did tr/\x94/"/ and got no result (it should have replaced \x94). What is the above code/string, and how do I get rid of it?"
As long as word processors keep getting "made better", this question will keep coming up.
As I said, I tried tr/\x94/"/ and got no result. So what does "\xE2\x80\x94" mean?
Has anyone written a program that removes all occurrences of "\x*" from large documents?

Is there a document that tells how to fix/convert this?

Re: Re \x {} junk
by haukex (Archbishop) on Jan 30, 2021 at 15:34 UTC
Re: Re \x {} junk
by Polyglot (Chaplain) on Jan 30, 2021 at 15:08 UTC

    I think you might be looking for the solution others provided me just recently for almost the exact same issue. In my case, the trick turned out to be a two-step process. You will find the details to the solution here:

    Perl's encoding versus UTF8 octets

    The crux of it was this:

    use Encode qw(decode);
    $mytext =~ s!\\x(..)!chr(hex($1))!ge;   # literal \xNN -> byte
    my $newcode = decode('utf8', $mytext);  # bytes -> characters

    Note the substitution followed by the decode.
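
    As a rough sketch of those two steps applied to the string from the question: this assumes the document contains the literal characters backslash, x, E, 2 and so on, rather than the raw bytes. The sample text and the variable names $bytes, $chars and $ascii are made up for illustration. It also hints at why tr/\x94/"/ finds nothing in that case: the text holds no single \x94 character, only the escape sequence spelled out, and the three octets E2 80 94 together are the UTF-8 encoding of one character, U+2014 (EM DASH).

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical sample: single quotes keep \xE2\x80\x94 as literal text
my $mytext = 'before\xE2\x80\x94after';

# Step 1: turn each literal \xNN escape into the byte it names
(my $bytes = $mytext) =~ s/\\x([0-9A-Fa-f]{2})/chr(hex($1))/ge;

# Step 2: decode the UTF-8 octets into Perl characters
my $chars = decode('UTF-8', $bytes);

# The three bytes E2 80 94 have become one character, U+2014 (EM DASH)
printf "U+%04X\n", ord(substr($chars, 6, 1));

# Optional: swap the em dash for a plain ASCII hyphen
(my $ascii = $chars) =~ tr/\x{2014}/-/;
print "$ascii\n";
```

    If the file holds the raw bytes instead of the spelled-out escapes, step 1 does nothing and only the decode is needed.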

    Blessings,

    ~Polyglot~