Yllar has asked for the wisdom of the Perl Monks concerning the following question:

I am working in a Windows environment. I want to strip all non-ASCII characters and keep only ASCII-range characters: letters, numbers and symbols. Please help.

My input was:

This is a simple text just for test purpose only ascii text 12345678910-=[];'#/.,-! " £ $ % ^ & * ( ) _ + { }~@:<>?|–

Now I am using JSON to decode my input data which decodes it as follows:

This is a simple text just for test purpose only ascii text12345678910-=[];\'#/.,\\-!\"\u00A3$%^&*()_+{}~@:<>?| \u2013

Now I am sending this decoded data to my program to replace the Unicode (UTF-8) and other non-ASCII characters with a space or some other printable character (that is, I want to print only ASCII-range characters). So I tried all of the following in Perl:

use strict;
use warnings;
use JSON;
use LWP::UserAgent;
use utf8;

#Due to some security reasons I am not mentioning the url,hope u understand
my $ResRef = sendHTTPRequest($someurlRequest);
my $string = $ResRef->decoded_content; #I used json decode to decode content
my $string = transalte_replace($string);

sub transalte_replace {
    my $string = shift;
    for($string) {
        s/\\u[0-9]+/1-/g;
        s/\\u[a-zA-Z0-9\+]*/2-/g;
        s/\\x\{[a-zA-Z0-9]*\}/3-/g;
        s/[^\p{ASCII}]/-/g;
        s/[^\u0000-\u007F]+/replace1/g;
        s/[^\x00-\x7F]+/rep/g;
        s/[^\p{ASCII}]/-/g;
        s/[^A-Za-z0-9\.,\?'""!@#\$%\^&\*\(\)-_=\+;:\<\>\/\\\|\}\{\[\]`\~]+/y/g;
        #s/[£]//g;
        s/[^\x20-\x7E]+/replace3/g;
        #s/\\u[0-9]+/2-/g;
        #s/\\x[a-z0-9]+/3-/g;
        #s/[^\x00-\x7F]/4-/g;
    }
}

The output still is:

"This is a simple text just for test purpose only ascii text12345678910-=[];'#/.,\-!\"\x{a3}\$%^&*()_+{}~\@:?|\x{2013}";

Replies are listed 'Best First'.
Re: Regex to trim non Ascii characters
by trippledubs (Deacon) on Sep 26, 2015 at 11:09 UTC
Re: Regex to trim non Ascii characters
by Albannach (Monsignor) on Sep 26, 2015 at 18:02 UTC
    If all you want is a filter, then perhaps consider tr, along the lines of "tr/// vs s/// The question".

    --
    I'd like to be able to assign to an luser
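A minimal sketch of the tr approach suggested above, assuming the text is already a decoded Perl character string. The sample string and variable names are made up for illustration; only the tr operator and its /c and /d modifiers come from Perl itself.

```perl
use strict;
use warnings;

# Hypothetical sample: "\x{a3}" is the pound sign, "\x{2013}" an en dash.
my $text = "price \x{a3}5 \x{2013} ok";

# /c complements the search list, so every character outside 0x00-0x7F
# matches; each one is replaced by a single space.
(my $spaced = $text) =~ tr/\x00-\x7F/ /c;

# /c together with /d deletes those characters instead of replacing them.
(my $deleted = $text) =~ tr/\x00-\x7F//cd;

print "$spaced\n";
print "$deleted\n";
```

Replacing with a space keeps the string length unchanged, which may matter if later code relies on character positions. Deleting gives the shorter result.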

Re: Regex to trim non Ascii characters
by Anonymous Monk on Sep 27, 2015 at 18:51 UTC

    You're calling the sub like so: $x = translate($x). But what does your subroutine return?

    Remember that my $x creates a fresh new variable. Operating on a local copy inside the subroutine does not change its arguments. Either you work on the passed value itself:

    sub mutate { for (shift) { s/./x/g; } }

    or return the working copy:

    sub translate { my $x = shift; $x =~ s/./x/g; return $x; }

    As for the ASCII, all the printable characters fall in a short range, so a simple tr/\040-\176//cd ought to do.
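Putting the two points of this reply together might look like the sketch below. The sub names strip_non_printable and strip_in_place and the sample string are hypothetical. The first sub returns a cleaned copy, the second changes the caller's variable through the $_[0] alias.

```perl
use strict;
use warnings;

# Hypothetical helper: works on a copy and returns it, so the caller
# must use the return value.
sub strip_non_printable {
    my ($s) = @_;
    $s =~ tr/\040-\176//cd;    # delete everything outside space .. tilde
    return $s;
}

# Hypothetical in-place variant: $_[0] is an alias to the caller's variable,
# so no return value is needed.
sub strip_in_place {
    $_[0] =~ tr/\040-\176//cd;
    return;
}

my $input = "text \x{a3}5 \x{2013} end";

my $clean = strip_non_printable($input);   # $input is still unchanged here
print "$clean\n";

strip_in_place($input);                    # now $input itself is cleaned
print "$input\n";
```

Note that the range \040-\176 also removes tabs and newlines. If those should survive, they would have to be added to the search list, for example tr/\011\012\015\040-\176//cd.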

Re: Regex to trim non Ascii characters
by nikosv (Deacon) on Sep 27, 2015 at 17:45 UTC
    use re 'eval';
    $a='12345678910-=[];\'#/.,\\-!\"\u00A3$%^&*()_+{}~@:<>?| \u2013';
    $a=~ s/((\\u....)|(.))(?{ if (defined $2){'y'} elsif(defined $3) {'x'} else{$1} })/$^R/xg;
    print $a;
    prints:
    xxxxxxxxxxxxxxxxxxxxxxxxxxyxxxxxxxxxxxxxxxxxxxy
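The substitution above operates on the literal \uXXXX text as it appears in raw JSON. Once the string has gone through a JSON decoder, those escapes have become real characters, so patterns that look for a backslash followed by u no longer match anything. A small sketch of the decode-then-filter order, assuming the core module JSON::PP (the original post uses JSON) and a made-up JSON document with a hypothetical "text" key:

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical raw JSON; single quotes keep \u00A3 and \u2013 as literal escapes.
my $json = '{"text":"cost \u00A35 \u2013 done"}';

# After decoding, the escapes are the actual characters U+00A3 and U+2013.
my $data = JSON::PP->new->decode($json);

# Filter the decoded value, keeping only printable ASCII (space .. tilde).
(my $clean = $data->{text}) =~ tr/\040-\176//cd;

print "$clean\n";
```

The \x{a3} and \x{2013} in the original poster's output suggest that the string was already decoded at that point, which would be consistent with the \\u patterns never matching.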