punitpawar has asked for the wisdom of the Perl Monks concerning the following question:

Hello,
Is there a generic way to use the substitution operator to remove special characters or numbers from a URL?
For example:
$str = "www.abc%20.com";
I want to get rid of the %20, so that the URL becomes
www.abc.com
I tried
$str =~ s/%20//;
and this worked for me, but I want a more generic approach.

So I tried the following, but it removed the '.' as well.
$str =~ s/(\d+|\W+)//g;
Output:
wwwabccom
Is there a way to add an exception to my substitution so that it replaces all special characters except '.'?

Replies are listed 'Best First'.
Re: Replacing special characters from a URL
by 1nickt (Canon) on Feb 17, 2016 at 19:19 UTC

    Hi punitpawar,

    You may not need to create a pattern by hand. In Perl, usually a tool has already been created for any common task you need to accomplish. Please see URI::Escape.

    I don't know how your URL domain name got spaces in it, but you might be able to use URI::Escape::uri_unescape either before this point in your program, or here, to help you keep/get them out:

    perl -MURI::Escape -E '
        my $str = "www.abc%20.com";
        say $str;
        $str = URI::Escape::uri_unescape( $str );
        say $str;
        $str =~ s/\s//g;
        say $str;
    '
    Output:
    www.abc%20.com
    www.abc .com
    www.abc.com
    Hope this helps!


    The way forward always starts with a minimal test.
Re: Replacing special characters from a URL
by choroba (Cardinal) on Feb 17, 2016 at 18:45 UTC
    Please use <code>...</code> tags for better readability.

    \W matches a non-word character, so it's equivalent to [^\w]. Just add the dot there:

    $str =~ s/[^\w.]|\d//g;
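As a quick check of the substitution above, here is a minimal sketch that applies it to the sample string from the question. The helper name `strip_special` is hypothetical and only there for illustration. Note that `\w` also matches the underscore, so `_` survives.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): removes every character that is
# neither a word character nor a dot, and additionally removes digits.
sub strip_special {
    my ($str) = @_;
    $str =~ s/[^\w.]|\d//g;
    return $str;
}

print strip_special("www.abc%20.com"), "\n";   # prints www.abc.com
```

This pattern also removes hyphens, which are legal in host names. If they should be kept, the class can be widened to `[^\w.-]`.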

    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
      Great !!! Thanks a lot !!!
Re: Replacing special characters from a URL
by kennethk (Abbot) on Feb 17, 2016 at 22:12 UTC
    1nickt is right about using URI::Escape, so you should do that. Also, this sounds a lot like an XY Problem: why are you trying to do this? In any case, if you wanted to hand-roll a decoder for percent-encoding, you could do it with:
    s/%([0-9a-f]{2})/chr hex $1/ieg;
    Documentation: chr, hex, Modifiers from perlre.

    Note that percent encoding is in hex, not decimal, so you probably want to include A-F in some way.
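To make the hand-rolled decoder concrete, the sketch below wraps the substitution in a sub and runs it on the sample string. The name `percent_decode` is hypothetical. The sketch turns each escape into a single character and makes no attempt to interpret multi-byte UTF-8 sequences.

```perl
use strict;
use warnings;

# Hypothetical helper: replaces each %XX escape (two hex digits, either case)
# with the character whose code is the hex value XX.
sub percent_decode {
    my ($str) = @_;
    $str =~ s/%([0-9a-f]{2})/chr hex $1/ieg;
    return $str;
}

my $decoded = percent_decode("www.abc%20.com");
print "[$decoded]\n";                    # prints [www.abc .com]

# An escape containing a hex letter: %2F is "/"
print percent_decode("a%2Fb"), "\n";     # prints a/b
```

Decoding only turns %20 back into a space. Removing that space from the host name is a separate step, such as the `s/\s//g` shown earlier in the thread.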


    #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

Re: Replacing special characters from a URL
by Your Mother (Archbishop) on Feb 17, 2016 at 19:19 UTC

    %20 is valid in most parts of a URI, so it sounds like you have broken data. There is no generic substitution or treatment that can fix it unless the breakage is regular and predictable, which is impossible to guess from your pseudo-sample. With more information about what your data really looks like, you'll likely get better suggestions.