regex to detect any non digit and number

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have to loop over my array and drop all items that contain characters that are NOT a-zA-Z, digits, periods or colons (actually I'm trying to ensure that the URL doesn't have any 'funky' chars in it that it shouldn't have).

Anyone know how to write this regex?


Re: regex to detect any non digit and number by brian_d_foy (Abbot) on Oct 16, 2006 at 23:54 UTC
It sounds as if you want URI::Escape (and maybe even Encode). Those should take care of the characters in the URL for you. Remember that URIs will also need `[@/+?&;=%$,]` (the reserved chars) and `[-_.!~*'()]` (the unreserved set), as well as a few odds-and-ends chars. For the full specification, see RFC 2396. -- brian d foy <brian@stonehenge.com>
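To make the point about the larger character set concrete, here is a minimal sketch that builds a "funky character" test from the reserved and unreserved sets of RFC 2396 instead of only letters, digits, periods and colons. The variable names and sample URLs are made up for illustration, and the set is deliberately simplified (for instance, `#` is not in it, so URLs with fragments would be flagged); treat it as a starting point, not a URL validator.

```perl
use strict;
use warnings;

# Reserved and unreserved punctuation as listed in RFC 2396.
# q{} is single-quoting, so "$," and "@&" are not interpolated.
my $reserved   = q{;/?:@&=+$,};
my $unreserved = q{-_.!~*'()};

# quotemeta backslash-escapes every non-word character, which makes the
# string safe to drop into a character class. "%" is allowed so that
# percent-escapes such as %20 pass.
my $allowed = 'a-zA-Z0-9%' . quotemeta( $reserved . $unreserved );
my $funky   = qr/[^${allowed}]/;

# Hypothetical sample data.
my @urls = (
    'http://example.com:8080/a?b=c&d=e',
    'http://example.com/a b',
    'http://example.com/<script>',
);

# Keep only the URLs that contain no character outside the allowed set.
my @clean = grep { !/$funky/ } @urls;

print "$_\n" for @clean;
```

If the goal is to repair a URL rather than reject it, escaping the offending characters with URI::Escape, as suggested above, is probably the better route.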
Re: regex to detect any non digit and number by chargrill (Parson) on Oct 16, 2006 at 22:16 UTC
Sure, lots of people do. Though we'd be more inclined to help if we saw what you've tried so far, what works, what doesn't work, some sample input data, etc. Take a look at How (Not) To Ask A Question for more details. --chargrill `s*lil; $=join'',sort split q; s;.;grr; &&s+(.(.)).+$2$1+; $; = qq-$_-;s,.,ahc,;$,.=chop for split q,,,reverse;print for($,,$;,$,$/)`
Re: regex to detect any non digit and number by GrandFather (Saint) on Oct 16, 2006 at 22:18 UTC
First thing to do is to read about regexen. They are really important for most things you do with Perl. I'd recommend that you start with perlretut, perlrequick, perlre and perlreref. When you have done that you will know that a negated character class like `[^a-zA-Z\d.:]` is what you are after. But read the documentation first. DWIM is Perl's answer to Gödel
Re^2: regex to detect any non digit and number by blazar (Canon) on Oct 16, 2006 at 22:46 UTC
"When you have done that you will know that a negated character class like `[^a-zA-Z\d.:]` is what you are after. But read the documentation first."

As an (IMHO extremely interesting) side note re `\d`: it may match much more than one would naively expect, depending on locale. For more info please read news:efu613.19k.1@news.isolution.nl news:slrnei51l9.3hq.hjp-usenet2@yoyo.hjp.at
Re^3: regex to detect any non digit and number by blazar (Canon) on Oct 17, 2006 at 15:13 UTC
news:efu613.19k.1@news.isolution.nl news:slrnei51l9.3hq.hjp-usenet2@yoyo.hjp.at

Premise

In reply to a `/msg` by GrandFather, I'll point out that the above links are not "broken": they are USENET URLs, and not everybody has a news client installed or a system configured to launch it on such URLs. For ease of use, here are the Google Groups URLs for these clpmisc posts:

http://groups.google.it/group/comp.lang.perl.misc/msg/94b1318aaf529116
http://groups.google.it/group/comp.lang.perl.misc/msg/eefc7fc940e4074e

Summary

The whole thread started at this post. To sum up the story, someone asked about some Perl code he'd seen, which included `\d`. Someone else explained that (to quote literally)

\d matches "0", "1" ... "8" or "9"

At this point yet another poster answered:

Last time I checked, \d matched 268 different characters. Dear programmer, if you mean 0-9, then write 0-9.

This spawned a subdiscussion, because a fourth poster, a very well-known contributor to the group, pointed out that while he was aware that `\w` will match not only `'a'..'z', 'A'..'Z', '0'..'9',` and `'_'`, but possibly much more, depending on locale, it was not at all obvious to him that `\d` would match anything other than `[0-9]`. It was not obvious to me either, especially since I hardly know anything about this whole locales stuff, and that's why I'm reporting it here.

Further replies included two test/example scripts, which I'm pasting hereafter, unmodified.

The first script
Read more... (4 kB)

The second script
Read more... (10 kB)

Conclusion
Read more... (8 kB)
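The scripts from that thread are not reproduced here, but the effect is easy to check with a short sketch. It assumes a Perl with Unicode-aware regexes, and the `/a` modifier in the last test needs Perl 5.14 or later, so it postdates this thread. The test character and variable names are chosen purely for illustration. Note that in this sketch the extra matches come from Unicode character semantics rather than from a locale setting.

```perl
use strict;
use warnings;

# U+0663 ARABIC-INDIC DIGIT THREE: a decimal digit, but not one of 0-9.
my $arabic_three = "\x{0663}";

# \d matches any Unicode decimal digit on a character string.
my $matches_d = $arabic_three =~ /^\d\z/ ? 1 : 0;

# An explicit range matches only the ten ASCII digits.
my $matches_range = $arabic_three =~ /^[0-9]\z/ ? 1 : 0;

# With /a (Perl 5.14+), \d is restricted to the ASCII digits.
my $matches_d_ascii = $arabic_three =~ /^\d\z/a ? 1 : 0;

printf "\\d: %d  [0-9]: %d  \\d with /a: %d\n",
    $matches_d, $matches_range, $matches_d_ascii;
```

For input validation, where "digit" almost always means 0-9, writing `[0-9]` explicitly avoids the surprise.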
Re: regex to detect any non digit and number by ikegami (Patriarch) on Oct 16, 2006 at 22:21 UTC
You're short many valid characters, but here goes: `@ok = grep !/[^a-zA-Z0-9.:]/, @list;` or `@ok = grep /^[a-zA-Z0-9.:]*\z/, @list;`
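For anyone who wants to try the two forms side by side, here is a small self-contained sketch. The contents of `@list` are invented sample data (the original question did not include any), and the result arrays are named differently only so that both can be compared.

```perl
use strict;
use warnings;

# Hypothetical sample data.
my @list = (
    'www.example.com:8080',
    'example.com',
    'bad host!.com',
    'example.com/path',
    "example.com\n",
);

# Form 1: drop any item that contains a character outside the allowed set.
my @ok_negated = grep { !/[^a-zA-Z0-9.:]/ } @list;

# Form 2: keep only items made up entirely of allowed characters.
# \z anchors at the very end of the string, so a trailing newline fails.
my @ok_anchored = grep { /^[a-zA-Z0-9.:]*\z/ } @list;

print "$_\n" for @ok_negated;
```

Both forms keep the same items. Note that both also accept the empty string; change `*` to `+` in the second form if empty items should be dropped.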
