Flavia has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

This is my first posting :) ("hi, mom!") I'm hoping someone will be able to help me.

I'm trying to validate the entry of a URL in a form, so I'm using a very simple piece of code that looks like this:

if ($URLcheck =~ m/\W/){ printError(); }
The problem is that a URL can contain "-", but it can't contain accented letters or spaces... so the code above is not quite what I need. Any ideas?

Thanks in advance! :)

Flavia

Replies are listed 'Best First'.
Re (tilly) 1: Form validation
by tilly (Archbishop) on Nov 17, 2000 at 05:32 UTC
    Try URI::Find. It is a little tricky though:
    use strict;
    use URI::Find;

    # Time passes

    sub test_uri {
        my $val  = shift;
        my $copy = $val;
        my $found;
        find_uris($copy, sub { $found = shift });
        # true only if the whole input was recognized as a URI
        return defined $found && $val eq $found;
    }
    Note that this test is also rather picky...you may prefer to do the following which is both simpler and more useful IMO:
    sub normalize_uri {
        my $val = shift;
        my $found;
        find_uris($val, sub { $found = shift });
        return $found;
    }
    If the text is even remotely acceptable, it will try to guess at something valid... :-)
      Wow, I'll have some fun tonight! :) Thank you so much for all the help, guys. I believe my question has been answered. I'll be trying your suggestions now.

      Thanks again and very best,

      Flavia

Re: Form validation
by arturo (Vicar) on Nov 17, 2000 at 04:27 UTC

    When I saw this post, I thought to myself that there must be a module that validates that a string has the right form to be a URL (without actually trying to use the string to connect), but I couldn't find one. If you're looking for a "good enough" solution, something like what cianoz proposes would be a start; it depends on how nit-picky you're going to be. You might want to make sure there's a protocol ID in the string, and at least one period surrounded by other valid, non-period characters (although http://localhost is valid and won't fit that, so even *that's* not guaranteed). I'm not enough of a regex genius to make that happen (yet), and I don't have the RFC handy, so the following comes with heaps of disclaimers. Hopefully, it will get you started and won't damage your career as a Perl programmer or a human being =)

    # since this is messy and we need to use it a bunch of times,
    # let's load it into a pattern.
    my $val = qr/[a-z0-9?+=\-%]/;  # there is no way in heck this is correct;
                                   # it's a *beginning* though

    # get input
    if (is_url($string)) {
        # do something with it
    }
    ...

    sub is_url {
        my $input = shift;
        return $input =~ m#^(?:https?|ftp)://$val+\.$val+#i;
    }
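    The post above notes that http://localhost is valid but would fail a "must contain a dot" rule. A loosened, self-contained sketch that also accepts dot-less hosts (the character class is the same rough guess as above, not the RFC grammar, and the sample inputs are just illustrations):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Rough guess at URL-ish characters -- an illustration, not RFC syntax.
my $part = qr/[a-z0-9?+=\-%]+/i;

sub is_url_loose {
    my $input = shift;
    # scheme, then one or more runs separated by dots; a single run
    # (no dot at all) covers hosts like "localhost"
    return $input =~ m{^(?:https?|ftp)://$part(?:\.$part)*}i ? 1 : 0;
}

print is_url_loose("http://localhost"), "\n";   # prints 1
print is_url_loose("just some words"), "\n";    # prints 0
```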
    HTH

    Philosophy can be made out of anything. Or less -- Jerry A. Fodor

Re: Form validation
by cianoz (Friar) on Nov 17, 2000 at 04:11 UTC
    Are you trying to validate a URL or just a hostname? A URL has a LOT of valid non-word characters (at least in certain positions... [_-/:@?#+~.], any others?). If you are trying to match just the hostname, then this would do (tell me if I forgot something):
    unless ($URLcheck =~ /^[A-Za-z0-9.-]+$/) {
        # reject this...
    }
    Update! I've found this on Slashdot; I see a short life for my regexp :-)
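    A self-contained sketch of the hostname test, anchored so the whole string must consist of valid characters (the sample hostnames are just illustrations):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Letters, digits, dots and hyphens only, anchored at both ends --
# without ^...$ the test would pass any string that merely
# *contains* one valid character somewhere.
sub looks_like_hostname {
    my $host = shift;
    return $host =~ /^[A-Za-z0-9.-]+$/ ? 1 : 0;
}

print looks_like_hostname("www.example.com"), "\n";  # prints 1
print looks_like_hostname("no spaces.com"), "\n";    # prints 0
```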
Re: Form validation
by the_slycer (Chaplain) on Nov 17, 2000 at 04:25 UTC
    Probably the better way to go here (because URLs can have many funky chars) is to find out what they can't have and regex on that instead. I'm not very good with regexes, but you could try something like:
    if ($urlcheck =~ m/\s/) {
        # stuff to do for a bad URL here
    }
    which would get rid of the ones that have a space.
      Watch out, though! URLs are allowed to have spaces. I checked, and Linux/Apache/Netscape don't have problems with the spaces.

      From RFC 1738:

      2.2. URL Character Encoding Issues

      URLs are sequences of characters, i.e., letters, digits, and special characters. A URL may be represented in a variety of ways: e.g., ink on paper, or a sequence of octets in a coded character set. The interpretation of a URL depends only on the identity of the characters used.

      In most URL schemes, the sequences of characters in different parts of a URL are used to represent sequences of octets used in Internet protocols. For example, in the ftp scheme, the host name, directory name and file names are such sequences of octets, represented by parts of the URL. Within those parts, an octet may be represented by the character which has that octet as its code within the US-ASCII [20] coded character set.
      and
      Reserved:

      Many URL schemes reserve certain characters for a special meaning: their appearance in the scheme-specific part of the URL has a designated semantics. If the character corresponding to an octet is reserved in a scheme, the octet must be encoded. The characters ";", "/", "?", ":", "@", "=" and "&" are the characters which may be reserved for special meaning within a scheme. No other characters may be reserved within a scheme.
      and
      3.1. Common Internet Scheme Syntax

      While the syntax for the rest of the URL may vary depending on the particular scheme selected, URL schemes that involve the direct use of an IP-based protocol to a specified host on the Internet use a common syntax for the scheme-specific data:

      //<user>:<password>@<host>:<port>/<url-path>

      Some or all of the parts "<user>:<password>@", ":<password>", ":<port>", and "/<url-path>" may be excluded. The scheme specific data start with a double slash "//" to indicate that it complies with the common Internet scheme syntax. The different components obey the [...]
      So, I think the URL checking should be a little bit more sophisticated. I don't know how URI::Find implements the check.
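      As a concrete illustration of the encoding rule quoted above, here is a minimal sketch that percent-encodes spaces and RFC 1738's reserved characters by hand (in real code the URI::Escape module's uri_escape() is the right tool; this is just to show the mechanics):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Replace each space and each RFC 1738 reserved character with its
# %XX form.  Real code should use URI::Escape's uri_escape() instead.
sub encode_reserved {
    my $s = shift;
    $s =~ s/([ ;\/?:@=&])/sprintf("%%%02X", ord($1))/ge;
    return $s;
}

print encode_reserved("a path with spaces"), "\n";  # prints a%20path%20with%20spaces
```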

      Cheers,

      Jeroen

      I was dreaming of guitarnotes that would irritate an executive kind of guy (FZ)