Canonical regexp for urls beginning with http://

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Re: Canonical regexp for urls beginning with http:// by Albannach (Monsignor) on Feb 22, 2001 at 02:57 UTC
Maybe you should have googled a bit more. I got this excellent link on my first search using "regular expression url".

Update: Nice (more excellent) link indeed, merlyn (thanks!), but just how is a newbie to think of abigail? ;-)

-- I'd like to be able to assign to an luser
Re: Re: Canonical regexp for urls beginning with http:// by merlyn (Sage) on Feb 22, 2001 at 03:01 UTC
Not quite. Add "abigail" into the mix, and you end up with this most excellent link.

-- Randal L. Schwartz, Perl hacker
(tye)Re: Canonical regexp for urls beginning with http:// by tye (Sage) on Feb 22, 2001 at 10:11 UTC
I don't think anyone mentioned the URI::Find module. I've never used it but I would hope it would either do a good job of this or you'd submit a patch for it based on other information in this thread. (:

I like the one Albannach mentions above except that I'd be sure to add "," to the list of punctuation marks! My limited experience with URL-matching regexen shows that the trailing comma is the number one problem for them.

I looked over the link merlyn gave and didn't see any regular expressions and was forced to actually start reading. (: The first line has a link to the regex that the discussed program generates. Somehow I cannot recommend the use of such a huge beast for the original requested purpose. Based on a very casual reading, I think that monster would actually match trailing punctuation such as the period at the end of this sentence that also ends in a URL, http://www.perl.org/index.html.

- tye (but my friends call me "Tye")
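To make the trailing-punctuation problem concrete, here is a minimal sketch in plain Perl. It assumes a deliberately simple pattern and is not the regex from either linked page; the helper name `find_http_urls` and the sample text are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: pull http:// URLs out of free text, then trim
# punctuation that most likely belongs to the sentence, not the URL.
# Assumption: a URL never legitimately ends in one of these characters,
# which is a heuristic and will be wrong for some real URLs.
sub find_http_urls {
    my ($text) = @_;
    my @urls;
    while ($text =~ m{(http://[^\s<>"]+)}g) {
        my $url = $1;
        $url =~ s/[.,;:!?)]+$//;    # drop trailing sentence punctuation, comma included
        push @urls, $url;
    }
    return @urls;
}

my @found = find_http_urls(
    'See http://www.perl.org/index.html. Also http://cisco.com, maybe.'
);
print "$_\n" for @found;
```

The greedy match grabs everything up to the next whitespace, so the period and comma are captured first and stripped afterwards. Trimming after the match keeps the main pattern short, at the cost of mangling the rare URL that really does end in punctuation.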
Re: Canonical regexp for urls beginning with http:// by Tuna (Friar) on Feb 22, 2001 at 02:46 UTC
Given a file named "example" which contains the line: http://cisco.com You've got a match! this:

```perl
#!/usr/local/bin/perl -w
use strict;

my $file = "/export/home/ssesar/example";

# Three-argument open with a lexical filehandle; report the failure reason.
open(my $fh, '<', $file) or die "Cannot open $file: $!\n";

while (my $line = <$fh>) {
    chomp $line;
    # Only checks that the line starts with http://, nothing more.
    if ($line =~ m{^http://}) {
        print "line = $line\n";
    } else {
        print "Sorry, dude\n";
    }
}
close $fh;
```

works.
Re: Re: Canonical regexp for urls beginning with http:// by merlyn (Sage) on Feb 22, 2001 at 02:52 UTC
As long as that's your only line. You could have just said: `@ARGV = qw(/export/home/ssesar/example); print "line = $_" for <>;` for the accuracy you give. See the other answers in this thread for things which actually detect valid URLs, and reject bad ones.

-- Randal L. Schwartz, Perl hacker
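As a rough illustration of why a bare `^http://` prefix test accepts almost anything, the sketch below adds a loose hostname check. It is an assumption-laden example, not an RFC-complete validator and not the regex from the linked pages; the names `$label`, `$http_url` and `looks_like_http_url` are hypothetical.

```perl
use strict;
use warnings;

# Assumption: a hostname label starts and ends with a letter or digit and
# may contain hyphens in between. Userinfo, IPv6 literals and the like
# are deliberately ignored.
my $label    = qr/[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?/;
my $http_url = qr{^http://$label(?:\.$label)*(?::\d+)?(?:/\S*)?$};

# Hypothetical helper: true if the whole line looks like an http:// URL.
sub looks_like_http_url {
    my ($line) = @_;
    return $line =~ $http_url ? 1 : 0;
}

for my $line ('http://cisco.com', 'http://', 'http://-bad-.com/x') {
    printf "%-22s %s\n", $line, looks_like_http_url($line) ? 'ok' : 'rejected';
}
```

Both `http://` and `http://-bad-.com/x` pass the prefix-only test but are rejected here, since the first has no host at all and the second has a label that starts with a hyphen. For anything beyond a quick filter, a maintained module is probably the safer choice.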
Re: Re: Re: Canonical regexp for urls beginning with http:// by Tuna (Friar) on Feb 22, 2001 at 03:04 UTC
Thanks! Much easier!