Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I need the canonical regexp to match urls beginning with http:// (I don't need to worry about ftp:, telnet: or mailto:, in other words) and though I don't want to roll my own, Google searches of the form

regexp url http

are, needless to say, hopeless.

I bow before the collected wisdom of the perl monks.

-clay

Re: Canonical regexp for urls beginning with http://
by Albannach (Monsignor) on Feb 22, 2001 at 02:57 UTC
    Maybe you should have googled a bit more. I got this excellent link on my first search using "regular expression url"

    Update: Nice (more excellent) link indeed, merlyn (thanks!), but just how is a newbie to think of abigail? ;-)

    --
    I'd like to be able to assign to an luser

      I don't think anyone mentioned the URI::Find module. I've never used it but I would hope it would either do a good job of this or you'd submit a patch for it based on other information in this thread. (:

      I like the one Albannach mentions above except that I'd be sure to add "," to the list of punctuation marks! My limited experience with URL-matching regexen shows that the trailing comma is the number one problem for them.

      I looked over the link merlyn gave and didn't see any regular expressions and was forced to actually start reading. (: The first line has a link to the regex that the discussed program generates. Somehow I cannot recommend the use of such a huge beast for the original requested purpose. Based on a very casual reading, I think that monster would actually match trailing punctuation such as the period at the end of this sentence that also ends in a URL, http://www.perl.org/index.html.

              - tye (but my friends call me "Tye")
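To make tye's point about trailing punctuation concrete, here is a minimal sketch of one way to pull http:// URLs out of prose and then drop a trailing period or comma. The helper name find_http_urls and the exact punctuation list are assumptions for illustration, not something posted in this thread, and a URL that legitimately ends in one of those characters (a closing parenthesis, say) would be clipped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): collect http:// URLs from prose,
# then strip punctuation that most likely belongs to the sentence, not the URL.
sub find_http_urls {
    my ($text) = @_;
    my @urls;
    # Take every run of characters after http:// up to whitespace, <, > or ".
    while ($text =~ m{(http://[^\s<>"]+)}g) {
        my $url = $1;
        # Trim trailing sentence punctuation, the comma included.
        $url =~ s/[.,;:!?)'\]]+$//;
        push @urls, $url;
    }
    return @urls;
}

my $sample = 'See http://www.perl.org/index.html. Also http://cisco.com, if you like.';
print "$_\n" for find_http_urls($sample);
```

The trade-off is deliberate: it is greedy while matching and conservative afterwards, which is usually what you want when scanning running text rather than validating input.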
Re: Canonical regexp for urls beginning with http://
by Tuna (Friar) on Feb 22, 2001 at 02:46 UTC
    Given a file named "example"

    which contains the line:

    http://cisco.com You've got a match!

    this:
    #!/usr/local/bin/perl -w
    use strict;
    my $file = "/export/home/ssesar/example";
    my $line;
    open (FILE, $file) || die "\n$!";
    while ($line = <FILE>) {
        chomp $line;
        if ($line =~ m/^http\:\/\//) {
            print "line = $line\n";
        }
        else {
            print "Sorry, dude\n";
        }
    }
    works.
      As long as that's your only line. You could have just said:
      @ARGV = qw(/export/home/ssesar/example); print "line = $_" for <>;
      for the accuracy you give. See the other answers in this thread for things which actually detect valid URLs, and reject bad ones.

      -- Randal L. Schwartz, Perl hacker

        Thanks! Much easier!
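As merlyn points out, a bare ^http:// test accepts anything that merely starts with that prefix. The following is a rough sketch of a somewhat stricter check that anchors both ends and insists on a plausible host name. The sub name looks_like_http_url and the host and path rules are assumptions made here for illustration; this is not the canonical regexp the original question asks for, and it does not implement the full URI grammar.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical check (not from the thread): stricter than a prefix test,
# but NOT a complete validator for the URI grammar.
sub looks_like_http_url {
    my ($candidate) = @_;
    return $candidate =~ m{
        \A http://
        (?: [A-Za-z0-9] (?: [A-Za-z0-9-]* [A-Za-z0-9] )? \. )*   # leading dotted labels
        [A-Za-z0-9] (?: [A-Za-z0-9-]* [A-Za-z0-9] )?             # final label
        (?: : \d+ )?                                             # optional port
        (?: / [^\s<>"]* )?                                       # optional path and query
        \z
    }x ? 1 : 0;
}

for my $line ('http://cisco.com', "http://cisco.com You've got a match!", 'http://') {
    printf "%-40s %s\n", $line, looks_like_http_url($line) ? 'ok' : 'rejected';
}
```

Because the pattern ends in \z, the caller should chomp each line first, as the script above already does.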