ansh batra has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks,
I have a line that contains any number of URLs (zero, one, or more).
I want to extract the URLs, but I don't know in advance how many there are.
I tried this:
if($line =~ /www.*com/g) { print "$&\n"; }
This prints the whole line as a single match, like this:
www.scvckj.com agdsvejdvvws xwsjwswj www.wdxvecbc.com swgvdec www.asdvedj.com
What I want is to collect each URL separately, not the whole string that starts with www and ends with com. Please help.

Replies are listed 'Best First'.
Re: REGEX:extract url/urls from aline
by CountZero (Bishop) on Dec 28, 2011 at 10:27 UTC
    Your regex is unlikely to work in general, as there is no reason at all that a URL must start with www or end with .com.

    Have a look at Regexp::Common and Regexp::Common::URI to see how it should be done.

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

Re: REGEX:extract url/urls from aline
by Corion (Patriarch) on Dec 28, 2011 at 09:25 UTC

    Have you seen URI::Find?

    Otherwise, please post a short, self-contained program and representative input data. I suggest you review perlre about greedy repetitions: .* will always take up as much as it can while the regular expression still matches. Also, think about whether you really want to use the dot. It matches whitespace too, and it is unlikely that a URI contains whitespace.

    Also, your usage of $1 and $2 is weird because you never capture any data.
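The point about greedy repetition can be seen directly on the sample line from the question. The following is only a minimal sketch using core Perl; the variable names (`$line`, `@greedy`, `@lazy`, `@no_space`) are made up for illustration, and the patterns still assume every URL looks like `www.something.com`, which is not true in general.

```perl
use strict;
use warnings;

# Sample line taken from the question.
my $line = 'www.scvckj.com agdsvejdvvws xwsjwswj www.wdxvecbc.com swgvdec www.asdvedj.com';

# Greedy: .* runs to the end of the line, then backtracks to the last "com",
# so the whole line comes back as one match.
my @greedy = $line =~ /www.*com/g;

# Non-greedy: .*? stops at the first ".com" it can reach.
my @lazy = $line =~ /www\..*?\.com/g;

# \S+ cannot cross whitespace, so each match stays inside one word.
my @no_space = $line =~ /www\.\S+\.com/g;

print scalar(@greedy), " greedy match(es)\n";
print "$_\n" for @no_space;
```

In list context, a match with /g returns every match at once, so there is no need to know beforehand how many URLs the line holds.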

Re: REGEX:extract url/urls from aline
by TJPride (Pilgrim) on Dec 28, 2011 at 19:16 UTC
    URLs have so many possible formats that this is one of the relatively few instances where I personally wouldn't even bother messing with a regex. Use a code library, as suggested. But for your specific case:

    use strict;
    use warnings;

    $_ = 'www.scvckj.com agdsvejdvvws xwsjwswj www.wdxvecbc.com swgvdec www.asdvedj.com';
    @_ = m/www\..*?\.com/g;
    print join "\n", @_;

    Or if you want to add a few other suffixes:

    @_ = m/www\..*?\.(?:com|net|org|edu)/g;
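One caveat with a fixed suffix list may be worth sketching: because `.*?` can cross whitespace, a host whose suffix is not in the list can drag the match along to the next listed suffix. The input string and the names `$text`, `@loose`, and `@strict` below are hypothetical, and replacing the dot with `\S` is just one possible tightening, not a complete URL matcher.

```perl
use strict;
use warnings;

# Hypothetical input: the first host uses a suffix that is not in the list.
my $text = 'see www.example.io and www.perl.org now';

# .*? may cross whitespace, so the match starts at www.example.io
# and only ends at the first listed suffix it can find.
my @loose = $text =~ /www\..*?\.(?:com|net|org|edu)/g;

# \S+? cannot cross whitespace, so the unlisted host is skipped entirely.
my @strict = $text =~ /www\.\S+?\.(?:com|net|org|edu)\b/g;

print "loose:  $_\n" for @loose;
print "strict: $_\n" for @strict;
```

Here the loose pattern returns a single match that runs across both hosts, while the stricter one returns only `www.perl.org`. A library such as URI::Find remains the safer choice for real-world input.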