REGEX: extract url/urls from a line

ansh batra has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks,
I have a line that contains any number of URLs (zero, one, or more).
I want to extract the URLs, but I don't know in advance how many there are.
I tried this:

if($line =~ /www.*com/g) {
            print "$&\n";
        }

This gives something like this:
www.scvckj.com agdsvejdvvws xwsjwswj www.wdxvecbc.com swgvdec www.asdvedj.com
What I want is to collect each URL separately, not the whole string that starts with www and ends with com. Please help.

Replies are listed 'Best First'.
Re: REGEX: extract url/urls from a line by CountZero (Bishop) on Dec 28, 2011 at 10:27 UTC
Your regex is unlikely to work, as there is no reason at all that a URL must start with `www` or end with `.com`. Have a look at Regexp::Common and Regexp::Common::URI to see how it should be done.
CountZero
A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James
Re: REGEX: extract url/urls from a line by Corion (Patriarch) on Dec 28, 2011 at 09:25 UTC
Have you seen URI::Find? Otherwise, please post a short, self-contained program and representative input data. I suggest you review perlre on greedy repetition: `.*` will always take up as much as it can while the regular expression still matches. Also, think about whether you really want to use the dot, since it also matches whitespace, and whitespace is unlikely to be part of a URI. Also, your usage of `$1` and `$2` is weird because you never capture any data.
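To make the point about greedy repetition concrete, here is a minimal sketch using only core Perl. The first sample line comes from the question; the second line (`www.example.org and www.foo.com`) and all variable names are made up for illustration. It is not a general URL matcher, only a demonstration of how `.*`, `.*?`, and `\S+?` behave differently.

```perl
use strict;
use warnings;

# Sample line taken from the question.
my $line = 'www.scvckj.com agdsvejdvvws xwsjwswj www.wdxvecbc.com swgvdec www.asdvedj.com';

# Greedy: .* runs on to the last "com" in the line, giving one long match.
my @greedy = $line =~ /www.*com/g;

# Non-greedy: .*? stops at the first ".com" it can reach.
# In list context, m//g returns every match (an empty list if there are none).
my @lazy = $line =~ /www\..*?\.com/g;

# Hypothetical second input: the first host does not end in .com.
my $mixed = 'www.example.org and www.foo.com';

# The dot also matches spaces, so even the non-greedy form spans two words here.
my @spanning = $mixed =~ /www\..*?\.com/g;

# \S+? cannot cross whitespace, so a match always stays inside one word.
my @no_space = $mixed =~ /www\.\S+?\.com/g;

print "greedy:   $_\n" for @greedy;
print "lazy:     $_\n" for @lazy;
print "spanning: $_\n" for @spanning;
print "no_space: $_\n" for @no_space;
```

On the question's line the greedy pattern yields a single match covering the whole line, while the non-greedy one yields three separate hosts. On the second line only the `\S+?` version avoids swallowing the text between the two hosts.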
Re: REGEX: extract url/urls from a line by TJPride (Pilgrim) on Dec 28, 2011 at 19:16 UTC
URLs have so many possible formats that this is one of the relatively few instances where I personally wouldn't even bother messing with regex. Use a code library, as suggested. But for your specific case:

use strict;
use warnings;

$_ = 'www.scvckj.com agdsvejdvvws xwsjwswj www.wdxvecbc.com swgvdec www.asdvedj.com';
@_ = m/www\..*?\.com/g;   # non-greedy, so each URL is matched separately
print join "\n", @_;

Or if you want to add a few other suffixes:

@_ = m/www\..*?\.(?:com|net|org|edu)/g;
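Since the question mentions that a line may hold zero, one, or many URLs, the suffix-list idea above can be wrapped in a small helper. This is only a sketch under the thread's own assumption that a "URL" starts with `www.` and ends in a known suffix. The name `extract_urls`, the `\S+?` and `\b` refinements, and the sample lines are illustrative choices, not part of the original reply.

```perl
use strict;
use warnings;

# Hypothetical helper: return every www.<host>.<suffix> token found in a line.
# Suffixes default to ('com') when none are given.
sub extract_urls {
    my ($line, @suffixes) = @_;
    @suffixes = ('com') unless @suffixes;

    # Build an alternation such as com|net|org, escaping any metacharacters.
    my $alt = join '|', map { quotemeta } @suffixes;

    # \S+? keeps a match inside one whitespace-delimited word;
    # \b stops "com" from matching the start of a longer label.
    my @urls = $line =~ /\bwww\.\S+?\.(?:$alt)\b/g;
    return @urls;
}

# Illustrative input lines; only the first is from the thread.
my @lines = (
    'www.scvckj.com agdsvejdvvws xwsjwswj www.wdxvecbc.com swgvdec www.asdvedj.com',
    'no links here',
    'see www.perl.org or www.cpan.org today',
);

for my $line (@lines) {
    my @urls = extract_urls($line, qw(com net org edu));
    printf "%d url(s): %s\n", scalar @urls, join(', ', @urls);
}
```

Because the helper returns a plain list, the caller never needs to know the count beforehand: an empty list simply means the line held no match. For anything beyond this narrow pattern, the modules suggested earlier in the thread remain the safer route.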