in reply to Having hyperlinks in comments

URI::Find - Find URIs in arbitrary text
URI::Find::Rule - Simpler interface to URI::Find

Re^2: Having hyperlinks in comments
by tlm (Prior) on Jun 25, 2005 at 11:33 UTC

    I was going to propose Regexp::Common as one more alternative, but when I ran a quick test of it, I found that it gives undesired results with some URLs (note the trailing parentheses, commas, semicolons, etc. in some of the returned URLs):

    % wget -qO - http://www.ebay.com | perl -MRegexp::Common=URI -wnle 'print $1 while /($RE{URI}{HTTP})/g' | head
    http://include.ebaystatic.com/js/v/us/homepage.js
    http://include.ebaystatic.com/aw/pics/us/css/homepage.css
    http://pics.ebaystatic.com/aw/pics/userSitePrefs/bottomDropShadow_20x20.gif)
    http://pics.ebaystatic.com/aw/pics/userSitePrefs/sideDropShadow_20x20.gif)
    http://pics.ebaystatic.com/aw/pics/userSitePrefs/dropshadow2_20x10.gif)
    http://include.ebaystatic.com/aw/pics/css/ebay.css
    http://include.ebaystatic.com/';
    http://include.ebaystatic.com/js/v/us/ebaybase.js
    http://include.ebaystatic.com/js/v/us/ebaysup.js
    http://search.ebay.com/',

    ...while URI::Find::Rule does a better job of DWIM:

    % wget -qO - http://www.ebay.com | perl -MURI::Find::Rule -wlne 'print $_->[1] for URI::Find::Rule->scheme("http")->in($_)' | head
    http://include.ebaystatic.com/js/v/us/homepage.js
    http://include.ebaystatic.com/aw/pics/us/css/homepage.css
    http://pics.ebaystatic.com/aw/pics/userSitePrefs/bottomDropShadow_20x20.gif
    http://pics.ebaystatic.com/aw/pics/userSitePrefs/sideDropShadow_20x20.gif
    http://pics.ebaystatic.com/aw/pics/userSitePrefs/dropshadow2_20x10.gif
    http://include.ebaystatic.com/aw/pics/css/ebay.css
    http://include.ebaystatic.com/
    http://include.ebaystatic.com/js/v/us/ebaybase.js
    http://include.ebaystatic.com/js/v/us/ebaysup.js
    http://search.ebay.com/

    Update: Fixed the incorrect wording. As merlyn pointed out, the unwanted trailing characters are valid URL characters. Still, I think they could be a problem for the application the OP described. Therefore, in this case, R::C is not the most straightforward solution.
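
    For anyone who wants to stay with a plain regex match anyway, here is a rough, core-Perl-only sketch of the kind of cleanup that would be needed. The helper name trim_trailing_punct, the loose pattern (a stand-in for $RE{URI}{HTTP}, not its actual definition), and the sample text are all my own assumptions for illustration; this is not how URI::Find works internally.

```perl
use strict;
use warnings;

# Hypothetical helper: strip trailing characters that are legal in a URL
# but are usually surrounding punctuation when the URL sits in markup or prose.
sub trim_trailing_punct {
    my ($url) = @_;
    $url =~ s/[)\]'",;.]+\z//;
    return $url;
}

# Deliberately loose pattern (an assumption, standing in for $RE{URI}{HTTP}).
my $loose_http = qr{http://[^\s<>"]+};

# Made-up sample text that reproduces the trailing ")" and "'," seen above.
my $text = q{url(http://pics.ebaystatic.com/aw/pics/userSitePrefs/dropshadow2_20x10.gif) 'http://search.ebay.com/',};

# Collect every candidate match, then clean each one.
my @urls = map trim_trailing_punct($_), $text =~ /($loose_http)/g;
print "$_\n" for @urls;
```

    Note that this is only a heuristic: since those characters are valid in URLs, it will also mangle a URL that genuinely ends in ")" or ".", which is one more reason to prefer a module that already handles these cases.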

    the lowliest monk