in reply to Continue reading regex to next line

Your line

    if ( $LINE =~ m/(http:\S*\s)/ ) { print $LINE; }

should be something like:

    if ( $LINE =~ m/(http:\S+)/ ) { print "$1\n"; }

That takes care of grabbing more than the URL, but only if there is one URL per line and whatever follows "http:" is always a URL (with several URLs on a line, only the first is captured). The problem with your original regex is that the trailing \s "captured" the whitespace character after the URL as part of $1. It also accepted "http:" all on its lonesome as a valid URL, which is somewhat improbable. And you were printing the whole line rather than the part you "captured".
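A quick way to see the difference is to run both patterns over a sample line. The sample strings are made up for illustration, and the /g match at the end goes a step beyond the fix above: it is one way to relax the one-URL-per-line restriction.

```perl
use strict;
use warnings;

my $LINE = "docs at http://example.com/a and http://example.org/b\n";

# Original pattern: the trailing \s ends up inside $1.
if ( $LINE =~ m/(http:\S*\s)/ ) { print "old: [$1]\n"; }

# Suggested pattern: "http:" plus one or more non-space characters.
if ( $LINE =~ m/(http:\S+)/ ) { print "new: [$1]\n"; }

# m//g in list context returns every capture, so all URLs on the line.
my @all = $LINE =~ m/(http:\S+)/g;
print "all: @all\n";

# A bare "http:" followed by a space matches the old pattern only.
my $bare = "a bare http: here\n";
print "old matches bare\n" if $bare =~ m/(http:\S*\s)/;
print "new matches bare\n" if $bare =~ m/(http:\S+)/;
```

The square brackets make the trailing space captured by the old pattern visible; the new pattern's capture stops at the last non-space character.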

Solving the problem of URLs that span lines is a bit more complicated. You would need to (a) cache each line where the start of a URL is found, and (b) have a mechanism to distinguish a URL that ends at the end of a line from one that continues onto the following line and ends at a run of spaces there. You seem to want to use the presence of spaces between the last character of the URL and the newline as your test, but I'm not sure that would be reliable: couldn't a URL simply end where the line ends?
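Here is a rough sketch of that caching approach, assuming the heuristic from your question: a URL that runs right up to the newline, with no trailing spaces, is taken to continue with the leading non-space characters of the next line. The extract_urls helper and the sample input are hypothetical, and the sketch inherits the unreliability mentioned above, since it will wrongly glue on the next line's first word whenever a URL genuinely ends at end of line.

```perl
use strict;
use warnings;

# Hypothetical helper: collect URLs from a list of lines. Heuristic
# (an assumption, not a rule): a URL touching the end of a line, with
# no trailing whitespace, continues on the next line.
sub extract_urls {
    my @urls;
    my $pending;    # partial URL cached from the previous line, if any

    for my $line (@_) {
        my $text = $line;
        chomp $text;

        if (defined $pending) {
            if ($text =~ s/^(\S+)//) {
                $pending .= $1;          # glue on the continuation
                next if $text eq '';     # whole line was URL: keep caching
            }
            push @urls, $pending;        # a space, blank line, etc. ended it
            undef $pending;
        }

        while ($text =~ /(http:\S+)/g) {
            if (pos($text) == length($text)) {
                $pending = $1;           # touches end of line: cache it
            }
            else {
                push @urls, $1;          # followed by whitespace: complete
            }
        }
    }
    push @urls, $pending if defined $pending;    # URL on the final line
    return @urls;
}

my @input = (
    "see http://example.com/a/very/long/\n",
    "path/page.html for details\n",
    "and http://example.org/short   \n",
    "ends here\n",
);
print "$_\n" for extract_urls(@input);
```

With this input it prints http://example.com/a/very/long/path/page.html (reassembled from the first two lines) and http://example.org/short, which is left alone because of the trailing spaces after it.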

Best, beth

Update: further explanations of the issues, including the need to print the captured portion rather than the whole line.

Re^2: Continue reading regex to next line
by learningperl01 (Beadle) on Mar 02, 2009 at 21:40 UTC
    Thanks for the quick reply. I've updated the regex, but the output is exactly the same as in my original post.
      You missed the change from print $LINE to print $1.