You are correct that my code fails on your test case; however, that case is outside the OP's spec. In particular, note that no URI starts or ends the file, and every URI is separated from the surrounding text by non-newline whitespace. I designed my regex to behave identically to the original, since I don't know the real source of the data for comparison. I agree that the final \s is potentially problematic (and have updated my answer accordingly), but the OP's regex includes it. Was it because he didn't realize \S* is greedy? Probably, but how am I to know that? The provided file is highly unlikely to produce the desired result, since http://www.website1.com/getme.html is not present anywhere in the file. Without knowing the actual data source or file format, any answer given here can fail. What if the OP also wants https: to match? ftp:? The best solution I've seen is in CountZero's comment, since Regexp::Common should give some resilience, but it fails your test as well.
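To make the trailing \s and greedy \S* points concrete, here is a minimal sketch. It is written in Python rather than Perl, but \s, \S and greedy * behave the same way for plain ASCII input like this. The pattern http://\S*\s is a hypothetical stand-in for "URI followed by required whitespace" (the OP's exact regex is not reproduced here), and the sample strings are made up. It shows that a required trailing \s misses a URI at the very end of the input, and that greedy \S* runs right through trailing punctuation.

```python
import re

# Hypothetical stand-in for a pattern that requires whitespace after the URI.
WITH_TRAILING_WS = re.compile(r"http://\S*\s")
# Same pattern without the trailing \s; greedy \S* already stops at whitespace.
WITHOUT_TRAILING_WS = re.compile(r"http://\S*")


def find_uris(pattern, text):
    """Return every match of pattern in text, with surrounding whitespace stripped."""
    return [m.strip() for m in pattern.findall(text)]


# Made-up sample inputs.
inside = "see http://www.example.com/a.html for details"
at_end = "see http://www.example.com/a.html"
with_comma = "visit http://www.example.com/a.html, then leave"

print(find_uris(WITH_TRAILING_WS, inside))        # ['http://www.example.com/a.html']
print(find_uris(WITH_TRAILING_WS, at_end))        # [] : no whitespace after the URI
print(find_uris(WITHOUT_TRAILING_WS, at_end))     # ['http://www.example.com/a.html']
print(find_uris(WITHOUT_TRAILING_WS, with_comma)) # ['http://www.example.com/a.html,']
```

The last case is the "any answer can fail" problem in miniature: whether that trailing comma belongs to the URI depends entirely on the real data format.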