in reply to Parsing out URLs with regex

Update: but of course be sure to read tye's critique of halley's got-there-first version of this, benn's late entry... :)

As artist implies, there are more well-thought-out solutions to this particular problem than you can shake a stick at.

*If*, however, you're doing this purely as a learning exercise {g}, then you're correct: "\w" matches *only* alphanumerics and '_', so the ":" and "/" in "http://" will throw it, for instance. "." matches any character (except a newline, by default), so you could use that, or a character class ('[\w:\/-]*' or something) to match only the characters that you want.
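To make the "\w" point concrete, here is a rough sketch in Python (whose re module treats these patterns the same way). The URL is made up for illustration, and the third pattern, with "." added to the class, is my own assumption about what "or something" might need to cover, not anything from the original question.

```python
import re

# Made-up URL, purely for illustration.
url = 'http://example.com/some-page'

# \w* only matches letters, digits and '_', so it stops at the ':'.
word_only = re.match(r'\w*', url).group(0)

# The character class from the post also allows ':', '/' and '-',
# but it has no '.', so it stops at the first dot in the host name.
narrow_class = re.match(r'[\w:\/-]*', url).group(0)

# Assumption: adding '.' to the class lets it cover this whole URL.
wide_class = re.match(r'[\w:\/.-]*', url).group(0)

print(word_only)
print(narrow_class)
print(wide_class)
```

Which characters belong in the class depends on the URLs you expect to see; query strings, for example, would need "?", "=" and "&" as well.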

If you use "." though, be warned that "*" is 'greedy': if your page contains more than one '" TITLE=""><b>Click Here', then ".*" will run all the way to the last one and grab a whole lot more than you bargained for. You'll probably want to make it ".*?"; the "?" makes it 'minimal', so it stops at the first one.
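A small sketch of the greedy versus minimal difference, again in Python. The HTML snippet and the 'href="..." TITLE' pattern are assumptions on my part, loosely modelled on the fragment quoted above; the real page and regex may well differ.

```python
import re

# Made-up page fragment containing two links.
html = ('<a href="http://example.com/one" TITLE=""><b>Click Here</b></a> '
        '<a href="http://example.com/two" TITLE=""><b>Click Here</b></a>')

# Greedy: ".*" runs to the LAST '" TITLE' on the line, so the single
# capture swallows everything between the first href and the second TITLE.
greedy = re.findall(r'href="(.*)" TITLE', html)

# Minimal: ".*?" stops at the FIRST '" TITLE' it can, giving one
# capture per link.
minimal = re.findall(r'href="(.*?)" TITLE', html)

print(len(greedy), greedy)
print(len(minimal), minimal)
```

A negated character class such as '[^"]*' is another common way to stop at the closing quote without relying on minimal matching.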

Hope this clarifies things,
Cheers, Ben.