in reply to Parsing out URLs with regex
As artist implies, there are more well-thought-out solutions to this particular problem than you can shake a stick at.
*If*, however, you're doing this purely as a learning exercise {g}, then you're correct - "\w" matches *only* alphanumerics and '_', so the ':' and '/' in "http://" will throw it, for instance. "." matches anything (except a newline, by default)...you could use that, or a character class ('[\w:\/-]*' or something) to match only the characters that you want.
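To see the difference between "\w" and a character class, here is a rough sketch using Python's re module, which treats these constructs the same way. The URL is made up for illustration, and the last class adds a '.' that wasn't in my suggestion above, on the assumption that you want host names and file extensions matched too.

```python
import re

# Hypothetical URL, purely for illustration.
url = "http://example.com/some-page"

# \w matches only letters, digits and '_', so the match stops at the ':'.
word_only = re.match(r"\w*", url).group(0)

# The class suggested above also allows ':', '/' and '-',
# but it still stops at the first '.' in the host name.
suggested_class = re.match(r"[\w:\/-]*", url).group(0)

# Adding '.' to the class (an assumption about what you want) covers the whole URL.
with_dot = re.match(r"[\w:\/.-]*", url).group(0)

print(word_only)
print(suggested_class)
print(with_dot)
```

The '-' sits at the end of the class so that it is taken literally rather than as a range.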
If you use "." though, be warned that "*" is 'greedy' - if your page contains more than one '" TITLE=""><b>Click Here', then ".*" will run on to the *last* one and grab a whole lot more than you bargained for...you'll probably want to make it ".*?" - the "?" makes it 'minimal', so it stops at the first one.
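A small sketch of greedy versus minimal, again in Python's re module. The HTML snippet and the 'href="' prefix are my own guesses at what your page looks like, so adjust to taste.

```python
import re

# Hypothetical page fragment with two links that share the same trailing text.
html = ('<a href="one.html" TITLE=""><b>Click Here</b></a> '
        '<a href="two.html" TITLE=""><b>Click Here</b></a>')

# Greedy: ".*" runs as far as it can, then backs off to the LAST marker.
greedy = re.search(r'href="(.*)" TITLE=""><b>Click Here', html).group(1)

# Minimal: ".*?" stops at the FIRST marker it can reach.
minimal = re.search(r'href="(.*?)" TITLE=""><b>Click Here', html).group(1)

# With the minimal version, each link is picked up separately.
all_links = re.findall(r'href="(.*?)" TITLE=""><b>Click Here', html)

print(greedy)
print(minimal)
print(all_links)
```

The greedy capture swallows the first link's closing tags and the start of the second link, while the minimal one gives you just the first URL.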
Hope this clarifies things,
Cheers, Ben.