in reply to Finding & creating links in HTML files
Cleaning up HTML is a task that probably everyone runs into at least once, and this node gave me enough inspiration to write an HTML cleaner in PHP (and it's hell working with regular expressions in that ugly language, especially when you know Perl).
Consider using that code (plus a bit of customisation) instead of trying some quick'n'dirty regular expressions that will fail, if not today, then tomorrow.
As for the linking: you can build regular expressions for URIs from the necessary RFCs, but that results in very complex expressions that are far more accurate than you need for just extracting "stuff that begins with 'http://' and/or 'www.' or 'ftp://' or maybe 'mailto:'". If you want an accurate solution anyway, check out Regexp::Common::URI ;)
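To make the "loose" approach concrete, here is a rough sketch of what such a pattern could look like. It is written in Python purely for illustration, and the regex itself should carry over to Perl more or less unchanged. The names `LINK_RE` and `find_links` are made up for this example, the `https` variant and the trailing-punctuation trimming are my own assumptions, and the whole thing is a heuristic, not an RFC-conformant matcher.

```python
import re

# Deliberately loose: a known prefix, then everything up to whitespace
# or a character that usually ends a URL in HTML or plain text.
# Assumption: "https://" is accepted alongside the prefixes named above.
LINK_RE = re.compile(
    r"""(?:https?://|ftp://|mailto:|www\.)[^\s<>"']+""",
    re.IGNORECASE,
)

def find_links(text):
    """Return link-like substrings, trimming trailing sentence punctuation."""
    # Heuristic: punctuation at the very end is more likely part of the
    # surrounding sentence than part of the link.
    return [m.group(0).rstrip(".,;:!?)") for m in LINK_RE.finditer(text)]

if __name__ == "__main__":
    sample = "See http://example.com/page, or www.example.org. Mail mailto:me@example.com now."
    for link in find_links(sample):
        print(link)
```

This will happily match things that are not valid URIs and will miss links without one of the listed prefixes, which is exactly the trade-off described above.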