in reply to Standardizng URLs
My advice: don't do it, or at least don't do it that way. "WWW dot" is the most ridiculous and unnecessary ten syllables in English.
Tools worth looking at for this include, but are not limited to: HTML::LinkExtor (extracts links from HTML), URI::Find (finds URIs in plain text), URI (handles URIs properly, as objects), HTML::TreeBuilder (parses HTML so you can pull out attributes like href and src), and XML::LibXML (same goal, different approach; works well on HTML with the right parser settings). See also: http://learn.perl.org/faq/perlfaq9.html#How_do_I_extract_URL
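To give a rough idea of what "deal with URIs properly, as objects" can look like, here is a minimal sketch using the URI module's canonical method. The helper name normalize_url and the sample URL are made up for illustration, and this only covers generic normalization (case of scheme and host, default port); it is not necessarily what the original question needs.

```perl
use strict;
use warnings;
use URI;

# Hypothetical helper (not part of the URI module): returns a normalized
# string form of a URL by letting URI apply its scheme-specific rules.
sub normalize_url {
    my ($raw) = @_;
    my $uri = URI->new($raw)->canonical;   # lowercases scheme/host, drops default port
    return $uri->as_string;
}

# Sample input is invented for this sketch.
print normalize_url('HTTP://WWW.Example.COM:80/index.html'), "\n";
```

Note that canonical leaves the host's "www." label alone; whether to add or strip it is a policy decision, not something the module decides for you.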
If you post what you've already tried, ideally with sample input and the output you want, you'll likely get more concrete help.