Re: Link Parser, something to be desired?


by planetscape (Chancellor)

on May 29, 2009 at 23:43 UTC

in reply to Link Parser, something to be desired?

Hi! Welcome back!

First, don't use regexen to parse HTML. There are many nodes here on PM that will tell you why that's a Bad Idea™.

Instead, use something like WWW::Mechanize's find_all_links() or HTML::TreeBuilder's look_down() to find your links.
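
As a rough illustration of the look_down() route, here is a minimal sketch. It assumes HTML::TreeBuilder is installed from CPAN; the extract_links() helper and the sample markup are made up for this example and are not from the original question.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: return the href of every <a> element that has one,
# in document order.
sub extract_links {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);

    # In list context look_down() returns every matching element.
    # The code ref filters out anchors that carry no href attribute.
    my @anchors = $tree->look_down(
        _tag => 'a',
        sub { defined $_[0]->attr('href') },
    );

    my @hrefs = map { $_->attr('href') } @anchors;
    $tree->delete;    # release the parse tree explicitly
    return @hrefs;
}

# Made-up sample markup: two real links and one named anchor without href.
my $sample = <<'HTML';
<html><body>
<p><a href="http://example.com/one">One</a> and <a name="here">no href</a></p>
<p><a href="/two">Two</a></p>
</body></html>
HTML

print "$_\n" for extract_links($sample);
```

Because the parser builds a tree rather than matching text, attribute order, quoting style, and line breaks inside a tag should not matter here, which is where regex-based attempts usually go wrong.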

Second, had you run Google's advanced search against PerlMonks for "html remove link", you'd have found helpful nodes such as these:

Remove all html tag Except 'sup'
Regex: Strip <script> tags?
Simple link extraction tool
Parsing HTML files to recover data...
Simplify HTML programatically
Parsing HTML tags with regex
How do I remove a specific keyword from a HTML page
Extract and modify IMG SRC tags in an HTML document.

IMHO, Re: Regex: Strip <script> tags? looks quite promising. ;-)

Good luck!

HTH,

planetscape
