HTML::Parser??

bleekbob has asked for the wisdom of the Perl Monks concerning the following question:

I'm wondering if the HTML::Parser module will accomplish what I need to do, or if there is a better way. I need to read from an HTML file, find a certain tag with a certain attribute, and extract and replace everything between that tag's opening and closing tags (be it text or more HTML), and I may need to do it a couple of times within the page. Can this be accomplished using HTML::Parser, or is there a better way? I've done this with "less mature" scripting languages, but my hunch is that Perl can do it faster. Please advise, O wise ones. Thanks

Re: HTML::Parser?? by PodMaster (Abbot) on Aug 17, 2002 at 08:48 UTC
No need to wonder anymore: yes, HTML::Parser will help you accomplish what you're doing. From the module's documentation:

DESCRIPTION

Objects of the "HTML::Parser" class will recognize markup and separate it from plain text (alias data content) in HTML documents. As different kinds of markup and text are recognized, the corresponding event handlers are invoked.

"HTML::Parser" is not a generic SGML parser. We have tried to make it able to deal with the HTML that is actually "out there", and it normally parses as closely as possible to the way the popular web browsers do it instead of strictly following one of the many HTML specifications from W3C. Where there is disagreement there is often an option that you can enable to get the official behaviour.

The document to be parsed may be supplied in arbitrary chunks. This makes on-the-fly parsing as documents are received from the network possible.

If event driven parsing does not feel right for your application, you might want to use "HTML::PullParser". It is a "HTML::Parser" subclass that allows a more conventional program structure.

If you have no idea how I got that description, please read this friendly guide on perl documentation and resources.

There is a better way, and it's called HTML::TokeParser (see Tutorials for a tutorial).

** The Third rule of perl club is a statement of fact: pod is sexy.
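To make the event-driven model in that description concrete, here is a minimal sketch that registers a start-tag handler and feeds the document in two chunks, the second of which begins in the middle of a tag. The helper name `start_tags` and the sample HTML are made up for illustration; only `new`, `parse`, `eof` and the `tagname`/`attr` argspec come from HTML::Parser itself.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper: returns a list of [tagname, \%attributes]
# pairs, one per start tag seen across all the chunks passed in.
sub start_tags {
    my @chunks = @_;
    my @seen;

    my $p = HTML::Parser->new(
        api_version => 3,
        # The handler is invoked once for every start tag recognized.
        start_h => [
            sub {
                my ($tagname, $attr) = @_;
                push @seen, [ $tagname, $attr ];
            },
            'tagname, attr',
        ],
    );

    # The document may arrive in arbitrary pieces; eof signals the end.
    $p->parse($_) for @chunks;
    $p->eof;

    return @seen;
}

# Sample input (an assumption), split in the middle of the <p> tag.
for my $tag (start_tags('<div id="a"><p', ' class="x">hi</p></div>')) {
    my ($name, $attr) = @$tag;
    print "$name: ",
        join(', ', map { "$_=$attr->{$_}" } sort keys %$attr), "\n";
}
```

Handlers for end tags, text and comments can be registered the same way with `end_h`, `text_h` and `comment_h`.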
Re: Re: HTML::Parser?? by bleekbob (Initiate) on Aug 20, 2002 at 07:08 UTC
OK, great. Any chance you would like to initiate me in the use of HTML::TokeParser?
Re: Re: Re: HTML::Parser?? by bleekbob (Initiate) on Aug 22, 2002 at 17:31 UTC
Oh yeah, the tutorial. Thanks, homie.
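For readers who want a starting point alongside the tutorial, here is a rough sketch of the original task with HTML::TokeParser: copy the document through unchanged, but swap out everything between a matching opening tag and its closing tag. The function name `replace_inner`, its arguments and the sample HTML are assumptions made for illustration, and the depth counter assumes that nested tags of the same name are properly closed.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: replace the contents of every <$tag> element
# whose attribute $attr equals $value with the string $replacement.
sub replace_inner {
    my ($html, $tag, $attr, $value, $replacement) = @_;
    my $p = HTML::TokeParser->new(\$html) or die "Cannot parse input";

    my $out   = '';
    my $depth = 0;    # greater than 0 while inside a matching element

    while (my $t = $p->get_token) {
        my $type = $t->[0];

        if ($type eq 'S') {          # ["S", tag, attr, attrseq, text]
            my ($name, $attrs, $text) = @{$t}[1, 2, 4];
            if ($depth) {
                $depth++ if $name eq $tag;    # nested tag of same name
                next;                         # drop the old content
            }
            $out .= $text;
            if (   $name eq $tag
                && defined $attrs->{$attr}
                && $attrs->{$attr} eq $value)
            {
                $out .= $replacement;
                $depth = 1;
            }
        }
        elsif ($type eq 'E') {       # ["E", tag, text]
            my ($name, $text) = @{$t}[1, 2];
            if ($depth) {
                $depth-- if $name eq $tag;
                next if $depth;               # still inside the element
            }
            $out .= $text;
        }
        else {
            # Text, comments, declarations, processing instructions:
            # the original source text is at index 1, except for "PI".
            next if $depth;
            $out .= $type eq 'PI' ? $t->[2] : $t->[1];
        }
    }
    return $out;
}

# Sample input (an assumption).
my $page = '<p>keep</p><div id="target">old <b>stuff</b></div>';
print replace_inner($page, 'div', 'id', 'target', 'NEW'), "\n";
```

Because each token carries its original source text, everything outside the matched element is written back exactly as it was read. Tag and attribute names are compared in lowercase, since that is how the parser reports them.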
Re: HTML::Parser?? by simon.proctor (Vicar) on Aug 17, 2002 at 17:43 UTC
You could also try HTML::TreeBuilder. This layers on top of HTML::Parser and allows you to query your HTML doc as a tree. Just my 2p :)
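As a rough illustration of the tree approach, the sketch below finds elements by tag name and attribute value with `look_down`, empties them, and adds new content. The function name `replace_inner_tree` and the sample HTML are made up; it assumes the HTML-Tree distribution is installed and that matching elements are not nested inside one another.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: same task as above, done on a parsed tree.
sub replace_inner_tree {
    my ($html, $tag, $attr, $value, $replacement) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);

    # look_down returns every element matching all the criteria.
    for my $el ($tree->look_down(_tag => $tag, $attr => $value)) {
        $el->delete_content;              # remove the old children
        $el->push_content($replacement);  # added as a plain text node
    }

    my $out = $tree->as_HTML;
    $tree->delete;                        # free the tree explicitly
    return $out;
}

# Sample input (an assumption).
print replace_inner_tree(
    '<p>keep</p><div id="target">old <b>stuff</b></div>',
    'div', 'id', 'target', 'NEW'
);
```

Note that the tree is serialized as a whole document, so the output is wrapped in html, head and body elements and is not a byte-for-byte copy of the input. A replacement given as a string is treated as text, so markup in it would be escaped on output.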