bleekbob has asked for the wisdom of the Perl Monks concerning the following question:

I'm wondering whether the HTML::Parser module will accomplish what I need to do, or whether there is a better way. I need to read an HTML file, find a certain tag with a certain attribute, and extract and replace everything between that tag's opening and closing tags (whether that is text or more HTML). I may need to do this a couple of times within the page. Can this be accomplished using HTML::Parser, or is there a better way? I've done this with "less mature" scripting languages, but my hunch is that Perl can do it faster. Please advise, o wise ones. Thanks

Re: HTML::Parser??
by PodMaster (Abbot) on Aug 17, 2002 at 08:48 UTC
    No need to wonder anymore: yes, HTML::Parser will help you accomplish what you need to do.
    DESCRIPTION
        Objects of the "HTML::Parser" class will recognize markup and separate
        it from plain text (alias data content) in HTML documents. As different
        kinds of markup and text are recognized, the corresponding event
        handlers are invoked.
    
        "HTML::Parser" is not a generic SGML parser. We have tried to make it
        able to deal with the HTML that is actually "out there", and it normally
        parses as closely as possible to the way the popular web browsers do it
        instead of strictly following one of the many HTML specifications from
        W3C. Where there is disagreement there is often an option that you can
        enable to get the official behaviour.
    
        The document to be parsed may be supplied in arbitrary chunks. This
        makes on-the-fly parsing as documents are received from the network
        possible.
    
        If event driven parsing does not feel right for your application, you
        might want to use "HTML::PullParser". It is a "HTML::Parser" subclass
        that allows a more conventional program structure.
    
    If you have no idea how I got that description, please read this friendly guide on perl documentation and resources.
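
    The event-driven model in the DESCRIPTION quoted above could be applied to the original question roughly as follows. This is only a sketch: the helper name replace_inner_events and its parameters are made up for illustration, and it assumes the target element has an explicit closing tag. The input is fed in chunks to show that the parser accepts a document piece by piece.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper (not part of HTML::Parser): replace the contents of
# every <$tag> whose attribute $attr equals $value with $replacement.
sub replace_inner_events {
    my ($chunks, $tag, $attr, $value, $replacement) = @_;
    my $out   = '';
    my $depth = 0;    # greater than 0 while inside a matching element

    my $p = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub {
            my ($name, $attrs, $text) = @_;
            if ($depth) {
                # Count nested tags of the same name so the right
                # closing tag ends the skipped region.
                $depth++ if $name eq $tag;
                return;
            }
            $out .= $text;
            if ($name eq $tag
                && defined $attrs->{$attr} && $attrs->{$attr} eq $value) {
                $out .= $replacement;
                $depth = 1;
            }
        }, 'tagname,attr,text' ],
        end_h => [ sub {
            my ($name, $text) = @_;
            $depth-- if $depth && $name eq $tag;
            $out .= $text unless $depth;
        }, 'tagname,text' ],
        # Text, comments, declarations and so on pass through unchanged
        # unless we are inside a matching element.
        default_h => [ sub {
            $out .= $_[0] if !$depth && defined $_[0];
        }, 'text' ],
    );

    $p->parse($_) for @$chunks;    # chunks may split tags or text anywhere
    $p->eof;                       # flush any buffered text
    return $out;
}

print replace_inner_events(
    [ '<div id="a">old <b>st', 'uff</b></div><p>ke', 'ep</p>' ],
    'div', 'id', 'a', 'NEW'
), "\n";
```

    Because the original source text of each event is copied through, markup outside the matching elements is left as it was written.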

    There is a better way, and it's called HTML::TokeParser (see Tutorials for a tutorial).
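
    For the task in the question, a pull-style loop with HTML::TokeParser might look something like the sketch below. The helper name replace_inner and the sample markup are assumptions made for illustration, and the approach assumes the target element is explicitly closed (it would not cope with, say, an unclosed <p>).

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper (not part of HTML::TokeParser): replace the contents
# of every <$tag> whose attribute $attr equals $value with $replacement.
sub replace_inner {
    my ($html, $tag, $attr, $value, $replacement) = @_;
    my $p = HTML::TokeParser->new(\$html) or die "Cannot parse input";
    my $out   = '';
    my $depth = 0;    # greater than 0 while inside a matching element

    while (my $t = $p->get_token) {
        my $type = $t->[0];
        # Text tokens carry their text in slot 1; every other token type
        # carries the original source text in its last slot.
        my $text = $type eq 'T' ? $t->[1] : $t->[-1];

        if ($depth) {
            # Track nested tags of the same name to find the right close.
            $depth++ if $type eq 'S' && $t->[1] eq $tag;
            $depth-- if $type eq 'E' && $t->[1] eq $tag;
            $out .= $text if $depth == 0;    # emit the closing tag
            next;                            # drop the old contents
        }

        $out .= $text;
        if ($type eq 'S' && $t->[1] eq $tag
            && defined $t->[2]{$attr} && $t->[2]{$attr} eq $value) {
            $out .= $replacement;
            $depth = 1;
        }
    }
    return $out;
}

my $page = '<div id="a">old <b>stuff</b></div><p>keep</p><div id="a">x</div>';
print replace_inner($page, 'div', 'id', 'a', 'NEW'), "\n";
```

    The same loop handles several matching elements in one pass, which covers the "couple of times within this page" case.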

    ____________________________________________________
    ** The Third rule of perl club is a statement of fact: pod is sexy.

      OK, great. Any chance you would like to initiate me into the use of HTML::TokeParser?
        Oh yeah, the tutorial. Thanks, homie.
Re: HTML::Parser??
by simon.proctor (Vicar) on Aug 17, 2002 at 17:43 UTC
    You could also try HTML::TreeBuilder. It is layered on top of HTML::Parser and lets you query your HTML document as a tree.
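
    A tree-based version of the same task might look roughly like this. It is a sketch only: the helper name replace_inner_tree is hypothetical, the replacement is inserted as plain text (markup in it would be escaped on output), and as_HTML re-serializes the whole document, so the result is wrapped in html/head/body and will not match the input byte for byte.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper (not part of HTML::TreeBuilder): replace the contents
# of every <$tag> whose attribute $attr equals $value with the text
# $replacement, then return the re-serialized document.
sub replace_inner_tree {
    my ($html, $tag, $attr, $value, $replacement) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);

    # look_down in list context returns every element matching all criteria.
    for my $el ($tree->look_down(_tag => $tag, $attr => $value)) {
        $el->delete_content;                # drop text and child elements
        $el->push_content($replacement);    # add the new text node
    }

    # An empty hashref as the third argument asks for all end tags.
    my $out = $tree->as_HTML(undef, undef, {});
    $tree->delete;                          # free the tree explicitly
    return $out;
}

print replace_inner_tree(
    '<div id="a">old <b>stuff</b></div><p>keep</p>',
    'div', 'id', 'a', 'NEW'
);
```

    The trade-off against the token-based approach is convenience versus fidelity: querying is simpler, but the output is a normalized rendering of the tree rather than the original source with a patch applied.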

    Just my 2p :)