Re: HTML::Parser??

No need to wonder anymore: yes, HTML::Parser will help you accomplish what you're trying to do.

DESCRIPTION
    Objects of the "HTML::Parser" class will recognize markup and separate
    it from plain text (alias data content) in HTML documents. As different
    kinds of markup and text are recognized, the corresponding event
    handlers are invoked.

    "HTML::Parser" is not a generic SGML parser. We have tried to make it
    able to deal with the HTML that is actually "out there", and it normally
    parses as closely as possible to the way the popular web browsers do it
    instead of strictly following one of the many HTML specifications from
    W3C. Where there is disagreement there is often an option that you can
    enable to get the official behaviour.

    The document to be parsed may be supplied in arbitrary chunks. This
    makes on-the-fly parsing as documents are received from the network
    possible.

    If event driven parsing does not feel right for your application, you
    might want to use "HTML::PullParser". It is a "HTML::Parser" subclass
    that allows a more conventional program structure.
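
To make the event-handler idea concrete, here is a minimal sketch (mine, not taken from the module's docs). It assumes the version 3 API of HTML::Parser and feeds a made-up document in deliberately awkward chunks to show that tags split across chunk boundaries still come out whole. The sample HTML and the variable names are invented for illustration.

```perl
use strict;
use warnings;
use HTML::Parser ();

my @tags;         # names of start tags, in the order they were seen
my $text = '';    # all text content, concatenated

my $p = HTML::Parser->new(
    api_version => 3,
    # Each handler is [ callback, argspec ]; the argspec picks what gets passed in.
    start_h => [ sub { push @tags, shift }, 'tagname' ],
    text_h  => [ sub { $text .= shift },    'dtext'   ],   # dtext = entity-decoded text
);

# Feed the document in arbitrary pieces, as if it were arriving from the network.
# The text handler may fire more than once per text run, so we concatenate.
$p->parse($_) for '<html><bo', 'dy><h1>Hel', 'lo</h1><p>wor', 'ld</p></body></html>';
$p->eof;    # tell the parser the document is complete so it flushes anything pending

print "tags: @tags\n";
print "text: $text\n";
```

The handlers only collect data here; in a real script you would do your actual work inside them, or switch to a pull-style interface if callbacks feel backwards.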

If you have no idea how I got that description (it is the DESCRIPTION section of the module's own POD), please read this friendly guide on Perl documentation and resources.

There is a better way, and it's called HTML::TokeParser (see Tutorials for a tutorial).
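
To give a taste of it before you hit the tutorial, here is a small sketch that pulls link targets and link text out of a string. The sample HTML and variable names are made up for the example, and it is only one way to use the module, not the canonical one.

```perl
use strict;
use warnings;
use HTML::TokeParser ();

# Hypothetical input; in practice this would come from a file or an HTTP response.
my $html = '<p>See <a href="/docs">the docs</a> and <a href="/faq">the FAQ</a>.</p>';

# Passing a reference to a scalar makes the parser read that string, not a file.
my $p = HTML::TokeParser->new(\$html) or die "cannot create parser";

my @links;
# get_tag('a') skips ahead to the next <a> start tag and returns undef at end of input.
while (my $tag = $p->get_tag('a')) {
    my $href = $tag->[1]{href};                  # element 1 is the attribute hash
    next unless defined $href;                   # skip anchors without an href
    my $label = $p->get_trimmed_text('/a');      # text up to the closing </a>
    push @links, "$href => $label";
}

print "$_\n" for @links;
```

Notice that the loop reads top to bottom like ordinary code: you ask for the next thing you care about instead of registering callbacks.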

____________________________________________________
** The Third rule of perl club is a statement of fact: pod is sexy.

Replies are listed 'Best First'.
Re: Re: HTML::Parser?? by bleekbob (Initiate) on Aug 20, 2002 at 07:08 UTC
Ok, great. Any chance you would like to initiate me into the use of HTML::TokeParser?
Re: Re: Re: HTML::Parser?? by bleekbob (Initiate) on Aug 22, 2002 at 17:31 UTC
Oh yeah, the tutorial. Thanks, homie.