Hello wise monks. I know this kind of question has been asked before, and many answers have been given; HTML::TokeParser is a wonderful tool for most of my needs.
I have some more complicated things I want to do with pages I have consumed and parsed with said module. I would like to use XPath to work with the HTML pages I consume, just as I do with the XML reports I create and read. The problem is that XML::LibXML only accepts strict, well-formed markup: a single bare ampersand is enough to make it fail. As we all know, that is not typical; almost every page I come across on client sites is broken to some small degree or contains an unescaped ampersand.
I have seen a few modules that expose the Mozilla/WebKit engines, but from what I have read they actually launch a browser instance, which isn't what I want. Ideally I'd like the ability to consume a web page into a DOM object with a mainstream, forgiving parser (Mozilla/WebKit etc.) and then extract nodes and objects via XPath, again as I do with LibXML and XML files.
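Update: a minimal sketch of one approach I am experimenting with, in case it helps frame the question. XML::LibXML itself ships with libxml2's lenient HTML parser via load_html(), which tolerates tag soup and bare ampersands; the recover and suppress_errors options here are assumptions based on my reading of the docs, so corrections welcome:

<code>
use strict;
use warnings;
use XML::LibXML;

# Tag-soup HTML with a bare ampersand -- the strict XML parser would die here.
my $html = '<html><body><a href="/a?x=1&y=2">A & B</a></body></html>';

# load_html() uses libxml2's forgiving HTML parser; recover tells it to
# keep going past markup errors, and suppress_errors quiets the noise.
my $dom = XML::LibXML->load_html(
    string          => $html,
    recover         => 2,
    suppress_errors => 1,
);

# Now the usual XPath calls work, exactly as with an XML document.
for my $node ( $dom->findnodes('//a') ) {
    print $node->getAttribute('href'), "\n";
}
</code>

If that holds up, it would give me the LibXML XPath workflow I already use, without launching any browser.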
Thanks in advance for any insight or suggestions.
In reply to Parsing HTML Documents by itsscott