Hey Jenda,
Actually yes - here is a list of features the module will have in V 0.01:
- Given the HTML of a page:
  - Find all anchor elements, broken into "this domain links" and "other domain links".
  - Find images on a page, broken down as above.
  - Find the title, description and other such metadata.
  - Find the meta keywords and description of the page.
  - Extract lists (ul and ol) from the HTML of a page.
  - Find RSS feeds of a page, if any.
  - Anything else I or you guys can think of ...
- Split up an anchor tag into: the URL, the alt text and the anchor text.
- Given possible anchor/alt text, find the related link. [Given "Home", <a href=""> home page </a> will be extracted.]
- Given a potentially relative URL and the current URL, returns the absolute URL.
- Given a potentially redirecting URL, returns the final destination URL.
- Breaks up a URL into protocol, domain and URI.
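
To make the anchor and URL items above a bit more concrete, here is a rough Python sketch using only the standard library. It is not the module's code or its planned interface, just an illustration of the idea; the names `LinkCollector` and `split_links` are made up for this example, and "same domain" is assumed to mean an exact host match.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect the href of every <a> element, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # urljoin turns a possibly relative URL into an absolute one.
            self.links.append(urljoin(self.base_url, href))


def split_links(html, base_url):
    """Return (same_domain, other_domain) lists of absolute URLs.

    Assumption: two URLs are on the same domain when their host parts
    (netloc) are identical; subdomains count as different hosts here.
    """
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    base_host = urlsplit(base_url).netloc
    same, other = [], []
    for link in parser.links:
        if urlsplit(link).netloc == base_host:
            same.append(link)
        else:
            other.append(link)
    return same, other


# Hypothetical input, for illustration only.
page = '<a href="/about">About</a> <a href="http://other.example/x">X</a>'
same, other = split_links(page, "http://example.com/index.html")

# Breaking a URL into its parts: scheme (protocol), netloc (domain), path.
parts = urlsplit("http://example.com/a/b?q=1")
```

With this input, `same` holds `http://example.com/about` and `other` holds `http://other.example/x`. Following redirects to a final destination would need an actual HTTP request, so it is left out of this sketch.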
I am still looking for additional features I can add, so please do suggest anything else you can think of.