
Do you want to say that senopt.com does not have real links?
...
Is it possible to get all real links residing in a site on all depth levels with this program?

How do you define "real links" ?

What did your reading of "More robust link finding than HTML::LinkExtor/HTML::Parser?" suggest?

HTH,

planetscape

Re^7: crawling one website
by vit (Friar) on May 29, 2011 at 02:36 UTC
    By "real links" I mean full links starting with http://..., not links to sub-directories.
    The program you recommended seems to be what I need. It looks like it retrieves all "real" links from a single web page, but it does not walk the rest of the domain's tree. So, to get all links starting from the root, I could use some program (say WWW::Sitemap) that retrieves the URLs of pages at every depth level, and then run hgrepurl.pl on each of those pages to get all the links from it.
    Am I right?
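
    To make the two-step idea concrete, here is a minimal sketch of it, written in Python with only the standard library so it runs without any CPAN modules. It is not how WWW::Sitemap or hgrepurl.pl work internally. The names HrefCollector, is_full_link, crawl and the fetch callable are all hypothetical, and the in-memory SITE dictionary (with the placeholder host example.com) stands in for real HTTP requests. The walk is breadth-first over pages on the same host; on every page it records the links written out in full with http:// or https://, and it resolves relative links only so that it can follow them deeper.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class HrefCollector(HTMLParser):
    """Collect the raw href value of every <a> tag on one page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def is_full_link(href):
    """True for links written out in full with http:// or https://."""
    return urlparse(href).scheme in ("http", "https")


def crawl(root, fetch):
    """Breadth-first walk of one host.

    `fetch` is a hypothetical callable: URL -> HTML string, or None on failure.
    Returns (pages discovered on the host, full links seen on those pages).
    """
    host = urlparse(root).netloc
    seen = {root}
    queue = deque([root])
    full_links = set()
    while queue:
        page = queue.popleft()
        html = fetch(page)
        if html is None:
            continue
        parser = HrefCollector()
        parser.feed(html)
        for href in parser.hrefs:
            if is_full_link(href):
                full_links.add(href)
            # Resolve relative hrefs so sub-directory links can still be followed.
            target = urljoin(page, href).split("#")[0]
            if urlparse(target).netloc == host and target not in seen:
                seen.add(target)
                queue.append(target)
    return seen, full_links


# Tiny in-memory "site" used instead of real HTTP requests (placeholder URLs).
SITE = {
    "http://example.com/": '<a href="docs/">Docs</a> <a href="http://other.example/a">A</a>',
    "http://example.com/docs/": '<a href="page.html">P</a> <a href="http://example.com/">Home</a>',
    "http://example.com/docs/page.html": '<a href="https://other.example/b">B</a>',
}

pages, links = crawl("http://example.com/", SITE.get)
print(sorted(pages))
print(sorted(links))
```

    For a real site, fetch would be replaced by something that downloads the page and returns None on errors. The same shape should carry over to Perl: one piece that enumerates the pages of the site, and one piece that pulls the links out of each page.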