Re^6: crawling one website

in reply to Re^5: crawling one website
in thread crawling one website

Do you want to say that senopt.com does not have real links?
...
Is it possible to get all real links residing in a site on all depth levels with this program?

How do you define "real links" ?

What did your reading of "More robust link finding than HTML::LinkExtor/HTML::Parser?" suggest?

HTH,

planetscape


Re^7: crawling one website by vit (Friar) on May 29, 2011 at 02:36 UTC
By real links I mean full links starting with http://..., not links to sub-directories. The program you recommended seems to be what I need. It looks like it retrieves all "real" links from a webpage, but it does not walk the whole domain tree. So, to get all links starting from the root, I could use some program (say WWW::Sitemap) that retrieves the URLs at all depth levels, and then run hgrepurl.pl on each of those URLs to get all the links found there. Am I right?
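The plan described above (walk every page of the site, then pull the full http:// links out of each page) can be sketched roughly as follows. This is only a sketch under assumptions: it uses HTML::LinkExtor, LWP::UserAgent and URI rather than hgrepurl.pl or WWW::Sitemap, and the sub names raw_hrefs, classify_links and crawl, as well as the $max_pages cap, are made up for illustration. It treats a link as "real" if its href is written out with http:// or https://, and it follows only links that resolve to the same host as the page they were found on.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::LinkExtor;
use LWP::UserAgent;
use URI;

# Return the href values of all <a> tags in $html, exactly as written.
# (Hypothetical helper name.)
sub raw_hrefs {
    my ($html) = @_;
    my $parser = HTML::LinkExtor->new;   # no callback: links are accumulated
    $parser->parse($html);
    $parser->eof;
    my @hrefs;
    for my $link ($parser->links) {
        my ($tag, %attr) = @$link;       # e.g. ('a', href => '...')
        push @hrefs, $attr{href} if $tag eq 'a' && defined $attr{href};
    }
    return @hrefs;
}

# Split the links on one page into:
#   - "real" links: hrefs written out with http:// or https://
#   - follow links: absolute URLs on the same host, to be crawled next
sub classify_links {
    my ($html, $page_url) = @_;
    my $host = lc URI->new($page_url)->host;
    my (@real, @follow);
    for my $href (raw_hrefs($html)) {
        push @real, $href if $href =~ m{^https?://}i;
        my $abs = URI->new_abs($href, $page_url);
        next unless $abs->scheme && $abs->scheme =~ /^https?$/;
        $abs->fragment(undef);           # page.html#top is the same page
        my $abs_host = $abs->host;
        push @follow, $abs->as_string
            if defined $abs_host && lc $abs_host eq $host;
    }
    return (\@real, \@follow);
}

# Breadth-first walk from $start, visiting at most $max_pages URLs
# (an assumed safety cap). Returns the sorted, de-duplicated "real" links.
sub crawl {
    my ($start, $max_pages) = @_;
    my $ua = LWP::UserAgent->new(timeout => 10);
    my (%seen, %real);
    my @queue = ($start);
    while (defined(my $url = shift @queue)) {
        next if $seen{$url}++;
        last if keys(%seen) > $max_pages;
        my $res = $ua->get($url);
        next unless $res->is_success && $res->content_type eq 'text/html';
        my ($real, $follow) = classify_links($res->decoded_content, $url);
        $real{$_} = 1 for @$real;
        push @queue, grep { !$seen{$_} } @$follow;
    }
    return sort keys %real;
}

# Usage (this hits the network, so it is left commented out):
# print "$_\n" for crawl('http://www.example.com/', 50);
```

The sketch does not consult robots.txt or pause between requests; for anything beyond a quick experiment on your own site, swapping in LWP::RobotUA would be the polite choice.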
Re^8: crawling one website by planetscape (Chancellor) on May 29, 2011 at 03:10 UTC
T.I.T.S. Or, Try It To See. HTH, planetscape

In Section Seekers of Perl Wisdom