in reply to Re^4: crawling one website
in thread crawling one website

Do you mean to say that senopt.com does not have real links?

I tried
perl hgrepurl.pl http://www.txtlinks.com/
and the results are quite encouraging. txtlinks is a web directory, and the program first returns a chunk of URL-encoded output like this:
...http://www.txtlinks.com/%3Chtml%3E%0A%3Chead%3E%0A%20%20%3Ctitle%3ETXT%20Links%20Pure%20Links%20Di.......
with no line breaks. After that it returns a bunch of normal links from the site.
What is this first portion?
Is it possible to get all the real links in a site, at all depth levels, with this program?
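
By the way, the %3Chtml%3E prefix in that encoded portion suggests it is just percent-encoded HTML, apparently the page's own source, tacked onto the base URL. One way to check is to decode the escapes with URI::Escape; in this untested snippet, $blob is only the beginning of the output above:

    use strict;
    use warnings;
    use URI::Escape qw(uri_unescape);

    # $blob is only the beginning of the encoded portion shown above
    my $blob = '%3Chtml%3E%0A%3Chead%3E%0A%20%20%3Ctitle%3ETXT%20Links';
    print uri_unescape($blob), "\n";
    # prints:
    # <html>
    # <head>
    #   <title>TXT Links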

Re^6: crawling one website
by planetscape (Chancellor) on May 29, 2011 at 02:07 UTC
    Do you mean to say that senopt.com does not have real links?
    ...
    Is it possible to get all the real links in a site, at all depth levels, with this program?

    How do you define "real links"?

    What did your reading of More robust link finding than HTML::LinkExtor/HTML::Parser? suggest?
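
    (HTML::LinkExtor, named in that title, is the usual starting point; an untested minimal sketch of extracting one page's links with it, using your txtlinks URL, might look like this:)

        use strict;
        use warnings;
        use LWP::UserAgent;
        use HTML::LinkExtor;
        use URI;

        my $base = 'http://www.txtlinks.com/';
        my $res  = LWP::UserAgent->new->get($base);
        die $res->status_line unless $res->is_success;

        # collect the href of every <a> tag via a parser callback
        my @urls;
        my $p = HTML::LinkExtor->new( sub {
            my ( $tag, %attr ) = @_;
            push @urls, $attr{href} if $tag eq 'a' and $attr{href};
        } );
        $p->parse( $res->decoded_content );

        # resolve relative hrefs against the page URL before printing
        print URI->new_abs( $_, $base ), "\n" for @urls;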

    HTH,

    planetscape
      By "real links" I mean full links starting with http://..., not links to sub-directories.
      The program you recommended seems to be what I need. It looks like it retrieves all "real" links from a web page, but it does not walk the whole domain tree. So, in order to get all links starting from the root, I could use some module (say, WWW::Sitemap) that retrieves the URLs at all depth levels, and then run hgrepurl.pl on each of those pages to get all the links from there.
      Am I right?
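
      Here is a rough, untested sketch of what I mean, though it combines both steps in one loop and uses WWW::Mechanize rather than WWW::Sitemap (the module choice is just an assumption on my part; a real crawler would also need politeness delays, robots.txt handling, and a depth limit):

      use strict;
      use warnings;
      use WWW::Mechanize;
      use URI;

      my $root = URI->new('http://www.txtlinks.com/');
      my $mech = WWW::Mechanize->new( autocheck => 0 );

      my ( %seen, %found );
      my @queue = ( $root->as_string );

      while ( my $url = shift @queue ) {
          next if $seen{$url}++;                  # fetch each page once
          my $res = $mech->get($url);
          next unless $res->is_success && $mech->is_html;

          for my $link ( $mech->links ) {
              next unless defined $link->url;
              my $abs = URI->new_abs( $link->url, $url );
              next unless $abs->scheme && $abs->scheme =~ /^https?\z/;
              $abs->fragment(undef);              # drop #fragments for dedup
              $found{ $abs->as_string }++;        # record every absolute link
              push @queue, $abs->as_string
                  if $abs->host eq $root->host;   # same site: crawl deeper
          }
      }

      print "$_\n" for sort keys %found;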