Re: How to index dynamic pages?

You can create a dummy page of links and let your crawler read it, for example:

http://hostname/path/filename.jsp?id=1
http://hostname/path/filename.jsp?id=2
http://hostname/path/filename.jsp?id=3
http://hostname/path/filename.jsp?id=4
...

Or you can put your links into your config file. Look at this search engine for examples.
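
If it helps, here is a rough sketch of generating such a dummy page. It is in Python only for illustration; `build_seed_page` is a made-up helper name, and the id range 1..4 just mirrors the example list above, so substitute whatever ids your application really has.

```python
from html import escape


def build_seed_page(base_url, ids):
    """Return a minimal HTML page with one link per id (hypothetical helper)."""
    # One dynamic URL per id, in the same form as the example list.
    links = [f"{base_url}?id={i}" for i in ids]
    items = "\n".join(
        f'<li><a href="{escape(url)}">{escape(url)}</a></li>' for url in links
    )
    return f"<html><body>\n<ul>\n{items}\n</ul>\n</body></html>\n"


# Assumed base URL and id range, taken from the example above.
page = build_seed_page("http://hostname/path/filename.jsp", range(1, 5))
print(page)
```

Save the output somewhere the crawler starts from, and it will follow each link with its id intact.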

--dda


Replies are listed 'Best First'.
Re: Re: How to index dynamic pages? by Anonymous Monk on Aug 09, 2002 at 16:22 UTC
The crawler actually reads the .jsp page, grabs the keywords and the URL http://hostname/path/filename.jsp without the "id" parameter, and inserts them into the database. When the keywords are searched, the result points to filename.jsp without the "id", and that page cannot be displayed. How can I solve that? Thanks
Re: Re: Re: How to index dynamic pages? by dws (Chancellor) on Aug 09, 2002 at 16:33 UTC
Many crawlers intentionally sidestep URLs that look like they're dynamic (i.e., URLs that contain `?`, `=`, or `&`). To trick crawlers like this, you need to use URLs of the form `http://hostname/path/filename.jsp/N`, where /N stands in for id=N. If you were using Perl rather than JSP, it would be a simple matter to pick up the /N from `$ENV{PATH_INFO}` or `$ENV{REQUEST_URI}`. But this isn't JavaMonks, so you're on your own from here.
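
To make the /N idea concrete, here is a minimal sketch in Python (chosen only as a neutral illustration; it is neither the Perl nor the JSP version). It assumes the server hands the script the trailing path as a PATH_INFO string such as `/42`; `id_from_path_info` and `crawler_friendly_url` are hypothetical helper names.

```python
import re


def id_from_path_info(path_info):
    """Extract N from a PATH_INFO value like '/42'; return None if absent."""
    # Accept exactly one numeric path segment, with an optional trailing slash.
    match = re.fullmatch(r"/(\d+)/?", path_info or "")
    return int(match.group(1)) if match else None


def crawler_friendly_url(base_url, record_id):
    """Build the filename.jsp/N form that avoids '?', '=' and '&'."""
    return f"{base_url}/{record_id}"


print(crawler_friendly_url("http://hostname/path/filename.jsp", 3))
print(id_from_path_info("/3"))
```

Under CGI the value would typically come from the PATH_INFO environment variable; how a JSP container exposes it is a separate question.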
Re: Re: Re: Re: How to index dynamic pages? by Anonymous Monk on Aug 09, 2002 at 17:28 UTC
That is not the problem. The JSP page displays itself as http://hostname/path/filename.jsp/id=N. My problem is that filename.jsp is crawled using find . -name "*.jsp", so its URL is stored in the database as http://hostname/path/filename.jsp. When the page turns up in a search, filename.jsp without the id can't be displayed. Thanks
Re: Re: Re: How to index dynamic pages? by dda (Friar) on Aug 12, 2002 at 07:07 UTC
> grabs the keywords and url http://hostname/path/filename.jsp without "id" and inserts them in the database

You wrote that crawler, so why does it do such a weird thing? :) Why can't it insert 'id=N' as well?

--dda
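
A rough sketch of what storing the id could look like, in Python with an in-memory SQLite database purely for illustration. The table name `keyword_index`, the helpers `index_page` and `search`, the sample keywords, and the `?id=N` URL form are all assumptions, not the actual crawler's schema. The point is only that each row carries the complete URL, so search results link to a page that can be displayed.

```python
import sqlite3


def index_page(conn, base_url, record_id, keywords):
    """Store each keyword against the full URL, id included (hypothetical schema)."""
    url = f"{base_url}?id={record_id}"
    conn.executemany(
        "INSERT INTO keyword_index (keyword, url) VALUES (?, ?)",
        [(keyword.lower(), url) for keyword in keywords],
    )


def search(conn, keyword):
    """Return every stored URL for a keyword, sorted for stable output."""
    rows = conn.execute(
        "SELECT url FROM keyword_index WHERE keyword = ? ORDER BY url",
        (keyword.lower(),),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE keyword_index (keyword TEXT, url TEXT)")

# Made-up sample data: one entry per (page, id) pair, not one per .jsp file.
index_page(conn, "http://hostname/path/filename.jsp", 1, ["perl", "crawler"])
index_page(conn, "http://hostname/path/filename.jsp", 2, ["jsp", "crawler"])

print(search(conn, "crawler"))
```

This only works if the crawler visits each id through its URL rather than finding the bare .jsp file on disk, since the file alone says nothing about which ids exist.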