Re: Re: How to index dynamic pages?

The crawler actually reads the .jsp page, grabs the keywords and the URL http://hostname/path/filename.jsp without the "id", and inserts them into the database. When someone searches for those keywords, the result links to filename.jsp without the "id", and that page cannot be displayed. How can I solve that?

Thanks

Re: Re: Re: How to index dynamic pages? by dws (Chancellor) on Aug 09, 2002 at 16:33 UTC
Many crawlers intentionally sidestep URLs that look dynamic (i.e., URLs that contain ?, = or &). To get crawlers like these to index your pages, use URLs of the form `http://hostname/path/filename.jsp/N`, where N stands in for id=N. If you were using Perl rather than JSP, it would be a simple matter to pick up the /N from `$ENV{PATH_INFO}` or `$ENV{REQUEST_URI}`. But this isn't JavaMonks, so you're on your own from here.
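For the Perl case mentioned above, a minimal sketch of picking the id out of `$ENV{PATH_INFO}` might look like this. It assumes a CGI script reached as `http://hostname/path/script.cgi/N`; the script name and the helper `id_from_path_info` are hypothetical. The helper also accepts the `/id=N` form, and returns undef when no numeric id is present.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: extract the numeric id from PATH_INFO.
# "/42" and "/id=42" both yield "42"; anything else yields undef.
sub id_from_path_info {
    my ($path_info) = @_;
    return undef unless defined $path_info;
    return $path_info =~ m{^/(?:id=)?(\d+)/?$} ? $1 : undef;
}

# In a CGI script the web server sets PATH_INFO to the part of the
# URL that follows the script name, e.g. "/42".
my $id = id_from_path_info($ENV{PATH_INFO});

print "Content-Type: text/plain\r\n\r\n";
print defined $id ? "Requested id: $id\n" : "No id in the URL\n";
```

Because the URL then contains no `?`, `=` or `&`, a crawler that skips query strings sees it as an ordinary static path.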
Re: Re: Re: Re: How to index dynamic pages? by Anonymous Monk on Aug 09, 2002 at 17:28 UTC
This is not the problem. The JSP page already displays itself as http://hostname/path/filename.jsp/id=N. My problem is that filename.jsp is crawled using `find . -name "*.jsp"`, so its URL is stored in the database as http://hostname/path/filename.jsp. When the page comes up in a search, filename.jsp without the id can't be displayed. Thanks
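A filesystem crawl like the one described can be sketched with the core `File::Find` module. It illustrates why the id gets lost: the files on disk know nothing about which ids exist, so each result can only be a bare `filename.jsp` URL. The function name `jsp_urls` and the root/URL-prefix mapping are assumptions for illustration, not the original crawler.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical sketch: walk $root like `find . -name "*.jsp"` and map
# each file path onto a URL under $base. Note that no id appears
# anywhere: the filesystem has no notion of one.
sub jsp_urls {
    my ($root, $base) = @_;
    my @urls;
    find(sub {
        # File::Find chdirs into each directory, so $_ is the basename.
        return unless -f $_ && /\.jsp\z/;
        # Strip the document root to get the path part of the URL.
        (my $rel = $File::Find::name) =~ s{^\Q$root\E}{};
        push @urls, $base . $rel;
    }, $root);
    my @sorted = sort @urls;
    return @sorted;
}
```

For example, `jsp_urls('/var/www/html', 'http://hostname')` (hypothetical paths) would return `http://hostname/path/filename.jsp` for `/var/www/html/path/filename.jsp`, which is exactly the id-less URL that ends up in the database.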
Re: Re: Re: How to index dynamic pages? by dda (Friar) on Aug 12, 2002 at 07:07 UTC
> grabs the keywords and url http://hostname/path/filename.jsp without "id" and inserts them in the database

You wrote that crawler, so why does it do such a weird thing? :) Why can't it insert 'id=N' as well?

--dda
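Along the lines dda suggests, a minimal sketch of storing one URL per id instead of a single bare `filename.jsp`. It assumes the page accepts the id as a query parameter (`?id=N`) and that the ids are numeric; if the JSP expects the `/id=N` path form, build `"$page_url/id=$_"` instead. The helper `urls_with_ids` and the sample ids are hypothetical; in practice the ids would come from the same database table the JSP reads its content from.

```perl
use strict;
use warnings;

# Hypothetical helper: build one indexable URL per record.
# Assumes numeric ids, so no URL escaping is needed.
sub urls_with_ids {
    my ($page_url, @ids) = @_;
    return map { "$page_url?id=$_" } @ids;
}

# Hypothetical ids; a real crawler would query them from the database.
my @ids = (1, 2, 3);
for my $url (urls_with_ids('http://hostname/path/filename.jsp', @ids)) {
    # Fetch $url over HTTP here, extract its keywords, and store them
    # together with this full URL so search hits link to a page the
    # server can actually render.
    print "$url\n";
}
```

Storing the full URL per record, rather than the file path, is what makes each search result displayable.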