
The crawler actually reads the .jsp page, grabs its keywords and the URL http://hostname/path/filename.jsp (without the "id" parameter), and inserts them into the database. When those keywords are searched, the result points to filename.jsp without the "id", and that page cannot be displayed. How can I solve that?

Thanks

Re: Re: Re: How to index dynamic pages?
by dws (Chancellor) on Aug 09, 2002 at 16:33 UTC
    Many crawlers intentionally skip URLs that look dynamic (i.e., URLs that contain ?, =, or &). To get past crawlers like these, use URLs of the form http://hostname/path/filename.jsp/N, where N stands in for id=N.
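A minimal sketch of that rewrite, in Python purely for illustration. It assumes the page takes a single identifying query parameter (named "id", as in the thread); the function name to_path_url is hypothetical, and any other query parameters are dropped.

```python
from urllib.parse import parse_qs, quote, urlsplit, urlunsplit


def to_path_url(url: str, param: str = "id") -> str:
    """Rewrite .../filename.jsp?id=N into .../filename.jsp/N.

    Assumption: the page is identified by one parameter (default 'id');
    all other query parameters are discarded.
    """
    parts = urlsplit(url)
    values = parse_qs(parts.query).get(param)
    if not values:
        # No id present: nothing to rewrite, return the URL unchanged.
        return url
    # Percent-encode the id so a '/' inside it cannot add a path segment.
    new_path = parts.path.rstrip("/") + "/" + quote(values[0], safe="")
    return urlunsplit((parts.scheme, parts.netloc, new_path, "", parts.fragment))


print(to_path_url("http://hostname/path/filename.jsp?id=42"))
```

Links generated in this form carry no ?, = or &, so a crawler that filters on those characters should follow them.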

    If you were using Perl rather than JSP, it would be a simple matter to pick up the /N from $ENV{PATH_INFO} or $ENV{REQUEST_URI}.
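The same PATH_INFO lookup can be sketched for a CGI-style script. Python stands in for Perl here only as an illustration, and the helper name id_from_path_info is hypothetical. Under CGI, the server places the part of the URL after the script name into PATH_INFO, so a request for .../filename.jsp/42 yields "/42".

```python
import os
from typing import Mapping, Optional


def id_from_path_info(environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return N for a request to .../script/N, or None if absent.

    Only a single trailing path segment is accepted as the id.
    """
    segment = environ.get("PATH_INFO", "").strip("/")
    if not segment or "/" in segment:
        # Missing id, or more path segments than expected.
        return None
    return segment
```

Passing the environment as a parameter keeps the function testable outside a web server; in a real CGI script the default os.environ is used.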

    But this isn't JavaMonks, so you're on your own from here.

      That is not the problem. The JSP page already displays itself as http://hostname/path/filename.jsp/id=N. My problem is that the crawler finds filename.jsp using find . -name "*.jsp", so its URL is stored in the database as http://hostname/path/filename.jsp. When that page comes up in a search, filename.jsp without the id can't be displayed.

      Thanks

Re: Re: Re: How to index dynamic pages?
by dda (Friar) on Aug 12, 2002 at 07:07 UTC
    >grabs the keywords and url http://hostname/path/filename.jsp without "id" and inserts them in the database

    You wrote that crawler, so why does it do such a weird thing? :) Why can't it insert 'id=N' as well?
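One way to act on that suggestion is to have the crawler emit one URL per (page, id) pair instead of one per .jsp file. This is a Python sketch under stated assumptions. The crawler would need some source of valid ids, modelled here as a hypothetical ids_for callback (for example, primary keys from the table the JSP reads). It uses the ?id=N query form; swap in the /N path form if you adopt the approach suggested earlier in the thread. crawl_targets is a hypothetical name.

```python
from pathlib import PurePosixPath
from typing import Callable, Iterable, Iterator
from urllib.parse import urlencode


def crawl_targets(base_url: str, jsp_files: Iterable[str],
                  ids_for: Callable[[str], Iterable[int]]) -> Iterator[str]:
    """Yield one indexable URL per (page, id) pair.

    jsp_files: relative paths as printed by `find . -name "*.jsp"`.
    ids_for:   hypothetical callback giving the ids a page can display.
    """
    for path in jsp_files:
        # PurePosixPath collapses the leading "./" that find prints.
        rel = str(PurePosixPath(path))
        page_url = base_url.rstrip("/") + "/" + rel
        for page_id in ids_for(rel):
            # Store the full URL, id included, so search hits are displayable.
            yield page_url + "?" + urlencode({"id": page_id})


for url in crawl_targets("http://hostname/", ["./path/filename.jsp"], lambda p: [1, 2]):
    print(url)
```

Each yielded URL is then fetched and indexed like any static page, so the stored link always includes the id.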

    --dda