in reply to How to index dynamic pages?

You can create a dummy page of links and let your crawler read it, for example:
http://hostname/path/filename.jsp?id=1
http://hostname/path/filename.jsp?id=2
http://hostname/path/filename.jsp?id=3
http://hostname/path/filename.jsp?id=4
...
Or you can put your links into your config file. Look at this search engine for examples.
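
A minimal sketch of generating such a dummy page, in Python purely for illustration. The function name build_link_page and the id range are assumptions, not part of any particular crawler; the only idea taken from above is one link per id.

```python
from html import escape


def build_link_page(base_url, ids):
    """Return a small HTML page with one link per dynamic page id."""
    links = []
    for page_id in ids:
        # Each dynamic page gets its own explicit URL for the crawler to follow.
        url = f"{base_url}?id={page_id}"
        links.append(f'<a href="{escape(url, quote=True)}">{escape(url)}</a><br>')
    body = "\n".join(links)
    return f"<html><body>\n{body}\n</body></html>\n"


if __name__ == "__main__":
    # Hypothetical ids 1..4, matching the example URLs above.
    print(build_link_page("http://hostname/path/filename.jsp", range(1, 5)))
```

Save the output somewhere the crawler starts from, and regenerate it whenever the set of ids changes.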

--dda

Re: Re: How to index dynamic pages?
by Anonymous Monk on Aug 09, 2002 at 16:22 UTC

    The crawler actually reads the .jsp page, grabs the keywords and the URL http://hostname/path/filename.jsp without the "id", and inserts them into the database. When the keywords are searched, the page filename.jsp cannot be displayed without the "id". How can I solve that?

    Thanks

      Many crawlers intentionally skip URLs that look dynamic (i.e., URLs that contain ? = &). To get past crawlers like this, you need to use URLs of the form http://hostname/path/filename.jsp/N, where /N takes the place of ?id=N.

      If you were using Perl rather than JSP, it would be a simple matter to pick up the /N from $ENV{PATH_INFO} or $ENV{REQUEST_URI}.
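
      The same idea as a rough sketch, in Python purely for illustration. Under CGI the extra path after the script name arrives in the PATH_INFO environment variable; here it is passed in as a plain string. The helper names are made up for this example, and the sketch assumes N is a plain decimal number.

```python
import re


def id_from_path_info(path_info):
    """Return the numeric id from a PATH_INFO such as '/42', or None."""
    # Accept exactly one numeric path segment, with an optional trailing slash.
    match = re.fullmatch(r"/(\d+)/?", path_info or "")
    return int(match.group(1)) if match else None


def crawler_friendly_url(base_url, page_id):
    """Build the /N form of a URL in place of ?id=N."""
    return f"{base_url}/{page_id}"
```

      For instance, id_from_path_info("/3") gives 3, while an empty or non-numeric PATH_INFO gives None so the page can fall back to a default.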

      But this isn't JavaMonks, so you're on your own from here.

        This is not the problem. The JSP page displays itself as http://hostname/path/filename.jsp/id=N. My problem is that filename.jsp is found by crawling the filesystem with find . -name "*.jsp", so its URL is stored in the database as http://hostname/path/filename.jsp. When the page comes up in a search, filename.jsp can't be displayed without the id.

        Thanks

      >grabs the keywords and url http://hostname/path/filename.jsp without "id" and inserts them in the database

      You wrote that crawler, so why does it do such a weird thing? :) Why can't it insert 'id=N' as well?
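
      A rough sketch of what I mean, in Python purely for illustration. It assumes the crawler walks the document root for *.jsp files (as find . -name "*.jsp" does) and that the list of valid ids is known from somewhere else, for example the database behind the page. The function name and parameters are hypothetical.

```python
from pathlib import Path


def urls_for_jsp_files(doc_root, base_url, ids):
    """Yield one URL per (.jsp file, id) pair instead of the bare file URL."""
    root = Path(doc_root)
    for jsp in sorted(root.rglob("*.jsp")):
        # Path of the file relative to the document root, with forward slashes.
        rel = jsp.relative_to(root).as_posix()
        for page_id in ids:
            # Store the id with the URL so the search result can be displayed.
            yield f"{base_url}/{rel}?id={page_id}"
```

      Each URL stored this way points at a page that can actually be rendered, unlike the bare filename.jsp.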

      --dda