Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have a directory called
/pen/
Inside /pen/ I have product directories, and in /pen/ I also have an .htaccess file containing
DirectoryIndex /pen/index.cgi
which basically tells every directory to serve that script as its index; I then determine which directory I'm in from $ENV{'REQUEST_URI'}. I want a spider to add the site to a search engine, but will it index anything, since there's nothing in those directories? Or should I make an index.shtml in each directory and redirect it to index.cgi so it looks like the spider is reading an .html file?
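For illustration, here is a minimal sketch of the kind of index.cgi this describes (the directory layout and HTML are assumptions, not the poster's actual code):

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Work out which product directory under /pen/ was requested.
    my $uri = $ENV{'REQUEST_URI'} || '/pen/';
    $uri =~ s/\?.*//;                      # drop any query string
    my ($dir) = $uri =~ m{^/pen/([^/]+)};  # first path segment under /pen/
    $dir = 'pen' unless defined $dir;

    print "Content-type: text/html\n\n";
    print "<html><body><h1>Products in $dir</h1></body></html>\n";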

Replies are listed 'Best First'.
Re: Spidering CGI
by tachyon (Chancellor) on Nov 11, 2002 at 23:24 UTC

    This is a similar problem to the one faced by perlmonks.org, which does not have static pages as such - it is all database driven. The problem can be solved by putting up real HTML pages to be spidered, as PM does; see: http://perlmonks.thepen.com/
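    One low-tech way to produce such static copies (a sketch only; the URLs and filenames here are made up) is to mirror the dynamic pages to disk on a schedule:

        #!/usr/bin/perl
        use strict;
        use warnings;
        use LWP::Simple qw(mirror);

        # Hypothetical map of dynamic URLs to static HTML filenames.
        my %pages = (
            'http://www.example.com/pen/widgets/' => 'widgets.html',
            'http://www.example.com/pen/gadgets/' => 'gadgets.html',
        );

        # mirror() fetches each URL and saves it only if it has changed.
        mirror( $_, $pages{$_} ) for keys %pages;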

    Search engines such as Google are spidering CGI-based sites, but they have issues; see this article.

    If you are requiring .htaccess authentication to get to your pages, then search spiders will not be able to access them, but this hardly matters, since neither would anyone following a link.

    Presumably you have set up your robots.txt file....
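    For instance, a minimal robots.txt at the document root that keeps spiders out of your raw scripts while letting them crawl everything else, including /pen/, might look like this (the paths are assumptions):

        User-agent: *
        Disallow: /cgi-bin/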

    cheers

    tachyon

    s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

Re: Spidering CGI
by adrianh (Chancellor) on Nov 12, 2002 at 10:02 UTC

    If you're using Apache, mod_rewrite is very useful for handling this sort of thing.

    You can easily change any URL under the hood without the user's knowledge, so they would see:

    http://www.foo.com/foo/bar/index.html

    and your code would see

    http://www.foo.com/foo/bar/index.cgi

    without the user knowing anything about index.cgi.
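    A sketch of the corresponding .htaccess rules, mirroring the example URLs above (treat the exact pattern as an assumption about your layout):

        RewriteEngine On
        # Internally map /foo/bar/index.html to /foo/bar/index.cgi;
        # the URL shown in the browser is untouched.
        RewriteRule ^foo/(.+)/index\.html$ foo/$1/index.cgi [L]

    The rewrite happens inside Apache, so the visitor, and any spider, only ever sees the .html URL.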