in reply to Make your 404 pages smarter with metaphone matching

I'll treat this like a Slashdot pointer and make a comment before reading the article, but I hope the author was careful not to allow any "helper" techniques to reveal hidden URLs or other files. mod_speling (yes, that's the way it's spelled) was notorious for that, happily handing out pointers to guessed-at URLs. Oops.

Re^2: Make your 404 pages smarter with metaphone matching
by Taulmarill (Deacon) on Sep 03, 2007 at 15:27 UTC
    As far as I can see, the script used filters all files by extension. Everything with .html gets indexed, everything else isn't.
      Everything with .html gets indexed, everything else isn't.
      And ... what?

      That doesn't address my concern at all. If I have a private URL that ends in ".html", it'll still likely get indexed. Then someone guesses a URL similar to that, and boom, they're in.

      A good solution would also have an additional regex or blacklist of things that should never be offered as a suggestion.
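
      To make that concrete, here is a minimal sketch of such a filter. It is written in Python purely for illustration and is not the article's code; the names NEVER_SUGGEST, is_suggestable and filter_suggestions, and every path and pattern shown, are hypothetical.

```python
import re

# Hypothetical deny-list: any path matching one of these patterns must
# never be offered as a "did you mean" suggestion. Adjust for your site.
NEVER_SUGGEST = [
    re.compile(r"^/private/"),          # anything under /private/
    re.compile(r"^/admin/"),            # anything under /admin/
    re.compile(r"(^|/)\.[^/]"),         # dotfiles and dot-directories
]

def is_suggestable(path):
    """Return True only for .html paths that match no deny-list pattern."""
    if not path.endswith(".html"):
        return False
    return not any(pattern.search(path) for pattern in NEVER_SUGGEST)

def filter_suggestions(candidates):
    """Drop every candidate path that must not be revealed to a visitor."""
    return [path for path in candidates if is_suggestable(path)]

if __name__ == "__main__":
    # Assumed output of a fuzzy matcher for some mistyped URL.
    candidates = [
        "/docs/intro.html",
        "/private/salaries.html",
        "/admin/index.html",
        "/docs/.draft.html",
        "/docs/intro.html.bak",
    ]
    print(filter_suggestions(candidates))
```

      The same check could just as well run when the index is built, so that private paths never enter it in the first place; filtering again at suggestion time is a cheap second line of defense.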

        If I have a private URL that ends in ".html", it'll still likely get indexed.

        It's not just likely; it will get indexed for sure. I don't think this is meant as a finished solution, but rather to show a general way to do such things.
        I am afraid, however, that there will be more cutting and pasting than actual reading.