Another Perl article has been published at IBM developerWorks. Make your 404 pages smarter with metaphone matching:

"Create your own 404 error-message handler to provide useful links and redirects for the contents of your site. Use metaphone matching and a simple weighted score file to make typographical, spelling, and bad-link redirect suggestions. Customize the suggestions based solely on your Web site's content and preferred redirection locations. Catch multiple errors in incoming URL requests and process them for corrections in directory, script, and HTML page names."

As usual with the IBM developerWorks articles, there is a feedback form at the bottom.

Martin

Re: Make your 404 pages smarter with metaphone matching
by merlyn (Sage) on Sep 03, 2007 at 14:15 UTC
    I'll treat this like a Slashdot pointer and make a comment before reading the article, but I hope the author was careful not to allow any "helper" techniques to reveal hidden URLs or other files. mod_speling (yes, that's the way it's spelled) was notorious for that, happily handing out pointers to guessed-at URLs. Oops.
      As far as I can see, the script filters all files by extension. Everything with .html gets indexed, everything else isn't.
        Everything with .html gets indexed, everything else isn't.
        And ... what?

        That doesn't address my concern at all. If I have a private URL that ends in ".html", it'll still likely get indexed. Then someone guesses a URL similar to that, and boom, they're in.

        A good solution would also have an additional regex or blacklist of things that should never be offered as a suggestion.
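
        A minimal sketch of what I mean, not taken from the article: the patterns in @never_suggest and the sub name safe_suggestions are made up for illustration, and the candidate list stands in for whatever the matcher produces. The idea is just to drop any candidate that matches a deny-list regex before it is ever shown.

```perl
use strict;
use warnings;

# Hypothetical deny-list: patterns for paths that must never be suggested.
# These are illustrative assumptions; adjust them to your own site layout.
my @never_suggest = (
    qr{^/private/},      # anything under a private directory
    qr{^/admin/},        # admin pages
    qr{/\.},             # dotfiles and dot-directories
    qr{\.bak\.html$},    # leftover backup copies
);

# Return only the candidate URLs that match none of the deny-list patterns.
sub safe_suggestions {
    my @candidates = @_;
    return grep {
        my $url = $_;
        !grep { $url =~ $_ } @never_suggest;
    } @candidates;
}

# Stand-in for the list of guesses produced by the 404 handler.
my @suggestions = safe_suggestions(
    '/index.html',
    '/private/payroll.html',
    '/docs/install.html',
    '/docs/.drafts/install.html',
);
print "$_\n" for @suggestions;
```

        With the sample list above, only /index.html and /docs/install.html survive. It would probably be safer still to apply the same filter when the index is built, so private paths never land in the score file in the first place.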