ropey has asked for the wisdom of the Perl Monks concerning the following question:

Fellow Monks... maybe not a truly perl perl question... but the solution will be in Perl so I dont feel so Guilty...

So take this simple example, I have a table in my database which is full of city names which mate to IATA codes.. for instance

London = LON Miami = MIA Paris = PAR

So we have a user comes in and enters London... great so we do a select where city name = 'London'.... but then the user cunningly enters London, UK... which of course my original select wont work.... nor would any like statements..

So... maybe I use Soundex... (mysql Soundex in this case)... so I use said function... and so I check this and I get..

select soundex('London'), soundex('London, UK')

Which gives me L535 and L5352... so getting better...maybe do some comparison of first X bits ??

Then I look at user entering UK - London... Soundex not going to like that coming back with U24535...

Now I could take to the next point of no matches... so I split it... but on what ? use could enter spaces, - ',' etc etc...

Now dont get me wrong I can deal with the majority of the cases using likes, soundexs.... but maybe there is a better way ??

Replies are listed 'Best First'.
Re: Matching in the Database...
by halley (Prior) on May 10, 2007 at 17:38 UTC
    I would just expand your table of city->iata examples.
    my %IATA = { }; $IATA{$_} = 'LON' for qw/lon london londonuk londonengland/; ... my $city = 'London, UK'; my $city =~ s/\W//g; my $code = $iata{lc($city)}; print $code, $/;
    You could subsequently do a Levenshtein distance from each key, if it's not found with this scheme, to try to guess one of the most likely choices. This is the kind of thing Google does when it doesn't find many hits after you search for "Londun UK".

    In general, soundex is deprecated. It's latin-alphabet-only, english-pronunciation-only, and it is not good with long words.

    --
    [ e d @ h a l l e y . c c ]

Re: Matching in the Database...
by chrism01 (Friar) on May 11, 2007 at 00:36 UTC
    Personally I never allow the user to enter free-form data if I can avoid it.
    In your case, I always generate a drop down list from the DB table (sorted alphabetically) and make them select from that.
    By-passes the whole problem. ;-)

    Cheers
    Chris

Re: Matching in the Database...
by friedo (Prior) on May 10, 2007 at 17:41 UTC
    Why don't you just tell the user to enter the city without the country, or better yet provide separate city and country fields?
Re: Matching in the Database...
by RL (Monk) on May 10, 2007 at 17:45 UTC

    Don't know if my idea fits your data. Maybe it helps anyway after tweaking.

    $userinput =~ s/[^a-z]+/,/gi; # replace anything but a-z by comma my $q = "SELECT * FROM <table> WHERE <column> IN ($userinput)";