
BrowserUk,
Actually, Soundex is so trivial of an algorithm it isn't too difficult to create a reverse lookup. On the other hand, it seems there would be no need. Just perform a forward encoding of all words in your dictionary and store the result in a database for future lookups.
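
A minimal sketch of that forward-encoding approach, in Python for brevity. The `soundex` function is a hand-rolled version of American Soundex (assuming the variant where 'h' and 'w' do not separate duplicate codes), and `build_index` is a hypothetical helper name; an in-memory dict stands in for the database.

```python
from collections import defaultdict

# Letter groups for American Soundex digits 1..6.
CODES = {}
for digit, letters in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(word, length=4):
    """Return the Soundex code of word, padded or truncated to length."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # assumption: h and w do not separate duplicate codes
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (result + "0" * length)[:length]


def build_index(words):
    """Forward-encode every word once; map each code to the words sharing it."""
    index = defaultdict(list)
    for word in words:
        index[soundex(word)].append(word)
    return index


index = build_index(["gray", "giaour", "cray", "carls"])
print(index[soundex("Grey")])  # words stored under the same code as "Grey"
```

Looking up a word is then a single encode plus one dictionary (or database) fetch, so no reverse mapping from code to spelling is ever needed.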

The real problem is that all of these algorithms, including Double Metaphone, only encode the first n consonants (4 in the case of Double Metaphone, unless the first character is a vowel). I am interested in your idea (even without an implementation).

Cheers - L~R

Re^3: Generate strings which sounds like source string
by BrowserUk (Patriarch) on Feb 22, 2010 at 21:41 UTC
    Soundex is so trivial of an algorithm it isn't too difficult to create a reverse lookup.

    I know, I tried it, but the results are pretty useless. Most of these "matches" are nothing like the given words:

    The problems with soundex include:

    1. it only "matches" words that begin with the same letter.

      But for example, 'Cray' is a far better sound-alike for 'Gray' than most of those above.

      And there are many words or phrases that might match 'Citrullus' that begin with 'S'. Say 'Sit with us'.

    2. it discards all vowels and 'h's.

      Hence matches like 'Gray' with 'giaour'.

    3. it only considers 4 significant consonants.

      Hence matches like 'Charleston' with 'carls' & 'creolizations'

    The name Soundex is deceptive. It has little or nothing to do with the sound.
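
    The three problems are easy to reproduce. This is a rough sketch in Python, assuming the American Soundex variant in which 'h' and 'w' do not separate duplicate codes; the `length` parameter is my own addition, there only to show the effect of the 4-character cut-off.

```python
# Letter groups for American Soundex digits 1..6.
CODES = {}
for digit, letters in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(word, length=4):
    """Soundex code: first letter plus digits, padded or cut to length."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # assumption: h and w are skipped without resetting prev
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # vowels carry no digit but do reset prev
    return (result + "0" * length)[:length]


# 1. The first letter is kept verbatim, so 'Cray' can never match 'Gray'.
print(soundex("Gray"), soundex("Cray"))        # G600 C600

# 2. Vowels and 'h' are dropped, so 'giaour' collapses to the same code.
print(soundex("giaour"))                       # G600

# 3. Only the first letter plus three digits survive the cut-off.
for word in ("Charleston", "carls", "creolizations"):
    print(word, soundex(word))                 # all three give C642

# A longer code separates 'carls', but not yet the other two.
for word in ("Charleston", "carls", "creolizations"):
    print(word, soundex(word, length=6))
```

    Lengthening the code only moves the collisions around; it does not make the codes any more about sound.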

    Metaphone is too specific. Many of the words in the OP's examples would never match anything if encoded at their full length, and if you reduce the encoding length across the board, you get far too many hits for other words. And to adjust the length of the encoding dynamically with any success, you need to encode your dictionary words at all lengths.
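
    To illustrate what "encode at all lengths" would involve, here is a small Python sketch. The `consonant_key` encoder is a deliberately crude stand-in and is NOT Metaphone; `index_all_lengths` and `lookup` are hypothetical names. Any real phonetic encoder could be passed in as `encode`.

```python
from collections import defaultdict


def consonant_key(word):
    """Stand-in phonetic key (not Metaphone): the word's consonants, uppercased."""
    return "".join(c for c in word.upper() if "A" <= c <= "Z" and c not in "AEIOU")


def index_all_lengths(words, encode=consonant_key, max_len=8):
    """Store every word under every prefix of its key, from length 1 up."""
    index = defaultdict(set)
    for word in words:
        key = encode(word)[:max_len]
        for n in range(1, len(key) + 1):
            index[key[:n]].add(word)
    return index


def lookup(index, word, encode=consonant_key, min_len=2, max_len=8):
    """Try the full key first, then shorten it until something matches."""
    key = encode(word)[:max_len]
    for n in range(len(key), min_len - 1, -1):
        hits = index.get(key[:n], set()) - {word}
        if hits:
            return n, sorted(hits)
    return 0, []


index = index_all_lengths(["charleston", "carls", "cray"])
print(lookup(index, "charles"))  # matches at the longest key length that hits
print(lookup(index, "crayon"))   # has to shorten the key before anything matches
```

    The cost is visible in `index_all_lengths`: every dictionary word is stored once per key length, which is the price of being able to shorten the key per query.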

    I am interested in your idea (even without an implementation).

    The problem with developing my idea is that it would be a table-driven algorithm that would require considerable effort (programming & manual) in order to derive the tables. Not worth the effort unless there was at least an outside chance someone might make use of it.

    Hence I'd like to know if the OP is serious. And, what he (or other people) might use it for. That might give me an idea as to whether it is worth the time and effort.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.