Soundex is such a trivial algorithm that it isn't difficult to create a reverse lookup.

I know, I tried it, but the results are pretty useless. Most of these "matches" are nothing like the given words:

The problems with soundex include:

  1. it only "matches" words that begin with the same letter.

    But for example, 'Cray' is a far better sound-alike for 'Gray' than most of those above.

    And there are many words or phrases that might match 'Citrullus' that begin with 'S'. Say 'Sit with us'.

  2. it discards all vowels and 'h's.

    Hence matches like 'Gray' with 'giaour'.

  3. it only considers 4 significant consonants.

    Hence matches like 'Charleston' with 'carls' & 'creolizations'

The name Soundex is deceptive. It has little or nothing to do with the sound.
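
To make the three problems above concrete, here is a minimal sketch of the common American Soundex rules plus a reverse lookup table. It is written in Python purely for illustration; the names `soundex` and `reverse_index` are my own, not from any module, and other Soundex variants treat 'h' and 'w' slightly differently.

```python
from collections import defaultdict

# Digit classes of the common American Soundex variant.
_CODES = {}
for digit, letters in enumerate(("BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"), start=1):
    for letter in letters:
        _CODES[letter] = str(digit)


def soundex(word):
    """Return the 4-character code (first letter + 3 digits) for word."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = _CODES.get(first, "")
    for c in letters[1:]:
        if c in "HW":
            continue  # dropped, and they do not separate equal codes
        code = _CODES.get(c, "")  # vowels and Y carry no code
        if code and code != prev:
            digits.append(code)
        prev = code
    # Pad with zeros, then keep only the letter and three digits.
    return (first + "".join(digits) + "000")[:4]


def reverse_index(words):
    """Map each Soundex code to the list of words that produce it."""
    index = defaultdict(list)
    for w in words:
        index[soundex(w)].append(w)
    return index


if __name__ == "__main__":
    for w in ("Gray", "giaour", "Cray", "Charleston", "carls", "creolizations"):
        print(w, soundex(w))
```

With these rules 'Gray' and 'giaour' share one code, 'Cray' gets a different one only because of its first letter, and 'Charleston', 'carls' and 'creolizations' all collapse to the same code because everything after the third digit is thrown away.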

Metaphone is too specific. Many of the words in the OP's examples would never match anything if encoded at their full length, and if you reduce the encoding length across the board, you get far too many hits for other words. And to adjust the length of the encoding dynamically with any success, you need to encode your dictionary words at all lengths.
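
A rough sketch of what "encode at all lengths" could look like, again in Python for illustration only. It assumes that a shortened encoding is simply a prefix of the full one. `toy_key` is a made-up stand-in encoder, NOT Metaphone, and `build_prefix_index` and `lookup` are hypothetical names; a real encoder would be passed in through the `encode` parameter.

```python
from collections import defaultdict


def toy_key(word):
    """Hypothetical stand-in for a phonetic encoder; this is NOT Metaphone.

    Keeps the first letter, skips later vowels, and skips any letter
    equal to the last one kept.
    """
    letters = [c for c in word.upper() if c.isalpha()]
    key = []
    for i, c in enumerate(letters):
        if i > 0 and c in "AEIOU":
            continue
        if key and key[-1] == c:
            continue
        key.append(c)
    return "".join(key)


def build_prefix_index(words, encode=toy_key):
    """Index every word under each prefix length of its encoding."""
    index = defaultdict(set)
    for w in words:
        key = encode(w)
        for n in range(1, len(key) + 1):
            index[key[:n]].add(w)
    return index


def lookup(index, word, encode=toy_key, min_len=2):
    """Shorten the query encoding until some dictionary word matches.

    Returns (matched_length, sorted_hits), or (0, []) if nothing matches
    at min_len or longer.
    """
    key = encode(word)
    for n in range(len(key), min_len - 1, -1):
        hits = index.get(key[:n])
        if hits:
            return n, sorted(hits)
    return 0, []


if __name__ == "__main__":
    index = build_prefix_index(["charles", "carls"])
    print(lookup(index, "charleston"))
```

The cost is visible in `build_prefix_index`: every dictionary word is stored once per prefix length, which is the price of letting each query pick its own encoding length.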

I am interesting in your idea (even without implementation).

The problem with developing my idea is that it would be a table-driven algorithm, and deriving the tables would require considerable effort (both programming and manual). It's not worth the effort unless there is at least an outside chance someone might make use of it.

Hence I'd like to know if the OP is serious. And, what he (or other people) might use it for. That might give me an idea as to whether it is worth the time and effort.



In reply to Re^3: Generate strings which sounds like source string by BrowserUk
in thread Generate strings which sounds like source string by Gangabass
