http://qs1969.pair.com?node_id=11151113

Bod has asked for the wisdom of the Perl Monks concerning the following question:

Do you know of an existing module or a method of searching for similar words and phrases, especially homophones (words that are spelt differently but pronounced the same)?

I am planning a tool to analyse blocks of text, for example the 'About' page of a website, and work out the ratio of words like "I", "we", "our" to words like "you" and "your". However, I want the user to be able to enter their company's name and have that included with the first person terms. But variations of the name may exist in the text.
For example: "Google" could appear as "Google", "Google Inc", "Google LLC" or "Alphabet" (the last one is not really catchable programmatically).

Plus, typos exist, especially around homophones, and I want to be able to catch those in a similar way to how search engines say "did you mean".
For example: "Perl is grate for programmes" should be suggested as "Perl is great for programs".

For the current use case, only the first example needs to be solved, but I'd be interested in how you would approach both. Does a long list of homophones need to be referenced? Or perhaps there is already a module on CPAN that deals with this. I have searched but nothing obvious came up.
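
To make the first use case concrete, here is a minimal sketch (in Python, purely for illustration) of counting first-person against second-person terms while folding a company name and its common legal suffixes into the first-person count. The function name `pronoun_ratio`, the suffix list and the word sets are assumptions made for this example, not part of any existing module, and a variant such as "Alphabet" would still need to be supplied by hand.

```python
import re

# Word lists taken from the question; extend them as needed.
FIRST_PERSON = {"i", "we", "our"}
SECOND_PERSON = {"you", "your"}

# Hypothetical list of legal suffixes that may follow the company name.
SUFFIXES = r"(?:\s+(?:inc|llc|ltd|plc|corp)\b\.?)?"


def pronoun_ratio(text, company):
    """Return (first_person_count, second_person_count) for text.

    Mentions of the company name, with or without a legal suffix,
    are counted as first-person terms.
    """
    pattern = re.compile(
        r"\b" + re.escape(company) + r"\b" + SUFFIXES, re.IGNORECASE
    )
    # Remove company mentions so their words are not counted twice.
    remaining, company_hits = pattern.subn(" ", text)
    words = re.findall(r"[a-z']+", remaining.lower())
    first = company_hits + sum(w in FIRST_PERSON for w in words)
    second = sum(w in SECOND_PERSON for w in words)
    return first, second


sample = ("We love our users. Google Inc builds tools so you can search. "
          "Google LLC values your privacy. I agree.")
print(pronoun_ratio(sample, "Google"))  # (5, 2)
```

The word-boundary check after the name keeps "Googleplex" from being counted as a mention of "Google"; a name that ends in punctuation would need a different boundary test.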


Re: Searching for homophones and words that are similar
by swl (Parson) on Mar 22, 2023 at 00:50 UTC

      That search won't show Text::Metaphone. IMHO a much better (or more usable in Dutch) module than Text::Soundex.

      YMMV


      Enjoy, Have FUN! H.Merijn

      Soundex is a really bad algorithm except for mechanically (i.e. using sticks and cardboard) comparing short(!) names(!) commonly in use in the USA. See Re^2: Search for similar strings - to standardise for details.
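
For readers who have not seen how coarse Soundex is, here is a small sketch of the classic American Soundex rules as they are usually described (keep the first letter, map consonants to six digit classes, collapse repeats, pad to four characters). It is written in Python for illustration only and is not the code of Text::Soundex, which may differ in details.

```python
# Digit classes of classic American Soundex; vowels, h, w and y carry no code.
CODES = {}
for digit, letters in enumerate(("bfpv", "cgjkqsxz", "dt", "l", "mn", "r"), start=1):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(name):
    """Return the 4-character Soundex code of name ('' if it has no letters)."""
    letters = [c for c in name.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate two equal codes
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (result + "000")[:4]


print(soundex("great"), soundex("grate"), soundex("greed"))  # G630 G630 G630
```

The homophones "great" and "grate" do collide, but so does the unrelated "greed", which is the kind of false positive the criticism above is about.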

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
      Maybe Text::Soundex is useful.

      That looks very useful and the sort of thing I was searching for. The Synopsis looks promising.

      Thank you

Re: Searching for homophones and words that are similar
by GrandFather (Saint) on Mar 22, 2023 at 02:26 UTC

      Those look very interesting - some bedtime reading :)

      Many thanks

Re: Searching for homophones and words that are similar
by LanX (Saint) on Mar 22, 2023 at 00:34 UTC
    I would parse a dictionary like wiktionary.org for phonetic writing and listed homophones.

    For instance, for "here" you'll find

    which lists "hear" and "hir" as homophones.

    It also gives the phonetics as /hɪə̯(ɹ)/, /hɪː(ɹ)/.

    Based on that data you can try to find the closest approximation to a typo with Levenshtein distance.
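
A rough sketch of that last step, in Python for illustration: a plain dynamic-programming Levenshtein distance plus a helper that picks the nearest entry from a word list. The helper name `closest` and the tiny vocabulary are made up for this example; in practice the candidates would come from the dictionary data mentioned above, and the same distance could be applied to the phonetic strings instead of the spellings.

```python
def levenshtein(a, b):
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # delete ca
                           cur[j - 1] + 1,       # insert cb
                           prev[j - 1] + cost))  # substitute or match
        prev = cur
    return prev[-1]


def closest(word, vocabulary):
    """Return the vocabulary entry with the smallest distance to word."""
    return min(vocabulary, key=lambda candidate: levenshtein(word, candidate))


print(levenshtein("kitten", "sitting"))  # 3
print(closest("homofone", ["homophone", "telephone", "dictionary"]))  # homophone
```

Ties go to whichever candidate comes first in the list, so a real "did you mean" would want a second criterion such as word frequency.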

    But be warned that full accuracy is nearly impossible, it's English after all.

    Cheers Rolf
    (addicted to the 𐍀𐌴𐍂𐌻 Programming Language :)
    Wikisyntax for the Monastery

      But be warned that full accuracy is nearly impossible, it's English after all

      Oh yes!
      After the discussion on how to Split first and last names and my subsequent implementation of names, I have no expectation of getting full accuracy.

        I think the biggest assault on English comes from those who don't get it from within, for example, consider the Speaker pro tempore of the US House, Marjorie Taylor Greene, here talking about a peachtree dish. (She was reaching for petri dish.)