in reply to Improving speed match arrays with fuzzy logic

I expect you'll be interested in these: UMLS::Similarity, UMLS::Interface, and a general search of the CPAN in the UMLS::* space. They're not trivial to set up, but the instructions are good. Unfortunately, I'm not aware of any free medical stemming dictionaries, and a good one really helps with the kind of comparison you're doing. If you know or learn of any, please do share. There are other approximate matching packages too: some, like the Metaphone-based ones, are probably too fuzzy for you, and others, like String::Approx, are probably too slow.

You might want to index all your data first with something like Lucy, so that stemming, tokenizing, and other normalization are done once at indexing time instead of on every search or match; or possibly resort to specialized data containers like Judy.
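To make the "normalize once, then look up" idea concrete, here is a minimal sketch in plain Perl, assuming exact lookups on normalized keys are good enough for a first pass. It does not use Lucy or Judy. `normalize`, `build_index`, and `lookup` are hypothetical helpers, and `normalize` is a crude stand-in (lowercase, split on non-word characters, strip a trailing "s", sort the tokens), not a real or medical stemmer.

```perl
use strict;
use warnings;

# Hypothetical toy normalizer, NOT a real stemmer: lowercases, splits on
# runs of non-word characters, strips a trailing "s" from longer tokens,
# and sorts the tokens so word order does not matter.
sub normalize {
    my ($term) = @_;
    my @tokens;
    for my $tok (split /\W+/, lc $term) {
        next unless length $tok;              # skip empty leading fields
        $tok =~ s/s$// if length($tok) > 3;   # crude plural stripping
        push @tokens, $tok;
    }
    return join ' ', sort @tokens;
}

# Build the index once: normalized key => list of original records.
sub build_index {
    my (@records) = @_;
    my %index;
    push @{ $index{ normalize($_) } }, $_ for @records;
    return \%index;
}

# Each query is normalized once and answered with a single hash lookup,
# instead of being compared against every record.
sub lookup {
    my ($index, $query) = @_;
    my $hits = $index->{ normalize($query) };
    return $hits ? @$hits : ();
}

my @records = ('Heart Attacks', 'attack, heart', 'Kidney stone');
my $index   = build_index(@records);
my @found   = lookup($index, 'heart attack');
print "$_\n" for @found;    # prints "Heart Attacks" then "attack, heart"
```

The cost of normalizing each record is paid once when the index is built; after that, each query costs one normalization plus one hash lookup. Typos and other near misses would still need an approximate matcher on top, which is where a proper indexer's analyzers or a fuzzy matching module come back in.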

Re^2: Improving speed match arrays with fuzzy logic
by Takamoto (Monk) on Jan 19, 2019 at 10:00 UTC

    Thank you, Your Mother. I didn't know about the modules you pointed to! They open up new horizons for me, as I am mostly interested in natural-language tasks. I agree that stemming is a good approach; however, it depends so much on the available resources (language, domain, etc.) that, for now, its usefulness is limited to general language. At least, that has been my experience.