PerlMonks  

Re: Improving speed match arrays with fuzzy logic

by Your Mother (Archbishop)
on Jan 18, 2019 at 18:45 UTC ( [id://1228729]=note )


in reply to Improving speed match arrays with fuzzy logic

I expect you’ll be interested in these: UMLS::Similarity, UMLS::Interface, and a general search of the CPAN in the UMLS::* space. They're not trivial to set up but the instructions are good. Unfortunately there are no free medical stemming dictionaries I'm aware of, and a stemming dictionary really helps the kind of comparison task you're doing. If you know or learn of any, please do share. There are other approximate matching packages. Some, like the metaphone stuff, are probably too fuzzy for you, and others, like String::Approx, are probably too slow.

You might want to index all your data first and use something like Lucy (to do stemming, tokenizing, and various normalization once instead of per search/match); or possibly resort to specialized data containers like Judy.
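
To make the "normalize once, then look up" idea concrete, here is a minimal, language-neutral sketch in plain Python. It is not the Lucy API, and none of it comes from the original question: the names `normalize`, `build_index`, and `candidates`, the sample terms, and the crude plural-stripping rule (a stand-in for a real stemmer) are all assumptions made for illustration.

```python
import re
from collections import defaultdict

def normalize(text):
    """Lowercase, split on non-alphanumerics, and crudely strip a plural 's'."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # Hypothetical placeholder for a real stemmer: drop a trailing "s"
    # from words longer than three characters.
    return [t[:-1] if len(t) > 3 and t.endswith("s") else t for t in tokens]

def build_index(terms):
    """Map each normalized token to the set of term ids that contain it."""
    index = defaultdict(set)
    for term_id, term in enumerate(terms):
        for token in normalize(term):
            index[token].add(term_id)
    return index

def candidates(index, terms, query, min_shared=1):
    """Return terms sharing at least min_shared normalized tokens with query."""
    hits = defaultdict(int)
    for token in set(normalize(query)):
        for term_id in index.get(token, ()):
            hits[term_id] += 1
    # Most shared tokens first; ties keep the original term order.
    ranked = sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))
    return [terms[i] for i, n in ranked if n >= min_shared]

# Made-up sample data, normalized and indexed exactly once.
terms = ["Heart attacks", "Kidney stones", "Attack of the heart", "Renal failure"]
index = build_index(terms)
print(candidates(index, terms, "heart attack"))
```

Each search then only normalizes the query and does a few hash lookups. A slower approximate comparison, if still wanted, can be run on the short candidate list rather than on every pair.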

Re^2: Improving speed match arrays with fuzzy logic
by Takamoto (Monk) on Jan 19, 2019 at 10:00 UTC

    Thank you, Your Mother. I did not know the modules you pointed to! They open up new horizons for me, as I am mostly interested in natural language-related tasks. I agree that stemming is a good approach. However, it depends so much on available resources (language, domain, etc.) that, for now, its usefulness is limited to general language. At least this is my experience.
