in reply to Re: Supervised machine learning algo for text matching across two files
There are definitely keywords that could be matched, but not with a regex. A random example I made up: file 1 has HCBS_max and file 2 has National healthcare basketball society. There is an acronym involved, and intuitively I can google both, decide that they are the same, and do the match manually. Sure, I could write a regex rule that searches for acronyms, but there is no consistent pattern like this across the data. It is all human-entered and therefore all over the place, with no standardization.
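To make the example concrete, here is a rough Python sketch of how one candidate pair could be turned into numbers instead of a yes/no regex hit. The names `initials` and `pair_features`, the feature names, and the cleanup rule (keep the leading alphabetic token of the file 1 value) are all made up for illustration and are not taken from my real data. It only uses the standard library (`re` and `difflib`).

```python
import re
from difflib import SequenceMatcher


def initials(phrase):
    """Return the lowercase first letters of the words in a phrase."""
    return "".join(word[0] for word in re.findall(r"[A-Za-z]+", phrase)).lower()


def pair_features(short_name, long_name):
    """Turn one candidate (file 1, file 2) pair into numeric features."""
    # Assumed cleanup rule: keep only the leading alphabetic token of the
    # file 1 value, so "HCBS_max" becomes "hcbs".
    match = re.match(r"[A-Za-z]+", short_name)
    token = match.group(0).lower() if match else ""
    letters = initials(long_name)
    return {
        # Strict test: the token is exactly the initials of the long name.
        "acronym_match": 1.0 if token and token == letters else 0.0,
        # Fuzzy test: how close the token is to those initials (0.0 to 1.0).
        "acronym_similarity": SequenceMatcher(None, token, letters).ratio(),
        # Plain character-level similarity of the two raw strings.
        "raw_similarity": SequenceMatcher(
            None, short_name.lower(), long_name.lower()
        ).ratio(),
    }


features = pair_features("HCBS_max", "National healthcare basketball society")
print(features)
```

Note that in the made-up pair the initials of the long name come out as NHBS rather than HCBS, so the strict acronym test fails and only the fuzzy score registers any resemblance. That is the kind of mess I mean: a score per pair seems more useful than a rule that either fires or does not.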
What I am thinking is that I can use the other 50 columns in file 1 to look for patterns and associations that are not intuitive but nonetheless help me classify some of the matches. Is this something a random forest could do, using the 15% of "ground truth" data I have as a training set?
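Here is a toy Python sketch of the idea, so it is clear what I am asking. It is not a full random forest: each "tree" is a single-split decision stump, trained on a bootstrap sample and one randomly chosen feature, and the stumps vote. The function names, the three feature columns, and all the numbers are invented for illustration. In practice I would expect to use a library implementation (for example scikit-learn's `RandomForestClassifier`) rather than this.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def fit_stump(rows, labels, feature_ids):
    """Find the (feature, threshold) split with the lowest weighted Gini."""
    best = None
    for f in feature_ids:
        values = sorted(set(row[f] for row in rows))
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2.0
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                # Each side of the split predicts its majority class.
                left_vote = int(sum(left) * 2 >= len(left))
                right_vote = int(sum(right) * 2 >= len(right))
                best = (score, f, threshold, left_vote, right_vote)
    if best is None:
        # No split possible (all sampled rows identical): predict the majority.
        majority = int(sum(labels) * 2 >= len(labels))
        return (feature_ids[0], 0.0, majority, majority)
    return best[1:]


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a feature subset."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    k = max(1, int(n_features ** 0.5))  # features tried per stump
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: draw row indices with replacement.
        picks = [rng.randrange(len(rows)) for _ in rows]
        sample_rows = [rows[i] for i in picks]
        sample_labels = [labels[i] for i in picks]
        feature_ids = rng.sample(range(n_features), k)
        forest.append(fit_stump(sample_rows, sample_labels, feature_ids))
    return forest


def match_probability(forest, row):
    """Fraction of stumps that vote 'match' for one candidate pair."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return sum(votes) / len(votes)


# Made-up training pairs standing in for the 15% of hand-matched data.
# Hypothetical columns: acronym similarity, raw name similarity, and
# agreement on one of the other file 1 columns.
train_rows = [
    [0.90, 0.80, 1.00], [0.75, 0.70, 0.90], [1.00, 0.85, 0.80], [0.80, 0.90, 0.70],
    [0.10, 0.20, 0.00], [0.25, 0.10, 0.30], [0.00, 0.30, 0.20], [0.20, 0.15, 0.10],
]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = same entity, 0 = different

forest = fit_forest(train_rows, train_labels)
print(match_probability(forest, [0.85, 0.75, 0.90]))  # looks like a match
print(match_probability(forest, [0.15, 0.20, 0.10]))  # looks like a non-match
```

The vote fraction would let me auto-accept only the high-confidence pairs and keep doing the borderline ones by hand.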
Re^3: Supervised machine learning algo for text matching across two files
by thanos1983 (Parson) on May 24, 2017 at 20:37 UTC
by AnomalousMonk (Archbishop) on May 24, 2017 at 21:08 UTC
by choroba (Cardinal) on May 24, 2017 at 21:15 UTC
by Anonymous Monk on May 24, 2017 at 20:46 UTC

Re^3: Supervised machine learning algo for text matching across two files
by KurtZ (Friar) on May 24, 2017 at 23:40 UTC