Re^4: Supervised machine learning algo for text matching across two files

I'm with thanos1983 on this one. 'National healthcare basketball society' maps to 'HCBS_max'?!? Wow! If anyone figures out a solution to this one, please let me know; I'd sure like to go in with you on patenting/exploiting it!

Give a man a fish: <%-{-{-{-<


Replies are listed 'Best First'.
Re^5: Supervised machine learning algo for text matching across two files by choroba (Cardinal) on May 24, 2017 at 21:15 UTC
You can add a feature like "if you split the long string into words based on a dictionary and extract the first letters, you get part of the abbreviation," then let the algorithm decide whether it's useful or not. Similarly, you can train the algorithm on a large corpus of downloaded texts; the fact that the words tend to appear in the same article could be used as a feature, too (or at least some number expressing their collocability).
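For illustration only, here is a minimal sketch of how the first-letters feature might be computed. It is one possible reading of the suggestion, not a tested solution: the toy `DICTIONARY`, the function names, and the choice of a longest-common-subsequence score are all assumptions made for the example.

```python
from functools import lru_cache

# Hypothetical toy dictionary; a real one would come from a proper word list.
DICTIONARY = {"national", "health", "care", "basket", "ball", "society"}


def split_by_dictionary(word, dictionary=DICTIONARY):
    """Split `word` into dictionary words if a full segmentation exists,
    otherwise return the word unchanged as a single piece."""
    word = word.lower()

    @lru_cache(maxsize=None)
    def segment(start):
        if start == len(word):
            return ()
        # Try longer pieces first, so whole words win over their fragments.
        for end in range(len(word), start, -1):
            piece = word[start:end]
            if piece in dictionary:
                rest = segment(end)
                if rest is not None:
                    return (piece,) + rest
        return None

    result = segment(0)
    return list(result) if result is not None else [word]


def initials(long_name, dictionary=DICTIONARY):
    """First letters of all dictionary pieces of the long name."""
    parts = []
    for word in long_name.split():
        parts.extend(split_by_dictionary(word, dictionary))
    return "".join(p[0] for p in parts if p)


def lcs_length(a, b):
    """Length of the longest common subsequence of strings a and b."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0]
        for j, other in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if ch == other else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def initials_feature(long_name, abbreviation, dictionary=DICTIONARY):
    """Fraction of the initials that appear, in order, in the abbreviation."""
    letters = initials(long_name, dictionary)
    abbr = "".join(ch for ch in abbreviation.lower() if ch.isalpha())
    if not letters:
        return 0.0
    return lcs_length(letters, abbr) / len(letters)


print(initials("National healthcare basketball society"))  # nhcbbs
print(round(initials_feature("National healthcare basketball society", "HCBS_max"), 2))  # 0.67
```

The score is deliberately soft: it is meant to be one numeric input among several, leaving it to the learner to weigh it, rather than a matching rule on its own.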


($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord
}map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,