
I'm with thanos1983 on this one. 'National healthcare basketball society' maps to 'HCBS_max'?!? Wow! If anyone figures out a solution to this one, please let me know; I'd sure like to go in with you on patenting/exploiting it!



Re^5: Supervised machine learning algo for text matching across two files
by choroba (Cardinal) on May 24, 2017 at 21:15 UTC
    You can add a feature like "if you split the long string into words based on a dictionary and extract their first letters, you get part of the abbreviation", then let the algorithm decide whether it's useful or not. Similarly, you can train the algorithm on a large corpus of downloaded texts; maybe the fact that the words tend to appear in the same article could be used as a feature, too (or at least some number expressing their collocability).
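A minimal Perl sketch of both features, under loudly stated assumptions: the dictionary and the mini-corpus are toy stand-ins, and `segment`, `initials`, `abbrev_score` and `cooccurrence` are hypothetical helper names, not from any CPAN module. It only computes feature values; feeding them to a learner is left out.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy dictionary (an assumption). A real system would load a large word list.
# "healthcare" is deliberately absent so it splits into "health" + "care";
# "basket" is absent so "basketball" stays whole.
my %dict = map { $_ => 1 } qw(national health care basketball society);

# Split a lowercase string into dictionary words, trying shorter prefixes
# first and backtracking. Returns an array ref, or undef if no split exists.
sub segment {
    my ($s) = @_;
    return [] if $s eq '';
    for my $len (1 .. length $s) {
        my $prefix = substr($s, 0, $len);
        next unless $dict{$prefix};
        my $rest = segment(substr($s, $len));
        return [ $prefix, @$rest ] if defined $rest;
    }
    return;    # undef in scalar context: no segmentation found
}

# Upper-cased first letters of the dictionary words in a phrase.
# Words that cannot be segmented are kept whole.
sub initials {
    my ($phrase) = @_;
    my @words = map { my $seg = segment($_); $seg ? @$seg : $_ }
                map { lc $_ } split ' ', $phrase;
    return join '', map { uc(substr($_, 0, 1)) } @words;
}

# Feature 1: greedy in-order match of the abbreviation's capital letters
# against the phrase's initials, as a fraction of those capitals.
sub abbrev_score {
    my ($phrase, $abbrev) = @_;
    my @want = $abbrev =~ /([A-Z])/g;
    return 0 unless @want;
    my $i = 0;
    for my $c (split //, initials($phrase)) {
        $i++ if $i < @want && $c eq $want[$i];
    }
    return $i / @want;
}

# Feature 2: crude collocability, i.e. the share of documents mentioning
# either term that mention both (a Jaccard ratio over documents).
sub cooccurrence {
    my ($docs, $x, $y) = @_;
    my ($both, $either) = (0, 0);
    for my $doc (@$docs) {
        my $hx = index(lc($doc), lc($x)) >= 0;
        my $hy = index(lc($doc), lc($y)) >= 0;
        $both++   if $hx && $hy;
        $either++ if $hx || $hy;
    }
    return $either ? $both / $either : 0;
}

# Hypothetical mini-corpus standing in for the downloaded texts.
my @corpus = (
    'Toy article about the national healthcare basketball society (HCBS).',
    'Toy table description: HCBS_max is a column name.',
    'Toy article about weekend basketball scores.',
);

my ($phrase, $abbrev) = ('National healthcare basketball society', 'HCBS_max');
printf "initials: %s\n",      initials($phrase);                        # NHCBS
printf "abbrev score: %.2f\n", abbrev_score($phrase, $abbrev);          # 1.00
printf "co-occurrence: %.2f\n", cooccurrence(\@corpus, $phrase, 'HCBS'); # 0.50
```

With this toy dictionary, "HCBS" is fully matched in order inside "NHCBS", which is the "part of the abbreviation" signal; whether it actually helps is what the learner has to decide.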
