in reply to Re: Fuzzy text matching... again
in thread Fuzzy text matching... again
- Remove unimportant tokens (trailing text in parens)
You certainly have a number of good points. However, parenthesization does not necessarily imply subordination; it could also introduce an alternative name or term, as in the OP's case #5:
where the part in parentheses has a better chance of contributing to a successful match than the unparenthesized part. This is just to illustrate one of the many potential issues the OP might encounter.
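One way to avoid throwing away a useful parenthesized part is to treat it as an additional candidate rather than stripping it. The following is only a rough sketch of that idea, not anything from the OP's code: the helper names `variants` and `best_score` and the sample titles are made up for illustration, and `difflib.SequenceMatcher` simply stands in for whatever similarity measure is actually used.

```python
import re
from difflib import SequenceMatcher

def variants(title):
    """Return the full title plus the text outside and inside a
    trailing parenthetical, if there is one (hypothetical helper)."""
    m = re.match(r"^(.*?)\s*\(([^()]*)\)\s*$", title)
    if not m:
        return [title.strip()]
    outside, inside = m.group(1).strip(), m.group(2).strip()
    # Keep the original string too, and drop empty pieces.
    return [v for v in (title.strip(), outside, inside) if v]

def best_score(a, b):
    """Compare every variant of a with every variant of b and
    return the highest similarity ratio (0.0 to 1.0)."""
    return max(
        SequenceMatcher(None, x.lower(), y.lower()).ratio()
        for x in variants(a)
        for y in variants(b)
    )

# Made-up example: the match comes from the parenthesized part only.
print(best_score("Foo Institute (Bar Archive)", "bar archive"))
```

The price is more comparisons per pair, and a short parenthesized abbreviation can produce false positives, so the result probably still needs a threshold or a sanity check.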
And while we're at it: how would a machine identify what is a name and what is not, as in "Archivio Giuliano Marini", without consulting a database of common names or checking against a list of all known regular words (plus inflections) in a particular language? Even Google Translate apparently gets it wrong when translating "Archivio Giuliano Marini" into English: it leaves "Archivio" as is instead of translating it to "archive", even though you tell it what the source language is. Interestingly, it gets it right with "Archivio Marini"...
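To make the word-list approach concrete, here is a minimal sketch. The lexicon `KNOWN_WORDS` and the function `split_names` are hypothetical, and the lexicon is deliberately tiny; a real one would need the regular words of the language plus their inflections, which is exactly the cost in question.

```python
# Hypothetical, deliberately tiny lexicon of regular Italian words.
KNOWN_WORDS = {"archivio", "biblioteca", "museo", "di", "del"}

def split_names(phrase, known_words=KNOWN_WORDS):
    """Split a phrase into (regular words, presumed names).

    Any token not found in the lexicon is guessed to be a name,
    so the result is only as good as the lexicon's coverage.
    """
    words, names = [], []
    for token in phrase.split():
        if token.lower() in known_words:
            words.append(token)
        else:
            names.append(token)
    return words, names

print(split_names("Archivio Giuliano Marini"))
```

Note that this guesses "name" for anything it has not seen, so a misspelled or rare regular word would be misclassified as well.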
Re^3: Fuzzy text matching... again
by Limbic~Region (Chancellor) on Jan 07, 2010 at 16:27 UTC
by almut (Canon) on Jan 07, 2010 at 18:03 UTC