- Remove unimportant tokens (trailing text in parens)
You certainly have a number of good points. However, parenthesization need not imply subordination; it could also stand for an alternative name or term, as in the OP's case #5:
where the part in parentheses has a better chance of contributing to a successful match than the unparenthesized part. This is just to illustrate one of the many potential issues the OP might encounter.
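One way around this, as a rough sketch rather than a recommendation: instead of discarding the parenthesized text, keep it as an additional candidate key and take the best score over all candidates. The function names `candidate_keys` and `best_ratio` and the sample title are made up for illustration, and `difflib.SequenceMatcher` is only a stand-in for whatever fuzzy metric the OP ends up using.

```python
import re
from difflib import SequenceMatcher

# Matches one non-nested parenthesized group and captures its content.
PAREN = re.compile(r"\(([^()]*)\)")

def candidate_keys(title):
    """Return the title without its parenthesized parts, followed by
    each non-empty parenthesized part as a separate key."""
    inner = [p.strip() for p in PAREN.findall(title) if p.strip()]
    outer = " ".join(PAREN.sub(" ", title).split())
    return [k for k in [outer] + inner if k]

def best_ratio(a, b):
    """Highest case-insensitive similarity over all pairs of candidate
    keys; 0.0 if either side has no usable key."""
    return max(
        (SequenceMatcher(None, x.lower(), y.lower()).ratio()
         for x in candidate_keys(a)
         for y in candidate_keys(b)),
        default=0.0,
    )

# Hypothetical title, not taken from the OP's data.
print(candidate_keys("Acta Exempli (Example Proceedings)"))
print(best_ratio("Acta Exempli (Example Proceedings)", "example proceedings"))
```

With this approach an alternative name in parentheses can still win the match, at the cost of more comparisons per pair.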
And while we're at it: how would a machine identify what is a name and what is not, as in "Archivio Giuliano Marini", without consulting either a database of common names or a list of all known regular words (plus inflections) in a particular language? Even Google Translate apparently gets it wrong when translating "Archivio Giuliano Marini" into English: it leaves "Archivio" as is instead of translating it to "archive", even though you tell it what the source language is. Interestingly, it gets it right with "Archivio Marini"...
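To make the word-list option concrete, here is a minimal sketch, assuming you already have a vocabulary of regular words for the language in question. The `ITALIAN_WORDS` set below is a tiny hypothetical stand-in; a real list would need to cover inflections, and `probable_names` is an illustrative name, not an existing API.

```python
def probable_names(phrase, vocabulary):
    """Return the tokens of phrase that are not found in vocabulary
    (compared case-insensitively); these are candidates for names."""
    return [tok for tok in phrase.split() if tok.lower() not in vocabulary]

# Hypothetical, deliberately tiny word list for illustration only.
ITALIAN_WORDS = {"archivio", "biblioteca", "di"}

print(probable_names("Archivio Giuliano Marini", ITALIAN_WORDS))
```

Anything the list does not know is treated as a possible name, so the quality of the result depends entirely on how complete the word list is.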
In reply to Re^2: Fuzzy text matching... again by almut, in thread Fuzzy text matching... again by kiz