in reply to Fuzzy text matching... again
There are plenty of tools for building your own solution, but you have to figure out how to glue them together. The problem is not simple, so your approach can't be either. I used a layered approach. Let me give you some things to consider.
Now consider all the tools in your tool bag and how they may be useful. Here are some examples:
I can see you have already searched CPAN and know about things like Text::Compare and Text::PhraseDistance, but the strings you are matching appear to be publication references. There are a number of modules on CPAN for citations and bibliography references, and you may be able to leverage them as well. It would also help to know more about the overall project, because other tools may apply. For instance, do you have a known list of publications plus a second list that needs to be matched against it, or do you have one huge bunch in which you are trying to identify duplicates? The approach I would take differs between the two cases.
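To make the difference between those two cases concrete, here is a minimal sketch using only core Perl. Everything in it is an assumption for illustration: the sub names (normalize, similarity, match_known, find_duplicates), the sample titles, and the 0.7 and 0.9 thresholds are all hypothetical, and plain Levenshtein distance is just a stand-in metric that a CPAN module would probably replace in real code. It shows a cheap normalization layer followed by an edit-distance layer, applied either against a known list or pairwise within one pile.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Layer 1 (assumed): cheap normalization of case, punctuation and whitespace.
sub normalize {
    my ($s) = @_;
    $s = lc $s;
    $s =~ s/[^a-z0-9]+/ /g;     # any run of non-alphanumerics becomes one space
    $s =~ s/^\s+|\s+$//g;       # trim both ends
    return $s;
}

# Layer 2 (assumed): Levenshtein edit distance scaled to a 0..1 similarity.
sub similarity {
    my ($x, $y) = @_;
    my @a = split //, $x;
    my @b = split //, $y;
    return 1 unless @a || @b;   # two empty strings are identical
    my @prev = (0 .. scalar @b);
    for my $i (1 .. @a) {
        my @cur = ($i);
        for my $j (1 .. @b) {
            my $cost = $a[$i - 1] eq $b[$j - 1] ? 0 : 1;
            push @cur, min($prev[$j] + 1, $cur[$j - 1] + 1, $prev[$j - 1] + $cost);
        }
        @prev = @cur;
    }
    return 1 - $prev[-1] / max(scalar @a, scalar @b);
}

# Case 1: a known list exists; map each unknown string to its best known entry.
sub match_known {
    my ($known, $unknown, $threshold) = @_;
    my %match;
    for my $u (@$unknown) {
        my ($best, $best_score) = (undef, 0);
        for my $k (@$known) {
            my $score = similarity(normalize($u), normalize($k));
            ($best, $best_score) = ($k, $score) if $score > $best_score;
        }
        # Anything below the threshold is left unmatched (undef).
        $match{$u} = $best_score >= $threshold ? $best : undef;
    }
    return \%match;
}

# Case 2: one big pile; compare every pair and report likely duplicates.
sub find_duplicates {
    my ($items, $threshold) = @_;
    my @pairs;
    for my $i (0 .. $#$items) {
        for my $j ($i + 1 .. $#$items) {
            my $score = similarity(normalize($items->[$i]), normalize($items->[$j]));
            push @pairs, [ $items->[$i], $items->[$j] ] if $score >= $threshold;
        }
    }
    return \@pairs;
}

# Hypothetical sample data, only to show how the two subs are called.
my @known   = ('Journal of Applied Physics', 'Nature Genetics');
my @unknown = ('J. of Applied Physics', 'nature  genetics.');
my $matches = match_known(\@known, \@unknown, 0.7);
print "$_ => ", ($matches->{$_} // 'no match'), "\n" for @unknown;

my $dups = find_duplicates(['Nature Genetics', 'nature genetics', 'Cell'], 0.9);
print "possible duplicate: $_->[0] / $_->[1]\n" for @$dups;
```

The known-list case costs one comparison per (unknown, known) pair and always has a canonical form to map to. The duplicate case is quadratic in the size of the pile and has no canonical form, so on a large set you would probably want a blocking step (for example, grouping by a normalized prefix) before comparing pairs.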
I have a stack of notes on text comparison and analysis that I have been meaning to write up at length. If you need more help, speak up.
Cheers - L~R
Re^2: Fuzzy text matching... again
by almut (Canon) on Jan 07, 2010 at 16:08 UTC
by Limbic~Region (Chancellor) on Jan 07, 2010 at 16:27 UTC
by almut (Canon) on Jan 07, 2010 at 18:03 UTC