You're heading into an interesting area, information management,
which will stay relevant for the next 10 years in my opinion.
I do the admin work for a small business that works in this
field (plug: navicon.de; the site is in German and not really
useful except for the demo download of Cernato).
To achieve meaningful results, you need two things:
- A fine categorization to find the differences between images
- A thesaurus to make searching and cross-linking more efficient
Number 1, the fine categorization, can be helped by having your
program ask you for the differences between two images
every time it detects a similarity of more than (say)
75%. Either you declare them identical (and thus remove
the image of lower preference) or you introduce a new criterion
(or refine an existing one) to distinguish them.
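
A rough sketch of that review loop in Python. Everything here is
made up for illustration: the function names, the idea of storing
each image as a set of criteria, and the use of Jaccard overlap as
the similarity measure. A real program would plug in whatever
comparison it actually has.

```python
from itertools import combinations

THRESHOLD = 0.75  # the "(say) 75%" from the text


def similarity(a, b):
    """Jaccard overlap of two criteria sets (assumed stand-in measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def pairs_to_review(images, threshold=THRESHOLD):
    """Yield (name1, name2, score) for every pair scoring above threshold."""
    for n1, n2 in combinations(sorted(images), 2):
        score = similarity(images[n1], images[n2])
        if score > threshold:
            yield n1, n2, score


def resolve(images, keep, other, new_criterion=None):
    """Apply the user's answer for one flagged pair.

    No criterion given: the pair counts as identical and `other` is removed.
    Criterion given: it is added to `other` so the two can be told apart.
    """
    if new_criterion is None:
        del images[other]
    else:
        images[other] = images[other] | {new_criterion}


# Hypothetical collection: file name -> criteria filed so far.
images = {
    "a.jpg": {"beach", "sunset"},
    "b.jpg": {"beach", "sunset"},
    "c.jpg": {"city", "night"},
}
flagged = list(pairs_to_review(images))
```

Here a.jpg and b.jpg get flagged because their criteria are
identical. Answering with resolve(images, "a.jpg", "b.jpg", "tripod")
adds a distinguishing criterion, and the pair drops below the
threshold on the next pass.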
Number 2, the thesaurus (a dictionary of synonyms, that is,
of different words that mean the same thing), is used to
create coarser granularities from the fine-grained
specifications. This helps you find, for example, images with
a book on them, even though you only filed the image
under "lexicon" (bad example, but I can't think of a good
one right now).
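
To make the "lexicon" example concrete, here is a small Python
sketch. The thesaurus contents, the BROADER table and the function
names are all invented for illustration; the only idea taken from
above is that a fine-grained term maps to coarser ones, so a search
for the coarse term still finds the image.

```python
# Hypothetical thesaurus: each fine-grained term maps to the
# coarser terms it implies.
BROADER = {
    "lexicon": {"book"},
    "dictionary": {"book"},
    "book": {"printed matter"},
}


def expand(term, broader=BROADER):
    """Return the term plus every coarser term reachable from it."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(broader.get(current, ()))
    return seen


def search(query, filed, broader=BROADER):
    """Return names of images whose filed terms expand to include query."""
    return sorted(
        name
        for name, terms in filed.items()
        if any(query in expand(term, broader) for term in terms)
    )


# Hypothetical filing: image name -> terms it was filed under.
filed = {"img1.jpg": {"lexicon"}, "img2.jpg": {"beach"}}
hits = search("book", filed)
```

Searching for "book" returns img1.jpg even though it was only
filed under "lexicon". Expanding at search time rather than at
filing time means the thesaurus can grow later without refiling
anything.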
Of course, the coolest thing would be a file system based
on this concept - taking away the need for paths and their ilk,
so that the user can concentrate on the
file content rather than the semantic and syntactic administrivia
of file management...