in reply to Code Efficiency

You say "new keywords can be added at any time." What about changing and deleting? If keywords are only ever added, you could optimize (for time) very efficiently by detecting which new keywords show up and inserting them into the sorted list. For instance, keep a cache (copy) of the tree and compare the old version against the new one.
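A minimal sketch of that add-only approach, in Python for illustration (the function and variable names are my own, not from the thread): keep a set of keywords already seen, and insert any newcomer into the sorted list at its correct position.

```python
import bisect

def merge_new_keywords(sorted_keywords, cached_set, current_keywords):
    """Insert only keywords not seen before (relies on the add-only assumption).

    sorted_keywords: list kept in sorted order
    cached_set: set of keywords already in the list (the "cache/copy")
    current_keywords: keywords found on this pass
    """
    for kw in current_keywords:
        if kw not in cached_set:
            bisect.insort(sorted_keywords, kw)  # insert at the sorted position
            cached_set.add(kw)
    return sorted_keywords

# Hypothetical example:
kws = ["apple", "carrot"]
seen = set(kws)
merge_new_keywords(kws, seen, ["banana", "carrot"])
# kws is now ["apple", "banana", "carrot"]
```

Because membership is checked against a set, each pass costs only the insertions actually needed, rather than a full re-sort of the whole list.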

The next optimization step: keep a timestamp (and/or hash) of every file instead of a copy. Since keywords are only added, whenever a file changes you can simply merge its whole contents back into the sorted list.
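A sketch of that change-detection step, again in Python (the fingerprint scheme — mtime plus a content hash — and the function names are assumptions, not from the thread):

```python
import hashlib
import os

def file_fingerprint(path):
    """Return (mtime, sha1) for a file; either differing means it changed."""
    with open(path, "rb") as f:
        digest = hashlib.sha1(f.read()).hexdigest()
    return (os.path.getmtime(path), digest)

def changed_files(paths, fingerprints):
    """Yield paths whose fingerprint differs from the cached one,
    updating the cache as it goes."""
    for path in paths:
        fp = file_fingerprint(path)
        if fingerprints.get(path) != fp:
            fingerprints[path] = fp
            yield path
```

Only the files yielded here need their contents re-merged into the sorted list; unchanged files are skipped entirely.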

Replies are listed 'Best First'.
Re: Re: Code Efficiency
by fourmi (Scribe) on Mar 26, 2004 at 11:24 UTC
    Hi
    It's a photo archive, so hopefully previous keywords won't be incorrect (carrots shouldn't disappear from the image!).
    Basically what I have done now is make a file 'UBER1' containing
    PicRefID: Keyword1, Keyword2 ...
    The keyword list is basically another file 'UBER2', identical but without the PicRefID, sorted, and uniq'd. If a keyword is added to a picture, it is appended to the relevant PicRefID line in 'UBER1', and, if it does not already appear in 'UBER2', it is also inserted there at the correct sorted position.
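That update step could be sketched as follows, in Python for illustration (the file names 'UBER1'/'UBER2' are from the post; the exact line format and the helper name are assumptions):

```python
import bisect

def add_keyword(uber1_path, uber2_path, pic_ref_id, keyword):
    """Append a keyword to one picture's line in UBER1 and keep UBER2 sorted/unique."""
    # UBER1: one "PicRefID: kw1, kw2, ..." line per picture
    with open(uber1_path) as f:
        lines = f.read().splitlines()
    for i, line in enumerate(lines):
        ref, _, kws = line.partition(": ")
        if ref == pic_ref_id and keyword not in kws.split(", "):
            lines[i] = line + ", " + keyword
    with open(uber1_path, "w") as f:
        f.write("\n".join(lines) + "\n")

    # UBER2: sorted, unique keyword list, one per line
    with open(uber2_path) as f:
        keywords = f.read().splitlines()
    if keyword not in keywords:
        bisect.insort(keywords, keyword)  # insert at the sorted position
        with open(uber2_path, "w") as f:
            f.write("\n".join(keywords) + "\n")
```

Note that UBER2 is only rewritten when a genuinely new keyword appears, which matches the observation below that the keyword list needs only occasional updating.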

    It's all much much nicer now. All the data in the 100,000ish files is in one file, and the keyword list only needs occasional updating.

    Basically my initial plan was based on a tiny subset running well, and I wasn't expecting such massive overheads. Now it only takes a minute or two, and that's more the upload of 15M of keywords than script load (and then the upload!)
    Cheers to all who helped, much appreciated!!
    ant