in reply to Code Efficiency

You say "new keywords can be added at any time." What about changing and deleting? If keywords are only ever added, you could optimize (for time) very efficiently by detecting which new keywords show up and inserting them into the sorted list. For instance, keep a cache (copy) of the tree and compare the old version against the new one.
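A minimal sketch of that add-only approach, in Python for illustration (the function and variable names are my own, not from the thread): keep a set of keywords already seen, and insert any newcomer into the sorted list at its correct position.

```python
import bisect

def merge_new_keywords(sorted_keywords, cached_set, current_keywords):
    """Insert only keywords not seen before (relies on the add-only assumption).

    sorted_keywords: list kept in sorted order
    cached_set: set of keywords already in the list (the "cache/copy")
    current_keywords: keywords found on this pass
    """
    for kw in current_keywords:
        if kw not in cached_set:
            bisect.insort(sorted_keywords, kw)  # insert at the sorted position
            cached_set.add(kw)
    return sorted_keywords

# Hypothetical example:
kws = ["apple", "carrot"]
seen = set(kws)
merge_new_keywords(kws, seen, ["banana", "carrot"])
# kws is now ["apple", "banana", "carrot"]
```

Because membership is checked against a set, each pass costs only the insertions actually needed, rather than a full re-sort of the whole list.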

The next optimization step: keep a timestamp (and/or hash) of every file instead of a copy. Since keywords are only added, whenever a file changes you can simply merge its whole contents back into the sorted list.
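A sketch of that change-detection step, again in Python (the fingerprint scheme — mtime plus a content hash — and the function names are assumptions, not from the thread):

```python
import hashlib
import os

def file_fingerprint(path):
    """Return (mtime, sha1) for a file; either differing means it changed."""
    with open(path, "rb") as f:
        digest = hashlib.sha1(f.read()).hexdigest()
    return (os.path.getmtime(path), digest)

def changed_files(paths, fingerprints):
    """Yield paths whose fingerprint differs from the cached one,
    updating the cache as it goes."""
    for path in paths:
        fp = file_fingerprint(path)
        if fingerprints.get(path) != fp:
            fingerprints[path] = fp
            yield path
```

Only the files yielded here need their contents re-merged into the sorted list; unchanged files are skipped entirely.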

Replies are listed 'Best First'.
Re: Re: Code Efficiency
by fourmi (Scribe) on Mar 26, 2004 at 11:24 UTC
    Hi
    It's a photo archive, so hopefully previous keywords won't be incorrect (carrots shouldn't disappear from the image!).
    Basically what I have done now is make a file 'UBER1' containing
    PicRefID: Keyword1, Keyword2 ...
    The keyword list is basically another file 'UBER2', identical but without the PicRefID, sorted, and uniq'd. If a keyword is added to a picture, it is appended to the relevant PicRefID line in 'UBER1', and, if it does not already appear in 'UBER2', it is also inserted there at the correct sorted position.
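That update step could be sketched as follows, in Python for illustration (the file names 'UBER1'/'UBER2' are from the post; the exact line format and the helper name are assumptions):

```python
import bisect

def add_keyword(uber1_path, uber2_path, pic_ref_id, keyword):
    """Append a keyword to one picture's line in UBER1 and keep UBER2 sorted/unique."""
    # UBER1: one "PicRefID: kw1, kw2, ..." line per picture
    with open(uber1_path) as f:
        lines = f.read().splitlines()
    for i, line in enumerate(lines):
        ref, _, kws = line.partition(": ")
        if ref == pic_ref_id and keyword not in kws.split(", "):
            lines[i] = line + ", " + keyword
    with open(uber1_path, "w") as f:
        f.write("\n".join(lines) + "\n")

    # UBER2: sorted, unique keyword list, one per line
    with open(uber2_path) as f:
        keywords = f.read().splitlines()
    if keyword not in keywords:
        bisect.insort(keywords, keyword)  # insert at the sorted position
        with open(uber2_path, "w") as f:
            f.write("\n".join(keywords) + "\n")
```

Note that UBER2 is only rewritten when a genuinely new keyword appears, which matches the observation below that the keyword list needs only occasional updating.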

    It's all much much nicer now. All the data in the 100,000ish files is in one file, and the keyword list only needs occasional updating.

    Basically my initial plan was based on a tiny subset running well, and I wasn't expecting such massive overheads. Now it only takes a minute or two, and that's more the upload of 15M of keywords than script load (and then the upload!)
    Cheers to all who helped, much appreciated!!
    ant