Ace128 has asked for the wisdom of the Perl Monks concerning the following question:

Hey smart monks!

I'm sitting here thinking about the best way to index text files for fast searching. MySQL apparently has full-text search abilities, but then I have to put the whole text into the database. That feels like a waste (even if it may compress the data in the table). KinoSearch, it seems, creates its index as two binary files. I was thinking of using KinoSearch to create those files and then inserting them into a table as a BLOB, instead of inserting the text itself. Then I would just extract the BLOB and hand it back to KinoSearch. KinoSearch feels faster that way too! Ideas on this?
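
Roughly what I have in mind for the indexing half, just as a sketch (based on what I can make of the KinoSearch docs; the field names, paths and the @text_files list are made up):

    use KinoSearch::InvIndexer;
    use KinoSearch::Analysis::PolyAnalyzer;

    # Build (or rebuild) the inverted index that KinoSearch writes out
    # as binary files under the given directory.
    my $analyzer   = KinoSearch::Analysis::PolyAnalyzer->new( language => 'en' );
    my $invindexer = KinoSearch::InvIndexer->new(
        invindex => '/path/to/invindex',
        create   => 1,
        analyzer => $analyzer,
    );
    $invindexer->spec_field( name => 'path' );
    $invindexer->spec_field( name => 'content' );

    for my $file (@text_files) {
        open my $fh, '<', $file or die "open $file: $!";
        my $text = do { local $/; <$fh> };

        my $doc = $invindexer->new_doc;
        $doc->set_value( path    => $file );
        $doc->set_value( content => $text );
        $invindexer->add_doc($doc);
    }
    $invindexer->finish;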

The thing is, I want it to be easy (and fast) to update this index (the KinoSearch index). But I'm not sure how good KinoSearch is when it comes to updating the index, that is, removing a particular file's entries from the big KinoSearch index. I want speed here! :) And the text files (and thus the index) may be updated often. How well does it handle small changes in a text file? Does it need to rebuild the whole thing anyway?

Also, I'm not sure if SQLite even has support for full-text searching... So KinoSearch seems like a good middleman here, I think... :)

Update: Forgot to mention that we are talking about many MB here for the text files all together. The index is smaller of course, but it could surely reach a few hundred MB... :)

What do you guys think?
/ Ace

Replies are listed 'Best First'.
Re: Full Text Searching!
by perrin (Chancellor) on Jul 11, 2006 at 03:10 UTC
    Swish-e is a great search engine with active Perl support. Very fast and easy to use, and updates the index very quickly.
      Seems nice... although I'm a little against running an external application like that from Perl. I prefer more integration. :) But I'm gonna check it out!
        There is no external application. It's just a C library that you call via a Perl module, just as you would with SQLite or BerkeleyDB.
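        Querying from Perl looks roughly like this (an untested sketch against an index already built with the swish-e indexer; the index path and query are made up):

            use SWISH::API;

            # Open an existing swish-e index and run a query against it.
            my $swish   = SWISH::API->new('/path/to/index.swish-e');
            my $results = $swish->Query('foo AND bar');

            while ( my $result = $results->NextResult ) {
                print $result->Property('swishdocpath'), "\n";
            }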
Re: Full Text Searching!
by eric256 (Parson) on Jul 11, 2006 at 06:10 UTC

    I've used and enjoyed MySQL's full text search in the past. It might be worth at least looking into before you try to shoehorn in a different system. BTW, databases are normally made to handle many hundreds of MB.
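
    From Perl it's just plain DBI; a rough sketch only (table and column names are made up, and the column needs a FULLTEXT index, which means a MyISAM table on current MySQL):

        use DBI;

        # Assumes something like: ALTER TABLE docs ADD FULLTEXT(body);
        my $dbh = DBI->connect( 'dbi:mysql:database=mydb', $user, $pass,
                                { RaiseError => 1 } );

        my $sth = $dbh->prepare(q{
            SELECT id, title, MATCH(body) AGAINST(?) AS score
            FROM   docs
            WHERE  MATCH(body) AGAINST(?)
            ORDER  BY score DESC
        });
        $sth->execute( $query, $query );

        while ( my $row = $sth->fetchrow_hashref ) {
            print "$row->{title} ($row->{score})\n";
        }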


    ___________
    Eric Hodges
      Yeah, but then I get duplicate data: both in the database and on the hard drive.

        I thought from your post that you already had the data in the database but didn't want to create the index there. If not, then yeah, adding it to the database in addition to the hard drive might be a bit of overkill and hard to keep in sync.


        ___________
        Eric Hodges
Re: Full Text Searching!
by derby (Abbot) on Jul 11, 2006 at 13:06 UTC

    Deleting docs in KinoSearch should be pretty fast (it just marks a document as deleted and only deletes it for real when a segment is re-written). What would storing the index in the database buy you?
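
    Untested, but a delete-then-re-add update looks something like this (the 'path' field is made up, and the method names are from memory, so check the InvIndexer docs for your release):

        use KinoSearch::InvIndexer;
        use KinoSearch::Index::Term;

        # Re-open the existing invindex, mark the old version of the
        # document as deleted, then add the fresh version.
        my $invindexer = KinoSearch::InvIndexer->new(
            invindex => '/path/to/invindex',
            analyzer => $analyzer,
        );
        $invindexer->delete_docs_by_term(
            KinoSearch::Index::Term->new( 'path', $file )
        );
        # ... $invindexer->add_doc($updated_doc) goes here ...
        $invindexer->finish;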

    Marvin Humphrey posts here as creamygoodness and recently posted about his next steps for KinoSearch.

    -derby
      Well, I have other data in the database, but I was thinking of using KinoSearch for full-text searching and saving the KinoSearch index in the db. That way I take advantage of the db I'm already using and don't need to use the hard drive directly (yeah, the db uses the hard drive as well, but you get what I mean). This is especially good for SQLite, since it seems that SQLite doesn't have support for full-text search yet.
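
      Roughly what I mean by keeping the index in the db, as a sketch only (the table layout and the invindex/ path are made up):

          use DBI qw(:sql_types);

          my $dbh = DBI->connect( 'dbi:SQLite:dbname=app.db', '', '',
                                  { RaiseError => 1 } );
          $dbh->do(q{
              CREATE TABLE IF NOT EXISTS kino_index (
                  filename TEXT PRIMARY KEY,
                  data     BLOB
              )
          });

          # Stash every file KinoSearch wrote under invindex/ as a BLOB.
          my $sth = $dbh->prepare(
              'INSERT OR REPLACE INTO kino_index (filename, data) VALUES (?, ?)'
          );
          for my $file ( glob 'invindex/*' ) {
              open my $fh, '<:raw', $file or die "open $file: $!";
              my $data = do { local $/; <$fh> };
              $sth->bind_param( 1, $file );
              $sth->bind_param( 2, $data, SQL_BLOB );
              $sth->execute;
          }

          # To search: SELECT the BLOBs back out, write them to a temp
          # directory, and point KinoSearch::Searcher at that directory.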