in reply to Re: Complicated searches in a very large text file.
in thread Complicated searches in a very large text file.

Ack, after reading through the MySQL docs it seems as though the full-text searching does not meet my requirements =(. I need to be able to take a search string such as "this really blows !stupid fun" and get back matches for it. That search will return:
"/data2/studio/projects/clientname/print/this really/funermaker/santa blows/thefile.tif" exists on Archive-dvd-000002122
but not this:
"/data2/studio/projects/clientname/print/this really/funermaker/santa blows stupid/thefile.tif"
because it has "stupid" in the fq file name.
You see, I need to be able to substring search quickly, and it looks to only support full-word matches. Using the * operator of the boolean full-text search capability I can do, for example, fun* to match "funny" or "fungle", but I do not see a way to match "Senior Howfun". Do you have any insight?
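
To pin down the matching rules described above, here is a minimal sketch in Python. It assumes every plain term must appear somewhere in the path as a case-insensitive substring and every term prefixed with ! must not appear at all. The function names parse_query and matches are made up for illustration and do not come from the original script.

```python
def parse_query(query):
    """Split a query into (required, excluded) lists of lowercase substrings."""
    required, excluded = [], []
    for token in query.lower().split():
        if token.startswith("!") and len(token) > 1:
            excluded.append(token[1:])   # "!stupid" -> must NOT appear
        else:
            required.append(token)       # plain term -> must appear
    return required, excluded


def matches(path, query):
    """True if path contains every required term and no excluded term."""
    required, excluded = parse_query(query)
    haystack = path.lower()
    return (all(term in haystack for term in required)
            and not any(term in haystack for term in excluded))


good = "/data2/studio/projects/clientname/print/this really/funermaker/santa blows/thefile.tif"
bad = "/data2/studio/projects/clientname/print/this really/funermaker/santa blows stupid/thefile.tif"
query = "this really blows !stupid fun"

print(matches(good, query))  # True: "fun" is found inside "funermaker"
print(matches(bad, query))   # False: the path contains "stupid"
```

Note that "fun" only matches here because it is tested as a substring of "funermaker"; a whole-word index would miss it.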

I know I can fall back to using LIKE statements to do basically the same thing I am doing in the original post, but it seems like that would just add the overhead of a DB to do the same thing -- any ideas?
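
For reference, the LIKE fallback could look roughly like the sketch below. It is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table name files, the column name path, and the helper names are assumptions, not anything from the real setup. Each term becomes one LIKE '%term%' (or NOT LIKE for !term) clause.

```python
import sqlite3


def like_pattern(term):
    """Wrap a term in %...%, escaping LIKE metacharacters with '|'."""
    escaped = term.replace("|", "||").replace("%", "|%").replace("_", "|_")
    return "%" + escaped + "%"


def build_where(query):
    """Turn 'a b !c' into a parameterized WHERE clause and its parameters."""
    clauses, params = [], []
    for token in query.split():
        negate = token.startswith("!") and len(token) > 1
        term = token[1:] if negate else token
        op = "NOT LIKE" if negate else "LIKE"
        clauses.append("path " + op + " ? ESCAPE '|'")
        params.append(like_pattern(term))
    return " AND ".join(clauses) or "1=1", params


paths = [
    "/data2/studio/projects/clientname/print/this really/funermaker/santa blows/thefile.tif",
    "/data2/studio/projects/clientname/print/this really/funermaker/santa blows stupid/thefile.tif",
]

conn = sqlite3.connect(":memory:")           # throwaway in-memory database
conn.execute("CREATE TABLE files (path TEXT)")  # hypothetical schema
conn.executemany("INSERT INTO files VALUES (?)", [(p,) for p in paths])

where, params = build_where("this really blows !stupid fun")
rows = [r[0] for r in conn.execute("SELECT path FROM files WHERE " + where, params)]
print(rows)  # only the path without "stupid"
```

A pattern with a leading % cannot use an ordinary index, so each such query scans every row, which is the overhead concern raised above.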

-Waswas

Re: Re: Re: Complicated searches in a very large text file.
by ant9000 (Monk) on Jun 30, 2003 at 17:42 UTC
    Tons of people already pointed out Swish-E: that's the way I'd go myself, considering it already does all the searches you are looking for (and more) and that you can access its index files directly from Perl, via its Perl interface.
      From the Swish docs,
      The wildcard (*) is available, however it can only be used at the end of a word: otherwise it is considered a normal character (i.e. can be searched for if included in the WordCharacters directive).
          swish-e -w "librarian" -f myIndex
      this query only retrieves files which contain the given word. On the other hand:
          swish-e -w "librarian*" -f myIndex
      retrieves ``librarians'', ``librarianship'', etc. along with ``librarian''.
      So how would you go about having a search string such as "test and apple" match a document containing "an apple is good to eat when they come from the grabaltester region of madeupcountry"?
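
To make the gap concrete, here is a small Python sketch comparing a toy word-based matcher (whole words, plus a trailing * for prefixes) with a plain substring matcher. It is a simplified model of how a word index behaves, not Swish-E's actual implementation, and it assumes "and" in the query is the boolean operator, so the terms are "test" and "apple". All function names are made up for illustration.

```python
import re


def words(text):
    """Lowercased alphanumeric words, roughly what a word index would store."""
    return re.findall(r"[a-z0-9]+", text.lower())


def word_prefix_match(text, terms):
    """Each term must equal a word, or (with a trailing *) prefix a word."""
    vocab = words(text)

    def hit(term):
        if term.endswith("*"):
            return any(w.startswith(term[:-1]) for w in vocab)
        return term in vocab

    return all(hit(t) for t in terms)


def substring_match(text, terms):
    """Each term must occur anywhere in the text, even inside a word."""
    lowered = text.lower()
    return all(t in lowered for t in terms)


doc = ("an apple is good to eat when they come from the "
       "grabaltester region of madeupcountry.")

print(word_prefix_match(doc, ["test", "apple"]))   # False: no word "test"
print(word_prefix_match(doc, ["test*", "apple"]))  # False: no word starts with "test"
print(substring_match(doc, ["test", "apple"]))     # True: "test" is inside "grabaltester"
```

The trailing wildcard does not help because "test" sits in the middle of "grabaltester", which is exactly the case a prefix-only wildcard cannot reach.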

      -Waswas