Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:
Hi Monks,
I am new to Perl. I am doing a school project in which I'm creating a simple search engine. I give it a query (a string of words), a list of files is searched, and the name of the best-matching file is displayed. The basic plan is:
1) Preprocess the files
2) Cluster the documents
3) Create the term-document matrix
4) Search
I was able to write the preprocessing and clustering modules, but I am confused about the term-document matrix. Should I create a separate array for each term, or should I use a 2-D array? And how do I search for terms in the array? (The document that contains the most query terms should be displayed.)
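To make the 2-D option concrete, here is a minimal sketch of what such a matrix might look like in Perl as a hash of hashes rather than a true 2-D array. The file names, the `%docs` sample data, and the `best_match` helper are all made up for illustration; this assumes preprocessing has already turned each file into a list of terms.

```perl
use strict;
use warnings;

# Hypothetical preprocessed documents: file name => list of terms.
my %docs = (
    'a.txt' => [qw(perl search engine)],
    'b.txt' => [qw(perl matrix term document)],
    'c.txt' => [qw(cooking recipes)],
);

# Build the term-document matrix: $matrix{$term}{$file} = occurrence count.
my %matrix;
for my $file (keys %docs) {
    $matrix{$_}{$file}++ for @{ $docs{$file} };
}

# Return the file containing the most distinct query terms
# (ties broken alphabetically), or undef if no file matches.
sub best_match {
    my ($matrix, @query) = @_;
    my (%score, %seen);
    for my $term (grep { !$seen{$_}++ } @query) {
        next unless exists $matrix->{$term};
        $score{$_}++ for keys %{ $matrix->{$term} };
    }
    my ($best) = sort { $score{$b} <=> $score{$a} or $a cmp $b } keys %score;
    return $best;
}

print best_match(\%matrix, qw(term matrix perl)), "\n";
```

With the sample data above this prints `b.txt`, since that file contains all three query terms. A hash keyed by term avoids having to assign a numeric row index to every term, which a plain 2-D array would need.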
And is there any better way to search than using a term-document matrix?
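For comparison, one simple alternative would be to skip the matrix and score each file directly against the query. The sketch below is only an illustration of that idea; the `%docs` sample data and the `rank_files` helper are hypothetical, and it again assumes each file has already been preprocessed into a list of terms.

```perl
use strict;
use warnings;

# Hypothetical preprocessed documents: file name => list of terms.
my %docs = (
    'a.txt' => [qw(perl search engine)],
    'b.txt' => [qw(perl matrix term document)],
    'c.txt' => [qw(cooking recipes)],
);

# Count how many distinct query terms occur in each document and
# return the matching file names ranked from best to worst.
sub rank_files {
    my ($docs, @query) = @_;
    my %wanted = map { $_ => 1 } @query;    # de-duplicate the query terms
    my %score;
    for my $file (keys %$docs) {
        my %in_doc = map { $_ => 1 } @{ $docs->{$file} };
        my $hits = grep { $in_doc{$_} } keys %wanted;
        $score{$file} = $hits if $hits;     # leave out files with no hits
    }
    return sort { $score{$b} <=> $score{$a} or $a cmp $b } keys %score;
}

my @ranked = rank_files(\%docs, qw(perl search));
print "$_\n" for @ranked;
```

This rescans every document for every query, so it only makes sense for a small collection, but there is no index to build or keep up to date.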
P.S. This is a pretty small project, so I don't need highly efficient search techniques; any easy ones would do.
Thank you
Re: Term document matrix for search engine
by SuicideJunkie (Vicar) on Nov 09, 2012 at 15:14 UTC
by Anonymous Monk on Nov 09, 2012 at 16:24 UTC
by SuicideJunkie (Vicar) on Nov 09, 2012 at 17:35 UTC