Re: Perl Search Appliance

ALSO:

I would like to be able to parse PDF files and process the text. Does anyone have working examples of PDF-to-text conversion?

Re: Re: Perl Search Appliance by domm (Chaplain) on Jun 20, 2002 at 19:59 UTC
I'm using pdf2text to convert PDFs to plain text before indexing. It's not that fast, but then I only index some 200 PDFs.
`-- #!/usr/bin/perl for(ref bless[],just'another'perl'hacker){s-:+-$"-g&&print$_.$/}`
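For anyone who wants a starting point for the convert-then-index step described above, here is a minimal sketch. It is not domm's actual setup. It assumes a command-line converter that prints plain text to stdout when called as `converter file.pdf -`, which is how pdftotext from Xpdf behaves; pdf2text may take different arguments, so adjust `@CONVERTER` to match your tool. The sub names `pdf_to_text` and `word_counts` are made up for illustration, and the list-form pipe open needs Perl 5.8 or later on a Unix-like system.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: the converter writes plain text to stdout when given
# "<file> -", as pdftotext does. Swap in your own tool and arguments here.
our @CONVERTER = ('pdftotext');

# Run the converter on one PDF and return its text, or undef on failure.
sub pdf_to_text {
    my ($pdf) = @_;
    # List-form pipe open: no shell is involved, so odd filenames are safe.
    open(my $fh, '-|', @CONVERTER, $pdf, '-') or return;
    local $/;                  # slurp the whole output in one read
    my $text = <$fh>;
    close($fh) or return;      # a non-zero exit status counts as failure
    return $text;
}

# Split text into lowercased words and count them, ready for an indexer.
sub word_counts {
    my ($text) = @_;
    my %count;
    $count{ lc $1 }++ while $text =~ /(\w+)/g;
    return \%count;
}

# Only run when called as a script, so the subs can be reused elsewhere.
unless (caller) {
    for my $pdf (@ARGV) {
        my $text = pdf_to_text($pdf);
        unless (defined $text) {
            warn "could not convert $pdf\n";
            next;
        }
        my $counts = word_counts($text);
        printf "%s: %d distinct words\n", $pdf, scalar keys %$counts;
    }
}
```

Run it as `perl pdfwords.pl *.pdf` (the script name is arbitrary). Each conversion starts a separate process, so most of the time is likely to go to the converter itself rather than to Perl.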
Re: Re: Perl Search Appliance by Bluepixel (Beadle) on Jun 20, 2002 at 19:26 UTC
As mattr has pointed out, have a look at namazu.org. Their crawler also seems to index PDF pages. I would recommend trying out, or reading the code of, the other search engines mattr listed in his post before you start writing your own. You will get a lot of useful ideas from them. As for the database, I currently use MySQL (unfortunately, it's slow). I give each word a unique ID and then split the words found in the documents over several tables, so that no single table gets too large.
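The word-ID scheme above could be sketched roughly like this. It is only an illustration: the post does not say how the words are divided among tables, so the modulo rule, the table count, and the `postings_N` table names are all assumptions, and a plain hash stands in for the MySQL word table.

```perl
use strict;
use warnings;

my $NUM_TABLES = 4;       # hypothetical number of posting tables
my %word_id;              # in-memory stand-in for a "words" table
my $next_id = 1;

# Return the unique ID for a word, assigning a new one on first sight.
sub id_for_word {
    my ($word) = @_;
    $word = lc $word;
    $word_id{$word} = $next_id++ unless exists $word_id{$word};
    return $word_id{$word};
}

# Assumed split rule: the word ID modulo the table count picks the table.
sub table_for_id {
    my ($id) = @_;
    return 'postings_' . ($id % $NUM_TABLES);
}

# Turn one document into [word_id, doc_id] rows grouped by target table.
sub postings_for {
    my ($doc_id, $text) = @_;
    my %rows;
    while ($text =~ /(\w+)/g) {
        my $id = id_for_word($1);
        push @{ $rows{ table_for_id($id) } }, [ $id, $doc_id ];
    }
    return \%rows;
}
```

With a rule like this, looking up a word touches exactly one table, because the table name follows from the word's ID. Each group of rows returned by `postings_for` could then be written with one INSERT per table.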