I think you need to do your own research into the issues involved with rolling your own search engine. Actually using a search engine to find out more about this topic would be a good starting point. If you are having problems with implementing something in Perl, be sure to read How do I post a question effectively? and let us know what the problems are.
| [reply] |
My advice would be to spend more time scoping your project, identify your stakeholders, then work with them to produce a very good requirements document.
The one line you give here is the barest start of a hint of a possibility...
If you have specific Perl questions, we're always here :) | [reply] |
I'll give you some general comments - I'm speaking in the vaguest generalities because I don't want to overshare any technical details, but I worked at Blekko, a search engine that was built from the ground up in Perl.
First, you will need a fast, high-capacity datastore. Probably not an SQL database; you'll want something very fast, very expandable, and very dependable, with a lot of internal redundancy. Look into the NoSQL key-value datastores, or you may decide you need to write one. Blekko wrote one.
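To make the "internal redundancy" point concrete, here is a toy sketch of writing each key to several nodes so that a read still succeeds when some of them are down. It is in Python for brevity and says nothing about how Blekko's datastore works; `replicas_for`, `ReplicatedStore`, the node names, and the default of three copies are all hypothetical choices made for illustration.

```python
import hashlib

def replicas_for(key, nodes, copies=3):
    """Pick `copies` distinct nodes for a key, starting at a hashed position."""
    if not nodes:
        raise ValueError("need at least one node")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    count = min(copies, len(nodes))
    return [nodes[(start + i) % len(nodes)] for i in range(count)]

class ReplicatedStore:
    """Toy key-value store; in-memory dicts stand in for separate machines."""

    def __init__(self, nodes, copies=3):
        self.nodes = list(nodes)
        self.copies = copies
        self.data = {name: {} for name in self.nodes}

    def put(self, key, value):
        # Write the value to every replica chosen for this key.
        for node in replicas_for(key, self.nodes, self.copies):
            self.data[node][key] = value

    def get(self, key, down=()):
        # Try each replica in order, skipping nodes reported as down.
        for node in replicas_for(key, self.nodes, self.copies):
            if node not in down and key in self.data[node]:
                return self.data[node][key]
        return None
```

With four nodes and three copies, a key survives the loss of any two of its replicas. Note that this simple modulo placement moves most keys whenever the node list changes; consistent hashing is the usual way around that.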
Second, you will want fast indexing and a way to quickly run queries across the whole of your database; something like Hadoop or BigTable. Blekko wrote one.
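For anyone new to the indexing side, the core data structure is an inverted index: a map from each term to the documents that contain it. The sketch below is a single-machine toy in Python, not a description of Blekko's system; the function names, the whitespace tokenizer, and the sample documents are assumptions made for illustration.

```python
from collections import defaultdict

def tokenize(text):
    """Lowercase and split on whitespace; real tokenizers do far more."""
    return text.lower().split()

def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)

# Hypothetical sample corpus.
docs = {
    1: "Perl crawlers are easy",
    2: "search engines are hard",
    3: "Perl search engines",
}
index = build_index(docs)
```

Calling `search(index, "perl search")` intersects the posting sets for both terms. The hard part at web scale is doing this across an index that is spread over many machines, plus ranking the results, neither of which this sketch attempts.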
I'm leaving out the vast majority of details here because they're proprietary information, but the summary is that you'll need a big (hundreds of machines), fast datastore to store your crawl and index, and a good mechanism to access it quickly. Blekko wrote all these.
It's taken Blekko 4 years to get to where they are now (which is "pretty darn good: better than Google in some places, not as good in others") with about 20 people (though they started with about 7 or 8). You're in for a long haul, and your backers will need to be patient. Writing a search engine is not easy, and it will go better if you have folks onboard who have already worked on one for a while.
Crawlers are easy; search engines are hard. | [reply] |