olowodara has asked for the wisdom of the Perl Monks concerning the following question:

Hello all, I need to build an internet search engine, something like Google or Baidu. Can anyone with deep insight into how to go about such a project help me? Thanks

Replies are listed 'Best First'.
Re: internet search engine
by marto (Cardinal) on Mar 15, 2011 at 10:47 UTC

    I think you need to do your own research into the issues involved with rolling your own search engine. Actually using a search engine to find out more about this topic would be a good starting point. If you are having problems with implementing something in Perl, be sure to read How do I post a question effectively? and let us know what the problems are.

Re: internet search engine
by deorth (Scribe) on Mar 15, 2011 at 10:48 UTC
    My advice would be to spend more time scoping your project, identify your stakeholders, then work with them to produce a very good requirements document. The one line you give here is the barest start of a hint of a possibility... If you have specific Perl questions, we're always here :)
Re: internet search engine
by Anonymous Monk on Mar 15, 2011 at 11:00 UTC
Re: internet search engine
by pemungkah (Priest) on Mar 15, 2011 at 17:44 UTC
    I'll give you some general comments - I'm speaking in the most vague generalities because I don't want to overshare any technical details, but I worked at Blekko, a search engine that was built ground-up with Perl.

    First, you will need a fast, high-capacity datastore. Probably not an SQL database; you'll want something very fast, very expandable, and very dependable, with a lot of internal redundancy. Look into the NoSQL key-value datastores, or you may decide you need to write one. Blekko wrote one.
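    To make the key-value idea concrete, here is a minimal sketch in Python. It is not Blekko's design (that is proprietary); it only assumes that each key is hashed to pick a node and that the value is copied to the next node as well, so a read still succeeds if one node is down. The class name `ShardedStore` and the `down` parameter are made up for illustration, and the "nodes" are plain in-memory dicts.

```python
import hashlib


class ShardedStore:
    """Toy key-value store: every key is written to `replicas` of `num_nodes` dicts."""

    def __init__(self, num_nodes=4, replicas=2):
        self.nodes = [dict() for _ in range(num_nodes)]
        self.replicas = replicas

    def _owners(self, key):
        # Hash the key to choose a first node, then take the following nodes as replicas.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        first = int(digest, 16) % len(self.nodes)
        return [(first + i) % len(self.nodes) for i in range(self.replicas)]

    def put(self, key, value):
        # Write the value to every replica that owns this key.
        for node_id in self._owners(key):
            self.nodes[node_id][key] = value

    def get(self, key, down=()):
        # Read from the first owner that is not in the (hypothetical) `down` set.
        for node_id in self._owners(key):
            if node_id not in down and key in self.nodes[node_id]:
                return self.nodes[node_id][key]
        return None


store = ShardedStore()
store.put("http://example.com/", "<html>crawled page</html>")
```

    A real datastore also has to handle rebalancing, persistence, and conflicting writes, which is where most of the work goes.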

    Second, you will want fast indexing and a way to quickly run queries across the whole of your database; something like Hadoop or BigTable. Blekko wrote one.
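    The usual data structure behind fast text queries is an inverted index, which maps each term to the set of documents containing it. The sketch below is a generic single-machine illustration in Python, not a description of what Blekko built; the function names and the sample `docs` are assumptions made for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Lowercase the text and split it into alphanumeric terms.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    # Map each term to the set of document ids that contain it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    # Return the ids of documents that contain every term in the query.
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    1: "Perl search engine",
    2: "Search the web",
    3: "Perl Monks wisdom",
}
index = build_index(docs)
```

    With this in place, `search(index, "perl search")` matches only document 1. At web scale the index no longer fits on one machine, which is why it has to be split across many nodes and queried in parallel.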

    I'm leaving out the vast majority of details here because they're proprietary information, but the summary is that you'll need a big (hundreds of machines), fast datastore to store your crawl and index, and a good mechanism to access it quickly. Blekko wrote all these.

    It's taken Blekko 4 years to get to where they are now (which is "pretty darn good, better than Google in some places, not as good in others") with about 20 people (though they started with about 7 or 8). You're in for a long haul, and your backers will need to be patient. Writing a search engine is not easy, and it will go better if you have folks on board who have already worked on one for a while.

    Crawlers are easy; search engines are hard.
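    To show how little a basic crawler needs, here is a minimal breadth-first sketch in Python. The `fetch` argument is an assumption: any callable that returns a page's HTML or `None`. Here it is backed by a made-up in-memory `FAKE_WEB` dict so the example runs without network access. Politeness (robots.txt, rate limits), retries, and deduplication of near-identical pages are all left out.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects the href values of <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    # Breadth-first traversal: visit each URL once, queueing newly found links.
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # page could not be fetched; skip it
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links against the page URL
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


# Hypothetical three-page site standing in for the real web.
FAKE_WEB = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/">home</a>',
    "http://example.test/b": '<a href="/missing">gone</a>',
}
pages = crawl("http://example.test/", FAKE_WEB.get)
```

    Everything after this step (storing the crawl, indexing it, ranking results) is where the hard part starts.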