w3b has asked for the wisdom of the Perl Monks concerning the following question:

This node falls below the community's minimum standard of quality and will not be displayed.

Replies are listed 'Best First'.
Re: Own robot to search engine
by marto (Cardinal) on Sep 25, 2006 at 11:52 UTC
    In addition to what robot_tourist has already advised, you could have a read of Designing a Search Engine, which raises a couple of things you may want to think about. You seem to have an idea of how you want this to work. With this in mind, read Re^2: WWW::Mechanize problem, where someone has a list of URLs they wish to visit/parse. Limbic~Region provides a detailed description of how to deal with these situations, which may also be of interest to you.

    Update: used Super Search and added a link to a previous question related to this topic.

    Hope this helps

    Martin
Re: Own robot to search engine
by robot_tourist (Hermit) on Sep 25, 2006 at 11:07 UTC

    What do you need more help with? The Perl or the design?

    I'm guessing you could use LWP to let your script access the web, but I have no experience with it. For the design side, I'm just throwing an untested idea into the air: give your script a starting address, collect all the links on that page, and visit each one in turn; then for each of those pages, follow all of its links, and so on (a rough sketch follows below). There may be robot etiquette you should be aware of; see this page (near the top of a Google search for 'search robot etiquette'), which has lots of other search-robot-related information.
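
    A rough, untested sketch of that idea in Perl, assuming LWP::RobotUA (which honours robots.txt and enforces a delay between requests) and HTML::LinkExtor are available; the starting URL, contact address, and page limit are placeholders:

        #!/usr/bin/perl
        use strict;
        use warnings;
        use LWP::RobotUA;
        use HTML::LinkExtor;

        # Placeholder values -- substitute your own
        my $start     = 'http://example.com/';
        my $max_pages = 50;

        # LWP::RobotUA fetches pages while obeying each site's robots.txt
        my $ua = LWP::RobotUA->new(
            agent => 'MyCrawler/0.1',
            from  => 'me@example.com',
        );
        $ua->delay(1/60);    # wait at least one second between requests

        my @queue = ($start);
        my %seen;

        while (@queue && keys %seen < $max_pages) {
            my $url = shift @queue;
            next if $seen{$url}++;

            my $resp = $ua->get($url);
            next unless $resp->is_success
                    && $resp->content_type eq 'text/html';

            print "Fetched: $url\n";

            # Extract links, absolutized against the page's URL,
            # and queue any we have not seen yet
            my $extor = HTML::LinkExtor->new(undef, $url);
            $extor->parse($resp->decoded_content);
            for my $link ($extor->links) {
                my ($tag, %attr) = @$link;
                next unless $tag eq 'a' && $attr{href};
                my $abs = "$attr{href}";
                next unless $abs =~ m{^https?://};    # skip mailto:, javascript:, etc.
                push @queue, $abs unless $seen{$abs};
            }
        }

    The %seen hash keeps the crawler from revisiting pages, and the RobotUA delay keeps it from hammering any one site; you would still need to store and index what you fetch to turn this into a search engine.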

    How can you feel when you're made of steel? I am made of steel. I am the Robot Tourist.
    Robot Tourist, by Ten Benson

Re: Own robot to search engine
by kwaping (Priest) on Sep 25, 2006 at 15:29 UTC
    This book might also prove useful to you.

    ---
    It's all fine and dandy until someone has to look at the code.
Re: Own robot to search engine
by ysth (Canon) on Sep 25, 2006 at 15:38 UTC
    Parsing text wouldn't be the way to go; you can use Google's services to get search results and format them to your liking (subject to Google's terms and conditions). Writing robots to crawl the entire web, plus a search engine to search the results, is an enormous project. It would really surprise me if it weren't better to use an existing search engine.
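
    As a very rough illustration of that approach, assuming you have registered for a Google API key, something along the lines of Net::Google (a wrapper around Google's search API) could be used; the key and query below are placeholders:

        #!/usr/bin/perl
        use strict;
        use warnings;
        use Net::Google;

        # Placeholder key -- register with Google for your own
        my $key = 'insert-your-google-api-key-here';

        my $google = Net::Google->new( key => $key );
        my $search = $google->search();

        $search->query('perl web crawler');    # placeholder query
        $search->max_results(10);

        # Print the title of each result
        print $_->title(), "\n" for @{ $search->results() };

    That leaves the crawling and indexing to Google and only the presentation to you.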

    For those who haven't seen it, the Binary Search Tree 2 and the report on it are interesting reading.

Re: Own robot to search engine
by spatterson (Pilgrim) on Sep 25, 2006 at 15:11 UTC
    You may want to check DBIx::TextSearch ... now htf did I write it :)

    just another cpan module author