PerlMonks  

Miniature search engine

by kilinrax (Deacon)
on Dec 21, 2000 at 21:46 UTC ( #47837=perlquestion )

kilinrax has asked for the wisdom of the Perl Monks concerning the following question:

I'm looking to build a simple script to perform searches on an index of web pages (approx 200, though all on different sites).
I'm not entirely sure how to go about this, so if anyone could point me to potentially helpful CPAN modules, Perl technique columns, etc., I'd be really grateful.
I'm not really looking for 'ready-to-use' scripts; I'd prefer to use my own code, hopefully learning something from doing so. The ones I've already come across through Google searches have been rather unhelpful (and most have no strict or warnings, ick).

Replies are listed 'Best First'.
Re: Miniature search engine
by maverick (Curate) on Dec 21, 2000 at 21:53 UTC
    There was a similar question some time back about searching. Rather than retype my thoughts, I'll just link you to my response then :)
    Here is my comment, and the thread of conversation starts here.

    Hope this helps

    /\/\averick

Re: Miniature search engine
by ichimunki (Priest) on Dec 21, 2000 at 21:59 UTC
    If the indices you wish to search are all available as web pages, then use LWP::UserAgent, build a user agent, and download all the pages in your list. Then use something like HTML::TokeParser to grab all the <a href> elements (and any other interesting bits of the pages), and build a hash containing all the things you want to search on (perhaps using lists of these things as the values).
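
    A minimal sketch of that download-and-parse step, under some assumptions: the sub names index_page and build_index, and the title/links layout of each hash entry, are made up for illustration and are not part of either module. It only uses LWP::UserAgent's get and HTML::TokeParser's get_tag and get_trimmed_text.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTML::TokeParser;

# Parse one HTML document and return a hash ref of searchable fields.
# The field names (title, links) are an assumption of this sketch.
sub index_page {
    my ($html) = @_;
    my %entry = (title => '', links => []);
    my $p = HTML::TokeParser->new(\$html) or die "Cannot parse HTML";
    # get_tag returns [tagname, \%attributes, ...] for the next matching start tag.
    while (my $token = $p->get_tag('title', 'a')) {
        my ($tag, $attr) = @$token;
        if ($tag eq 'title') {
            $entry{title} = $p->get_trimmed_text('/title');
        }
        elsif (defined $attr->{href}) {
            push @{ $entry{links} }, {
                href => $attr->{href},
                text => $p->get_trimmed_text('/a'),
            };
        }
    }
    return \%entry;
}

# Download every URL and build the index: URL => { title, links }.
sub build_index {
    my @urls = @_;
    my $ua = LWP::UserAgent->new(timeout => 15);
    my %index;
    for my $url (@urls) {
        my $res = $ua->get($url);
        unless ($res->is_success) {
            warn "Skipping $url: ", $res->status_line, "\n";
            next;
        }
        my $html = $res->decoded_content;
        $html = $res->content unless defined $html;
        $index{$url} = index_page($html);
    }
    return \%index;
}

# Only touch the network when URLs are given on the command line.
if (@ARGV) {
    my $index = build_index(@ARGV);
    printf "%s: %d links\n", $_, scalar @{ $index->{$_}{links} }
        for sort keys %$index;
}
```

    Keeping the parsing in its own sub means it can be tried on a saved page without going anywhere near the network.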

    If the source pages are fairly constant, then run this separately from your search function and use Data::Dumper to save the hash to disk. Otherwise proceed immediately to...
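
    One possible way to do the save and reload with Data::Dumper, as a sketch: save_index and load_index are hypothetical helper names, and the index is assumed to be a plain hash ref with no code refs or cycles. Terse makes the dump a bare expression that can be handed straight to eval.

```perl
use strict;
use warnings;
use Data::Dumper;

# Write the index to $file as Perl source text.
sub save_index {
    my ($index, $file) = @_;
    local $Data::Dumper::Terse    = 1;   # emit "{ ... }" rather than "$VAR1 = { ... };"
    local $Data::Dumper::Indent   = 1;   # compact but still readable
    local $Data::Dumper::Sortkeys = 1;   # stable output between runs
    open my $fh, '>', $file or die "Cannot write $file: $!";
    print {$fh} Dumper($index);
    close $fh or die "Cannot close $file: $!";
    return;
}

# Read the dumped text back and turn it into a data structure again.
sub load_index {
    my ($file) = @_;
    open my $fh, '<', $file or die "Cannot read $file: $!";
    my $code = do { local $/; <$fh> };   # slurp the whole file
    close $fh;
    my $index = eval $code;              # executes the file: only load dumps you wrote yourself
    die "Cannot parse $file: $@" if $@;
    return $index;
}
```

    Because load_index evals the file, it should only ever be pointed at dumps the indexing run produced.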

    For searching the elements of the hash, use a function like grep to quickly isolate a list of keys.
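
    A sketch of the grep step, assuming each hash value looks like { title => ..., links => [ { href => ..., text => ... } ] }; that layout and the name search_index are illustrative choices, not something the modules dictate.

```perl
use strict;
use warnings;

# Return the sorted list of keys (URLs) whose title or link text
# contains $term, compared case-insensitively.
sub search_index {
    my ($index, $term) = @_;
    my $re = qr/\Q$term\E/i;             # \Q...\E treats the term as literal text
    my @hits = sort grep {
        my $page  = $index->{$_};
        my $title = defined $page->{title} ? $page->{title} : '';
        $title =~ $re
            or grep { $_->{text} =~ $re } @{ $page->{links} || [] };
    } keys %$index;
    return @hits;
}

# Tiny stand-in index, made up for the example.
my %index = (
    'http://a.example/' => { title => 'Perl tips', links => [] },
    'http://b.example/' => { title => 'Cooking',
                             links => [ { href => '/r', text => 'perl recipes' } ] },
);
print "$_\n" for search_index(\%index, 'perl');
```

    For a couple of hundred pages a linear grep over the keys like this should be quick enough.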

    Using CGI, present the information in a web page replete with hyperlinks and additional info by iterating over the list from the previous step.
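
    A rough sketch of the presentation step with CGI.pm. The query parameter name q, the sub render_results, and the one-entry stand-in index are all assumptions for illustration; a real script would load the saved index and run the search step instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use CGI;

# Turn a list of matching URLs into an HTML fragment of hyperlinks.
# $index maps URL => { title => ... }; everything user-visible is escaped.
sub render_results {
    my ($index, $term, @urls) = @_;
    my $html = '<h1>Results for "' . CGI::escapeHTML($term) . qq{"</h1>\n};
    return $html . "<p>No matches.</p>\n" unless @urls;
    $html .= "<ul>\n";
    for my $url (@urls) {
        my $title = $index->{$url}{title};
        $title = $url unless defined $title && length $title;
        $html .= sprintf qq{  <li><a href="%s">%s</a></li>\n},
            CGI::escapeHTML($url), CGI::escapeHTML($title);
    }
    return $html . "</ul>\n";
}

# Stand-in data so the sketch runs on its own.
my %index = ( 'http://example.com/' => { title => 'Example Domain' } );

my $q    = CGI->new;
my $term = $q->param('q');
$term = '' unless defined $term;

# Match on titles only, to keep the example short.
my @hits = length($term)
    ? grep { $index{$_}{title} =~ /\Q$term\E/i } sort keys %index
    : ();

print $q->header(-type => 'text/html', -charset => 'utf-8');
print "<html><body>\n", render_results(\%index, $term, @hits), "</body></html>\n";
```

    Escaping the search term and titles before printing them keeps stray markup in the indexed pages from leaking into the results page.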

    Have fun and think of other ways to do it.
