Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I want to write a program which does this:
open http://www.foreignsite.com/, search the directory for *.html, parse all the HTML files for a search word, and print the results.
I know how to search my own directories, but how do I connect to another site and search their directories? Like www.hotmail.com, and search the files in there? Can it be done with LWP? If so, where do I start? What do I do? Thank you : )

Replies are listed 'Best First'.
Re: Foreign directory search with LWP?
by Beatnik (Parson) on Apr 27, 2001 at 12:53 UTC
    Directory listings are usually generated by the webserver when a) there is no index file and b) the webserver is configured to generate directory listings...
    Like Jouke says, under normal circumstances it can't be done.

    Greetz
    Beatnik
    ... Quidquid perl dictum sit, altum viditur.
Re: Foreign directory search with LWP?
by Jouke (Curate) on Apr 27, 2001 at 12:47 UTC
    Maybe my answer is too quick, but I don't think it's possible. A possibility would be to allow access to that site via FTP. With FTP you can get a complete directory listing. As far as I know, the HTTP protocol does not allow you to get directory listings.
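
    If you do get FTP access, something like this untested sketch could pull the listing with Net::FTP -- the host and path here are made up:

        #!/usr/bin/perl
        use strict;
        use warnings;
        use Net::FTP;

        # hypothetical host -- only works if the site really allows FTP access
        my $host = 'ftp.foreignsite.com';

        my $ftp = Net::FTP->new($host, Timeout => 30)
            or die "Cannot connect to $host: $@";
        $ftp->login('anonymous', 'anonymous@example.com')
            or die "Login failed: ", $ftp->message;

        # assumed document root -- adjust to whatever the server actually uses
        $ftp->cwd('/htdocs') or die "cwd failed: ", $ftp->message;

        # long listing (like 'ls -l'); keep only the .html entries
        my @listing = $ftp->dir;
        print "$_\n" for grep { /\.html?$/i } @listing;

        $ftp->quit;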

    Jouke Visser, Perl 'Adept'
      You're right, you usually can't get a directory listing, but
      it's sometimes possible to 'browse' directories if the server is set up to display directory indexes
      (Options Indexes with Apache).
      It's not so uncommon nowadays...
      With a server configured that way you can parse the HTML page returning the directory content
      and 'browse' the way you want using LWP.
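
      Something along these lines (an untested sketch; the URL is made up, and it only works where the server really does generate an index page) would grab such a listing and pick out the .html links:

        #!/usr/bin/perl
        use strict;
        use warnings;
        use LWP::UserAgent;
        use HTML::LinkExtor;
        use URI;

        # hypothetical URL of a directory the server auto-indexes
        my $base = 'http://www.foreignsite.com/docs/';

        my $ua  = LWP::UserAgent->new;
        my $res = $ua->get($base);
        die "GET $base failed: ", $res->status_line unless $res->is_success;

        # collect every href from the generated index page
        my @links;
        my $extor = HTML::LinkExtor->new(sub {
            my ($tag, %attr) = @_;
            push @links, $attr{href} if $tag eq 'a' && defined $attr{href};
        });
        $extor->parse($res->content);

        # keep only the .html files and make the URLs absolute
        print "$_\n"
            for grep { /\.html?$/i }
                map  { URI->new_abs($_, $base)->as_string } @links;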

      UPDATE :
      Oops, it seems I duplicated Beatnik's answer; see below...

      "Only Bad Coders Badly Code In Perl" (OBC2IP)
Re: Foreign directory search with LWP?
by little (Curate) on Apr 27, 2001 at 12:51 UTC
    No, you cannot search another website's (webserver's) directory structure in the way you want to.
    Please read more about how such things can work.

    And, as the node titles may suggest, you have to have access to the resource you want to search.
    But even an unprotected webserver will not give you the directory listing if each dir contains a default file, e.g. index.html or default.html.
    And I've never heard of someone who gave anonymous access to their webspace via FTP.


    Have a nice day
    All decision is left to your taste
Re: Foreign directory search with LWP?
by tune (Curate) on Apr 27, 2001 at 18:30 UTC
    Perhaps you want to build a spider that crawls a site around and around. It is possible, in a recursive way. You parse the first page you get (e.g. index.html, or whatever you get at the starting URL), indexing the keywords or finding the keyword, then look for other URLs in the page and parse them too, until you don't find any more URLs or they point to an external location.

    I would recommend storing the keywords for every page in a local database, serving the hits from there, and running your spider(s) on a regular basis.
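
    A bare-bones, untested sketch of that crawl loop (the start URL and keyword are just placeholders), using LWP::UserAgent and HTML::LinkExtor:

        #!/usr/bin/perl
        use strict;
        use warnings;
        use LWP::UserAgent;
        use HTML::LinkExtor;
        use URI;

        # placeholders -- point these at the real site and search word
        my $start   = 'http://www.foreignsite.com/';
        my $keyword = 'perl';

        my $ua   = LWP::UserAgent->new;
        my $site = URI->new($start)->host;
        my (%seen, @queue);
        push @queue, $start;

        while (my $url = shift @queue) {
            next if $seen{$url}++;

            my $res = $ua->get($url);
            next unless $res->is_success && $res->content_type =~ m{^text/html};

            # report pages that contain the keyword
            print "$url\n" if $res->content =~ /\Q$keyword\E/i;

            # queue every link that stays on the starting site
            my $extor = HTML::LinkExtor->new(sub {
                my ($tag, %attr) = @_;
                return unless $tag eq 'a' && defined $attr{href};
                my $abs = URI->new_abs($attr{href}, $url);
                push @queue, $abs->as_string
                    if $abs->scheme =~ /^https?$/ && $abs->host eq $site;
            });
            $extor->parse($res->content);
        }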

    Good luck!

    Update: this is discussed in more detail in part 2. I am just reading that now :)

    -- tune