Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I understand how to use LWP to get a file using the code below:

    use LWP::Simple;
    my $myURL  = "http://www.somesite.com/index.html";
    my $myPage = get($myURL);

but how do I search that directory for *.txt files so that I can go on to parse all the text files?

Replies are listed 'Best First'.
Re: search a foreign directory using LWP
by hdp (Beadle) on Apr 25, 2001 at 08:44 UTC
    Calling it a "directory" is a little misleading, because it's an HTML page, not a directory (in, for example, the opendir and readdir senses).

    You'll need to parse the HTML and look for links to .txt files. I suggest HTML::Parser; you can try to do it with a regular expression, but that can get incredibly hairy and involve a lot of unneeded work (and I wouldn't suggest it).

    Then you can use LWP to get each of the links (to text files) that you find and process them as you will.
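    Something along these lines, as a minimal sketch (the URL is the hypothetical one from the question, and matching hrefs ending in ".txt" is an assumption about how the page links its files):

```perl
use strict;
use warnings;
use LWP::Simple;
use HTML::Parser;
use URI;

my $base = "http://www.somesite.com/index.html";   # hypothetical page from the question
my $html = get($base) or die "Couldn't fetch $base";

# Collect the href of every <a> tag that points at a .txt file
my @txt_links;
my $p = HTML::Parser->new(
    api_version => 3,
    start_h     => [
        sub {
            my ($tag, $attr) = @_;
            push @txt_links, $attr->{href}
                if $tag eq 'a'
                && defined $attr->{href}
                && $attr->{href} =~ /\.txt$/i;
        },
        'tagname, attr',
    ],
);
$p->parse($html);
$p->eof;

# Resolve relative links against the page URL, then fetch each file
for my $link (@txt_links) {
    my $text = get( URI->new_abs($link, $base)->as_string );
    # ... parse $text here ...
}
```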

    hdp.

Re: search a foreign directory using LWP
by arturo (Vicar) on Apr 25, 2001 at 16:53 UTC

    If you're pondering writing your own 'spider' using LWP, a handy tool you should familiarize yourself with is HTML::LinkExtor, which will extract all the links on a page for you.

    But boy, oh boy, if you do this job right for sites of moderate complexity, you are going to learn a lot about complex data structures in Perl =) Good luck!
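    As a rough sketch of that approach (same hypothetical URL as the question; the ".txt" filter is an assumption), HTML::LinkExtor will even absolutize the links for you if you hand it the page URL as a base:

```perl
use strict;
use warnings;
use LWP::Simple;
use HTML::LinkExtor;

my $url  = "http://www.somesite.com/index.html";  # hypothetical page from the question
my $html = get($url) or die "Couldn't fetch $url";

# Passing $url as the base makes HTML::LinkExtor return absolute URLs
my @txt_links;
my $extor = HTML::LinkExtor->new(
    sub {
        my ($tag, %attr) = @_;
        push @txt_links, $attr{href}
            if $tag eq 'a' && $attr{href} && $attr{href} =~ /\.txt$/i;
    },
    $url,
);
$extor->parse($html);

# Fetch and process each text file found on the page
for my $link (@txt_links) {
    my $text = get($link);
    # ... parse $text ...
}
```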