S_Shrum has asked for the wisdom of the Perl Monks concerning the following question:
This question relates to my task of creating a site indexing script (look at this node for more background).
What I am trying to do
======================
With the aid of HTML::LinkExtor, I am retrieving a list of links (converted to absolute URLs) from a primer (root) page. These links are stored into 3 hashes:
%all_links for all the links,
%distinct_links for a list of links with duplicates removed, and lastly
%constrained_links for links that match a user-defined pattern (mainly used for regex things like "^http://www.shrum.net" so as to control the traversal scope).
Aside: Thinking this over again, I should probably just use 1 hash and have an 'occurrences' key and increment that for dupes. Anyways...
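A minimal sketch of that single-hash idea, under stated assumptions: the sample URLs, the `in_scope` key, and the `$scope` pattern are made up for illustration, and in the real script the list would come from HTML::LinkExtor rather than a hard-coded array.

```perl
use strict;
use warnings;

# Hypothetical sample of extracted links (stand-in for HTML::LinkExtor output).
my @extracted = (
    'http://www.shrum.net/',
    'http://www.shrum.net/about.html',
    'http://www.shrum.net/',
    'http://www.example.com/',
);

# Assumed scope pattern, modeled on the "^http://www.shrum.net" example.
my $scope = qr{^http://www\.shrum\.net};

# One hash keyed by URL; each value is a small record.
my %links;
for my $url (@extracted) {
    $links{$url}{occurrences}++;                      # duplicates just bump the count
    $links{$url}{in_scope} = $url =~ $scope ? 1 : 0;  # remember whether it is in scope
}

# The other two views fall out of the one hash.
my @distinct    = sort keys %links;
my @constrained = grep { $links{$_}{in_scope} } @distinct;

printf "%d distinct, %d in scope\n", scalar @distinct, scalar @constrained;
```

The keys of the hash are the distinct links by construction, so separate %all_links and %distinct_links hashes may not be needed; whether the per-page totals matter is a judgment call for the script.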
The first page will fill the hashes with some links to start traversing, so I am going to need to loop through them, right?
Here's the catch: as the traversal proceeds, additional links found on each fetched page will be added to the hashes.
How should I attack this beast? How can I set up a loop that will deal with a growing hash to address all the entries?
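One common way to handle this (not necessarily what the replies below suggest) is to avoid iterating over the hash at all: keep a queue of URLs still to visit in an array, and use a hash only to remember what has already been queued. Perl's documentation for `each` notes that adding elements to a hash while iterating over it has unspecified effects on the iterator, whereas pushing onto an array you are shifting from is safe. In this sketch `%fake_web` and `links_on_page` are hypothetical stand-ins for fetching a page and running HTML::LinkExtor over it, and `$scope` is an assumed constraint pattern.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the web: each page maps to the links found on it.
my %fake_web = (
    'http://www.shrum.net/'       => [ 'http://www.shrum.net/a.html', 'http://www.example.com/' ],
    'http://www.shrum.net/a.html' => [ 'http://www.shrum.net/', 'http://www.shrum.net/b.html' ],
    'http://www.shrum.net/b.html' => [],
);

# Hypothetical helper; the real one would fetch $url and extract absolute links.
sub links_on_page {
    my ($url) = @_;
    return @{ $fake_web{$url} || [] };
}

my $scope = qr{^http://www\.shrum\.net};   # assumed traversal constraint

my @queue = ('http://www.shrum.net/');     # primer (root) page
my %seen  = map { $_ => 1 } @queue;        # everything ever queued
my @visited;

# The loop ends when the queue drains, however much it grew along the way.
while (defined(my $url = shift @queue)) {
    push @visited, $url;
    for my $link (links_on_page($url)) {
        next unless $link =~ $scope;       # stay inside the allowed scope
        next if $seen{$link}++;            # skip anything already queued
        push @queue, $link;                # new work is appended to the end
    }
}

print "$_\n" for @visited;
```

Because a URL is queued only the first time it is seen, each page is visited at most once and the loop terminates as long as the set of in-scope pages is finite.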
TIA.
======================
Sean Shrum
http://www.shrum.net
Replies are listed 'Best First'.
Re: How do I...? - Looping on a growing hash
by dws (Chancellor) on Mar 21, 2002 at 01:50 UTC
by S_Shrum (Pilgrim) on Mar 21, 2002 at 02:26 UTC
by dws (Chancellor) on Mar 21, 2002 at 03:55 UTC
Re: How do I...? - Looping on a growing hash
by demerphq (Chancellor) on Mar 27, 2002 at 16:55 UTC