S_Shrum has asked for the wisdom of the Perl Monks concerning the following question:

I have a hash that has a number of items in it. Example:
[LOOP NEEDED HERE] {
    ...
    $links{$url}{html} = $html;
    ...
    $links{$new_url} = 1;
    ...
    $links{$url}{visited} = 1;
    ...
}

Note from my somewhat cryptic snippet above that new %links keys can and will be created, but they will not (yet) have a defined {visited} key. I guess this is sort of what is called a recursive loop? As new items are added, I need to make sure that those items are also dealt with.

What I'm looking for is some sort of loop syntax that will evaluate the %links hash entries to see whether or not the {visited} key is defined; if it isn't, I can then "visit" the $url and set $links{$url}{visited} = 1.
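A minimal sketch of the shape I have in mind (all names here are hypothetical stand-ins, not my real fetching code): seed a worklist from the hash, and push newly discovered keys onto it as they turn up.

```perl
use strict;
use warnings;

# Start with a couple of known URLs (hypothetical examples).
my %links = ( 'http://a.example/' => {}, 'http://b.example/' => {} );

# Stand-in for fetching a page and extracting its links.
my %found_on = ( 'http://a.example/' => ['http://c.example/'] );
sub visit { @{ $found_on{ $_[0] } || [] } }

my @queue = keys %links;
while ( my $url = shift @queue ) {
    next if $links{$url}{visited};
    for my $new_url ( visit($url) ) {
        push @queue, $new_url unless exists $links{$new_url};
        $links{$new_url} ||= {};    # new entry, not yet visited
    }
    $links{$url}{visited} = 1;
}

print scalar( keys %links ), " URLs handled\n";    # prints "3 URLs handled"
```

The queue means entries added mid-loop get their turn, which a plain `foreach (keys %links)` would miss.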

TIA

======================
Sean Shrum
http://www.shrum.net

Re: Creating loop on undefined hash key value
by Courage (Parson) on Nov 23, 2002 at 09:54 UTC
    Either write
        for (grep { !$links{$_}{visited} } keys %links) {
            # ....
        }
    or, a bit more efficiently,
        for (keys %links) {
            if (!$links{$_}{visited}) {
                # ....
            }
        }

    Courage, the Cowardly Dog

      Thanks for the reply...

      The only problem with the code you provided is that it does not deal with newly added keys (within the loop). The trick I'm looking for is to have the loop deal with new entries as well as those that pre-existed in the hash before the loop was started.

      Maybe a WHILE clause..hmmm. I need to think about this some more. There has to be a general way to deal with a dynamic hash loop, you'd think.

      Any other ideas???

      ======================
      Sean Shrum
      http://www.shrum.net

        In case you need to deal with newly added keys, I suggest you create a simple stack and loop until it's empty:
        for (my @stack = keys %links; $#stack >= 0; ) {
            my $current = shift @stack;
            next if $links{$current}{visited};
            if (something_happens()) {
                push @stack, "another key";
            }
        }

        Courage, the Cowardly Dog

•Re: Creating loop on undefined hash key value
by merlyn (Sage) on Nov 23, 2002 at 13:18 UTC
    Well, first, it won't work to mix your value types like this:
        $links{$url}{html} = $html;
        ...
        $links{$new_url} = 1;
    The value at $links{$some_key} cannot simultaneously be the number one and a subhash reference. I suggest you change it to:
        $links{$new_url}{visited} = 0;
    Then when you actually visit the node, change that 0 to a 1, and put the HTML in there.
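A short sketch of that convention (helper names here are made up for illustration): every value in %links is a hashref from the moment the URL is first seen, so there's never a plain 1 to collide with.

```perl
use strict;
use warnings;

my %links;

# Record a newly discovered URL without clobbering an existing entry:
# every value is a hashref, never a plain scalar.
sub note_url {
    my ($url) = @_;
    $links{$url} ||= { visited => 0 };
}

# On an actual visit, flip the flag and keep the page source.
sub mark_visited {
    my ( $url, $html ) = @_;
    $links{$url}{visited} = 1;
    $links{$url}{html}    = $html;
}

note_url('http://example.com/');
note_url('http://example.com/');    # second sighting: entry untouched
mark_visited( 'http://example.com/', '<html>...</html>' );
```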

    However, if all you're writing is a link checker or recursive web walker, you'd be about the 492nd person to do it this month. I suggest you save lots of time and look at WWW::Robot or WWW::SimpleRobot or any of my columns on that subject.

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.

      Quick background: I'm building a simple site index builder (no 'use lingua'). Maybe this time I'll actually finish it.

      I am familiar (a little) with Robot and SimpleRobot, but to my knowledge and past experience they do not handle dynamic, script-generated pages, which is how I need it to work.

      Robot and SimpleRobot will not read a page with a URL like:

      http://www.someserver.com/cgi-bin/template.pl?content=foo.htm

      I'm working with a subset of code written by Rob_au from his SiteRobot.pm (you've seen it before). His code works great (returns script URLs) but passes only a single dimensioned array of the page URLs...I wanted to build on this. I wanted to be able to retrieve more things like Title, Body, creation date, etc. Rob's code already retrieves this information but uses the information for validity checking and then throws it to the wind.

      My twist was to use a hash instead of his array, since no duplicate keys can be created... therefore the listing of URLs would be unique. I thought about doing an AoH but that's too messy (I'd have to build in duplicate checking, code to pull the hashes out of the array, etc.). Ala K.I.S.S.
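To illustrate the point (with made-up URLs): duplicate sightings collapse onto one hash key for free, which is exactly the duplicate checking an AoH would need by hand.

```perl
use strict;
use warnings;

# The same URL seen several times while crawling (hypothetical data).
my @seen = qw( http://a/ http://b/ http://a/ http://b/ http://c/ );

my %links;
$links{$_}{visited} ||= 0 for @seen;    # duplicates collapse automatically

print scalar( keys %links ), " unique URLs\n";    # prints "3 unique URLs"
```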

      So now that you see my quandary a bit more clearly, is there any more information you can provide?

      TIA

      ======================
      Sean Shrum
      http://www.shrum.net

        It shouldn't be difficult to modify Robot or SimpleRobot to /not/ filter out GET-parametered URLs. Be careful, though, as there are times you definitely /don't/ want to follow such links, such as when they cause voting, etc., to occur.

        There's probably a line that searches for a ? in the url, and rejects it. It's probably even commented.
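A sketch of what a replacement filter might look like. This is hypothetical, not the actual Robot code, and the "dangerous parameter" list is purely illustrative: keep GET-parametered URLs, but skip query strings that look side-effecting.

```perl
use strict;
use warnings;

# Hypothetical follow-filter: accept GET-parametered URLs, but skip
# ones whose query string looks like an action rather than content.
sub should_follow {
    my ($url) = @_;
    return 1 if $url !~ /\?/;                          # no query string
    return 0 if $url =~ /\b(?:vote|delete|logout)=/;   # action-like links
    return 1;
}

print should_follow('http://www.someserver.com/cgi-bin/template.pl?content=foo.htm')
    ? "follow\n"
    : "skip\n";    # prints "follow"
```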


        Warning: Unless otherwise stated, code is untested. Do not use without understanding. Code is posted in the hopes it is useful, but without warranty. All copyrights are relinquished into the public domain unless otherwise stated. I am not an angel. I am capable of error, and err on a fairly regular basis. If I made a mistake, please let me know (such as by replying to this node).

Re: Creating loop on undefined hash key value
by BrowserUk (Patriarch) on Nov 24, 2002 at 07:27 UTC

    This demonstrates the general flow of the thing using recursion. If your pages have a lot of links, you could blow perl's runtime stack.

    #! perl -slw
    use strict;

    sub getHTML {
        my $url = shift;
        #! Replace with the code to get HTML for $url
        return join ' : ', map {
            local $" = '';
            "www.@{[ chr(97+rand 26), int rand 10 ]}.com"
        } 0 .. rand 5;
    }

    sub getLinks {
        my ($webref, $url) = @_;
        $webref->{$url}{html} = getHTML $url;
        #! Replace with code to extract links from the html
        my @links = split ' : ', $webref->{$url}{html};
        return @links;
    }

    sub spider {
        my ($webref, $url) = @_;
        my @links = getLinks $webref, $url;
        for my $link (@links) {
            next if exists $webref->{$link};
            spider( $webref, $link );
        }
    }

    my %web;
    spider \%web, 'First.com';

    for my $url (sort keys %web) {
        print "$url => $web{$url}{html}";
    }
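If the stack is a worry, the same flow works iteratively with an explicit worklist. A sketch, with getLinks replaced by a canned stub (the link map is made-up test data) so it runs standalone:

```perl
use strict;
use warnings;

# Canned link map standing in for real fetching/extraction.
my %page = (
    'First.com' => [ 'a.com', 'b.com' ],
    'a.com'     => [ 'b.com', 'c.com' ],
);
sub getLinks { @{ $page{ $_[0] } || [] } }

# Iterative spider: an explicit worklist instead of recursion, so a
# link-heavy site can't blow the call stack.
sub spider {
    my ( $webref, $start ) = @_;
    my @todo = ($start);
    while ( my $url = shift @todo ) {
        next if exists $webref->{$url};
        my @links = getLinks($url);
        $webref->{$url}{html} = join ' : ', @links;
        push @todo, @links;
    }
}

my %web;
spider( \%web, 'First.com' );
print join( ', ', sort keys %web ), "\n";    # prints "First.com, a.com, b.com, c.com"
```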

    Okay you lot, get your wings on the left, halos on the right. It's one size fits all, and "No!", you can't have a different color.
    Pick up your cloud down the end, and yes, if you get allocated a grey one they are a bit damp underfoot, but someone has to get them.
    Get used to the wings fast 'cos it's an 8-hour day... unless the Governor calls for a cyclone or hurricane, in which case 16-hour shifts are mandatory.
    Just be grateful that you arrived just as the tornado season finished. Them buggers are real work.