in reply to How do I...? - Looping on a growing hash

Why keep both %all_links and %distinct_links? $links{$aLink}++; handles both cases: the values count how often each link was seen, and keys %links gives you the distinct links. And you don't need to iterate over hashes. All you need is something like

    foreach my $link ( @extracted_links ) {
        $links{$link}++;
        $constrained_links{$link}++ if ( 0 < grep { $link =~ m/$_/ } @constraints );
    }
The rest is details.
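Here is a minimal, self-contained sketch of that loop. The sample URLs and the single constraint pattern are made up for illustration. It shows that one counting hash gives you both the per-URL counts and, through keys, the distinct list:

```perl
use strict;
use warnings;

# Hypothetical inputs: links pulled from a page, and regex strings
# that mark a link as "in scope".
my @extracted_links = (
    'http://www.shrum.net/a.html',
    'http://example.com/b.html',
    'http://www.shrum.net/a.html',
);
my @constraints = ('shrum\.net');

my ( %links, %constrained_links );
foreach my $link (@extracted_links) {
    $links{$link}++;    # count every sighting of this URL
    # grep in scalar (boolean) context returns the number of matching constraints
    $constrained_links{$link}++ if grep { $link =~ m/$_/ } @constraints;
}

my @distinct = sort keys %links;    # distinct URLs, no second hash needed
print scalar(@distinct), " distinct links\n";
print "$_ seen $links{$_} time(s)\n" for @distinct;
```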

Re: Re: How do I...? - Looping on a growing hash
by S_Shrum (Pilgrim) on Mar 21, 2002 at 02:26 UTC
    I was figuring on being able to report on how many times a particular URL came up. I just changed my code to deal with one hash, keyed by URL, where each value holds that URL's fields: %all_links = ( URL_STRING => { occurrences => ..., visit => ..., title => ..., content => ..., traversed => ... } );
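A minimal sketch of that one-hash layout (a hash of hashes), assuming each record is created the first time a URL is seen. The helper name record_link and the initial field values (such as -1 for "not yet traversed") are illustrative choices, not part of the original code:

```perl
use strict;
use warnings;

# One hash keyed by URL; each value is a hash reference holding the
# per-URL fields from the post. Initial values here are placeholders.
my %all_links;

# Hypothetical helper: create the record on first sight, then count it.
sub record_link {
    my ($url) = @_;
    $all_links{$url} ||= {
        occurrences => 0,
        visit       => undef,
        title       => undef,
        content     => undef,
        traversed   => -1,    # -1 = not yet traversed
    };
    $all_links{$url}{occurrences}++;
    return $all_links{$url};
}

record_link($_) for qw(http://a.example/ http://b.example/ http://a.example/);

for my $url ( sort keys %all_links ) {
    printf "%s: %d occurrence(s)\n", $url, $all_links{$url}{occurrences};
}
```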

    Code-wise, I will mark whether a URL needs to be traversed through its VISIT key: VISIT is set to the value returned by the index function for the constraining string, so a non-negative VISIT means URL_STRING contains a constraining factor (index returns -1 when the substring is not found).

    I would then need to traverse those pages that have a non-negative VISIT value and an undefined TRAVERSED value (or a negative one, if I preset it to -1).
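A rough sketch of that plan, assuming a single constraining substring and a fake fetch_links stub in place of real page fetching (both hypothetical). It re-scans a snapshot of the hash's keys on each pass, so new URLs added while crawling are picked up on the next pass, and it marks TRAVERSED before fetching so that cycles between pages cannot loop forever:

```perl
use strict;
use warnings;

# Hypothetical stand-in for fetching a page and extracting its links.
my %fake_web = (
    'http://www.shrum.net/'  => [ 'http://www.shrum.net/a', 'http://other.example/' ],
    'http://www.shrum.net/a' => [ 'http://www.shrum.net/', 'http://www.shrum.net/b' ],
    'http://www.shrum.net/b' => [],
);
sub fetch_links { my ($url) = @_; return @{ $fake_web{$url} || [] } }

my $constraint = 'shrum.net';    # assumed constraining substring
my %all_links;

sub add_link {
    my ($url) = @_;
    my $rec = $all_links{$url} ||= {
        occurrences => 0,
        # index() returns -1 when the substring is absent, else its position
        visit       => index( $url, $constraint ),
        traversed   => -1,
    };
    $rec->{occurrences}++;
}

add_link('http://www.shrum.net/');

# Re-scan the (growing) hash until no eligible, untraversed URL remains.
while (1) {
    my @todo = grep {
        $all_links{$_}{visit} >= 0 && $all_links{$_}{traversed} < 0
    } keys %all_links;
    last unless @todo;
    for my $url (@todo) {
        $all_links{$url}{traversed} = 1;    # mark first, so cycles terminate
        add_link($_) for fetch_links($url);
    }
}

print "$_ traversed=$all_links{$_}{traversed}\n" for sort keys %all_links;
```

Because keys returns a list, adding entries to %all_links inside the for loop does not disturb the pass in progress.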

    Note: keying off the TITLE and/or CONTENT values might cause an infinite loop if a page has no title or body content (why a page would have no title or body is beyond me, but it's a possible issue).

    I know I left some of these specifics out of the original post, so I humbly bend over and shout: "Thank you sir, may I have another!" ;-}

    PS: Thinking ahead, can I increment the value of a key like this: $all_links{URL_STRING}{occurrences}++;

    ...or do I have to do this:

    $occurrences = $all_links{URL_STRING}{occurrences} + 1; $all_links{URL_STRING}{occurrences} = $occurrences;
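A small sketch comparing the two forms, using a $url variable in place of the literal URL_STRING key (an assumption for illustration). The in-place ++ works: Perl autovivifies the inner hash on first use, and ++ on an undefined value gives 1 without an "uninitialized" warning. The read-add-store form has the same effect once the value exists, but would warn under use warnings on a brand-new key:

```perl
use strict;
use warnings;

my %all_links;
my $url = 'http://www.shrum.net/';

# Form 1: increment in place. The inner hash is autovivified if missing.
$all_links{$url}{occurrences}++;

# Form 2: read, add, store. Same result, just longer.
my $occurrences = $all_links{$url}{occurrences} + 1;
$all_links{$url}{occurrences} = $occurrences;

print "$all_links{$url}{occurrences}\n";    # prints 2
```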

    TIA (again) ;)

    ======================
    Sean Shrum
    http://www.shrum.net

      I just changed my code to deal with 1 hash, ...

      By using one top-level hash, you're buying into a set of intermediate-level data structure problems that you could avoid by using several (5?) top-level hashes, one per "purpose". I recommend you do the latter.
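A minimal sketch of the several-hashes approach, assuming the same five fields from the post, all keyed by URL, and a single hypothetical constraining substring. Each question ("how often?", "in scope?", "done yet?") then becomes a flat lookup on one hash, with no nested references to manage:

```perl
use strict;
use warnings;

# One top-level hash per "purpose", all keyed by URL (names are illustrative).
# %title and %content would be filled in when a page is actually fetched.
my ( %occurrences, %visit, %title, %content, %traversed );
my $constraint = 'shrum.net';    # assumed constraining substring

for my $url ( 'http://www.shrum.net/', 'http://other.example/', 'http://www.shrum.net/' ) {
    $occurrences{$url}++;
    $visit{$url} = index( $url, $constraint ) unless exists $visit{$url};
}

# Each question is a lookup on a single hash.
my @distinct = sort keys %occurrences;
my @to_visit = grep { $visit{$_} >= 0 && !$traversed{$_} } @distinct;
print "distinct: @distinct\n";
print "to visit: @to_visit\n";
```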