in reply to Re: issues maintaining uniqueness
in thread issues maintaining uniqueness

Thanks for the reply mischief,

I actually didn't make that assumption. I could have easily not worried about things being unique until I got to the IP values - but by doing that I am wasting processing power and bandwidth by doing a DNS lookup on the same URL multiple times.

This is why I decided to pass a unique set of hostnames to Net::DNS::Resolver and then run the resulting IPs (which can still include duplicates) through the Array::Unique module again.

In fact, if you try the script as is and compare the regex results before and after passing them through the Array::Unique module, you will see I am saving a large amount of duplication.

Re^3: issues maintaining uniqueness
by mr_mischief (Monsignor) on Apr 30, 2008 at 19:58 UTC
    That's better than what I thought I read, though you could cut the resolver overhead just as much by checking against each hostname before you look it up to see if it's been looked up before. This sort of task is just begging for a hash. That would use one loop instead of two, and probably be simpler to follow.

    use strict;
    use warnings;
    use Socket;    # provides inet_ntoa

    my %looked_up;
    my @urls = qw( list of URLS however you got them );
    my @ips;

    foreach ( @urls ) {
        my $hostname = extract_hostname_from_url( $_ );

        # only resolve hostnames we have not already resolved
        unless ( exists $looked_up{ $hostname } ) {
            # in scalar context gethostbyname returns the packed address
            my $packed_ip = gethostbyname( $hostname );
            if ( defined $packed_ip ) {
                my $ip_address = inet_ntoa( $packed_ip );
                push @ips, $ip_address;
                $looked_up{ $hostname } = 1;
            }
        }
    }
    # do with @ips whatever you were going to do with them

    This retries hostname lookups that fail, because a hostname is only recorded in the hash once it resolves. You could change that easily by moving the hash element assignment outside the if block that checks whether the packed IP address is defined.
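
    As a rough sketch of that no-retry variant: the hostname is recorded before the result is known, so a failed lookup is never repeated. The sub name resolve_once and the optional $resolver argument are my own inventions for illustration (the hook just makes it possible to try the logic without touching DNS); they are not part of the original script.

```perl
use strict;
use warnings;
use Socket;    # provides inet_ntoa (and inet_aton)

# Resolve each hostname at most once, whether or not the lookup succeeds.
# $resolver is a hypothetical hook; it defaults to the built-in
# gethostbyname, which returns the packed address in scalar context.
sub resolve_once {
    my ( $hostnames, $resolver ) = @_;
    $resolver ||= sub { scalar gethostbyname( $_[0] ) };

    my ( %looked_up, @ips );
    foreach my $hostname ( @$hostnames ) {
        next if exists $looked_up{ $hostname };
        $looked_up{ $hostname } = 1;    # recorded even if the lookup fails

        my $packed_ip = $resolver->( $hostname );
        push @ips, inet_ntoa( $packed_ip ) if defined $packed_ip;
    }
    return @ips;
}
```

    Note that @ips can still hold the same address twice if two different hostnames resolve to it, so a second uniqueness pass over the IPs is still needed if that matters to you.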

    The sub extract_hostname_from_url is left as an exercise.
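
    If you want a starting point, here is one possible sketch, assuming the URLs look like scheme://host/path with an optional user:password@ part and an optional :port. The regex is my own guess at what is good enough for a list of ordinary web URLs; it does not handle IPv6 literals or strings with no scheme.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the lowercased hostname from a URL of the
# form scheme://[userinfo@]host[:port]/..., or undef in scalar context
# if the string does not match that shape.
sub extract_hostname_from_url {
    my ( $url ) = @_;
    return unless defined $url;

    if ( $url =~ m{^[a-z][a-z0-9+.-]*://(?:[^/?#@]*@)?([^/?#:]+)}i ) {
        return lc $1;    # lowercase so Example.com and example.com share a hash key
    }
    return;
}
```

    Lowercasing matters here because hash keys are case sensitive while hostnames are not. For anything beyond a quick script, the host method of the URI module from CPAN is probably a safer choice than a hand-rolled regex.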

      Thank you everyone!

      I can't say it wasn't painful - but the hashing worked like a charm. Shout out to mischief for getting me to think outside the box in regards to order of operations.

      This was a pretty crazy first attempt at scripting for me but the community here really came through with stellar advice!

      Cheers!