in reply to issues maintaining uniqueness

One incorrect assumption you've made has nothing to do with the language or the implementation. URLs carry more information than IP addresses do. An IP address is just a destination on a network; a URL also specifies a resource on that machine. So even if you successfully compile a list of unique URLs, there's every possibility that the list of IP addresses resolved from them will not be unique: two different URLs may simply be different resources on the same host. Most web sites are also hosted in some sort of shared or virtual hosting environment, so even two different hostnames have a fair chance of sitting on the same IP address.
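For example (a minimal sketch; the URLs, and the use of the URI and Socket modules, are my own illustration rather than anything from your script):

    use strict;
    use warnings;
    use URI;
    use Socket;   # provides inet_ntoa

    # Two different URLs that happen to live on the same host.
    my @urls = (
        'http://example.com/page/one',
        'http://example.com/page/two',
    );

    my %seen_ip;
    for my $url (@urls) {
        my $host      = URI->new($url)->host;          # drop the resource part, keep the hostname
        my $packed_ip = gethostbyname($host) or next;  # skip hosts that don't resolve
        $seen_ip{ inet_ntoa($packed_ip) }++;
    }

    # The URL list contains no duplicates, yet %seen_ip will normally
    # end up holding a single address.
    print "$_\n" for keys %seen_ip;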

Re^2: issues maintaining uniqueness
by jkstraw (Novice) on Apr 30, 2008 at 19:20 UTC
    Hi mischief,

    I don't think this is correct; I'm doing things this way for exactly that reason.

    If I didn't make the URLs unique before passing them to the Net::DNS::Resolver module I would be doing even more duplication (some duplication is unavoidable as you correctly pointed out).

    By resolving only unique URLs I am minimizing the amount of DNS resolution that is required. It would be wasteful to resolve the exact same URL multiple times as it would always yield the same result.

Re^2: issues maintaining uniqueness
by jkstraw (Novice) on Apr 30, 2008 at 18:46 UTC
    Thanks for the reply mischief,

    I actually didn't make that assumption. I could easily have put off worrying about uniqueness until I got to the IP values, but then I would be wasting processing power and bandwidth by doing a DNS lookup on the same URL multiple times.

    This is why I decided to make sure I pass a unique set of hostnames to Net::DNS::Resolver and then run the results (the IPs, including duplicates) through the Array::Unique module again.

    In fact, if you try the script as is and compare the regex results before and after passing them through the Array::Unique module, you will see this removes a large amount of duplication.
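    Roughly, the flow looks like this (a trimmed-down sketch rather than my full script; @hostnames_from_regex just stands in for whatever the regex extraction produces):

    use strict;
    use warnings;
    use Array::Unique;
    use Net::DNS;

    my @hostnames_from_regex;             # filled by the regex extraction step

    my @hostnames;
    tie @hostnames, 'Array::Unique';      # duplicate hostnames are silently dropped
    push @hostnames, @hostnames_from_regex;

    my $res = Net::DNS::Resolver->new;

    my @ips;
    tie @ips, 'Array::Unique';            # second pass: duplicate IPs dropped too
    for my $host (@hostnames) {
        my $reply = $res->search($host, 'A') or next;   # skip hostnames that don't resolve
        for my $rr ($reply->answer) {
            push @ips, $rr->address if $rr->type eq 'A';
        }
    }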

      That's better than what I thought I had read, though you could cut the resolver overhead just as much by checking each hostname before you look it up to see whether it has already been resolved. This sort of task is just begging for a hash. That would use one loop instead of two, and would probably be simpler to follow.

      use Socket;   # provides inet_ntoa
      my %looked_up;
      my @urls = qw( list of URLS however you got them );
      my @ips;
      foreach ( @urls ) {
          my $hostname = extract_hostname_from_url( $_ );
          unless ( exists $looked_up{ $hostname } ) {
              my $packed_ip = gethostbyname( $hostname );
              if ( defined $packed_ip ) {
                  my $ip_address = inet_ntoa( $packed_ip );
                  push @ips, $ip_address;
                  $looked_up{ $hostname } = 1;
              }
          }
      }
      # do with @ips whatever you were going to do with them

      As written, this retries hostname lookups that fail. You could change that easily by moving the hash-element assignment outside the if block that checks whether the packed IP address is defined, so a hostname that fails to resolve is only tried once.
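      Roughly (a sketch only, reusing the variables from the snippet above):

      unless ( exists $looked_up{ $hostname } ) {
          $looked_up{ $hostname } = 1;                  # record the attempt whether or not it resolves
          my $packed_ip = gethostbyname( $hostname );
          if ( defined $packed_ip ) {
              push @ips, inet_ntoa( $packed_ip );
          }
      }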

      The sub extract_hostname_from_url is left as an exercise.

        Thank you everyone!

        I can't say it wasn't painful, but the hashing worked like a charm. Shout out to mischief for getting me to think outside the box with regard to the order of operations.

        This was a pretty crazy first attempt at scripting for me but the community here really came through with stellar advice!

        Cheers!