How many web addresses is a "whole lot"?
For instance, how long does it take to run this:
#!/usr/bin/perl -w
use strict;
my @addresses = qw|http://www.foo.com http://www.foo1.com http://www.foo2.com http://www.foo.com http://www.a.com http://www.a.com http://www.a.com http://www.a.com|;
my %add;
for(@addresses){$add{$_}++};
print sort {$add{$b} <=> $add{$a}} keys %add;
1;
Update: LOL. ikegami rewrote my code, but do you realize that now this routine will sort the same 100K+ list twice just to get the top 10? I mean, heck, neither of our answers is really correct when you think about the volume...perhaps the data should be pushed to a database instead?
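If the full sort really is the concern, one option is to pick the top few entries in a single pass over the counts instead of sorting every key. Note that both sorts run over the unique keys of the hash, not the raw 100K+ list. The following is only a sketch: top_n is a hypothetical helper name, and the counts hash is assumed to have been built already, as in the code above.

```perl
use strict;
use warnings;

# Hypothetical helper: return the $n keys of %$counts with the highest
# counts, keeping a small ordered buffer instead of sorting every key.
sub top_n {
    my ($counts, $n) = @_;
    return () if $n < 1;
    my @top;    # at most $n keys, highest count first
    while (my ($site, $hits) = each %$counts) {
        # Skip sites that cannot displace anything in a full buffer.
        next if @top == $n && $hits <= $counts->{ $top[-1] };
        # Find the insertion point (linear scan; $n is assumed small).
        my $i = 0;
        $i++ while $i < @top && $counts->{ $top[$i] } >= $hits;
        splice @top, $i, 0, $site;
        pop @top if @top > $n;
    }
    return @top;
}

my %sites = (
    'http://www.foo.com'  => 2,
    'http://www.foo1.com' => 1,
    'http://www.foo2.com' => 1,
    'http://www.a.com'    => 4,
);
print "$_\n" for top_n(\%sites, 2);
```

For a top-10 list this does at most ten comparisons per unique site. Whether that beats a plain sort in practice would need benchmarking on the real data.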
Celebrate Intellectual Diversity
- Reverted names back to those the OP uses.
- Switch to pre-increment for speed boost in some perls.
- Answer both questions, not just one.
- Made input list more readable.
- Made printed list more readable.
- Removed extraneous 1;.
use strict;
use warnings;
my @sites = qw|
http://www.foo.com
http://www.foo1.com
http://www.foo2.com
http://www.foo.com
http://www.a.com
http://www.a.com
http://www.a.com
http://www.a.com
|;
my %sites;
++$sites{$_} foreach @sites;
my @validsites = sort keys %sites;    # URLs are strings, so sort them as strings
my @popularsites = sort { $sites{$b} <=> $sites{$a} } keys %sites;
splice(@popularsites, 10) if @popularsites > 10; # Keep only the 10 most popular.
{
    local $, = "\n";
    local $\ = "\n";
    print 'Most Popular Sites',
          '------------------',
          @popularsites,
          '',
          'All Sites',
          '---------',
          @validsites;
}
A whole lot is about 100,000.
At the risk of xp--, but in the interest of being helpful, this is untested here, surely bug-filled, off the cuff, and written w/o consultation to any working code or documentation, so use at your own risk:
sub sitecounter {
    my @sites = @_;
    my (@sitepopularity, @validsites);
    # Sort so duplicates are adjacent, then keep one copy of each site.
    foreach my $site (sort @sites) {
        push @validsites, $site unless @validsites && $validsites[-1] eq $site;
    }
    # Count how often each unique site occurs in the original list.
    foreach my $site (@validsites) {
        my @tmp;    # reset for each site
        foreach my $s (@sites) { push @tmp, $s if $site eq $s; }
        push @sitepopularity, [ $site, scalar @tmp ];
    }
    return @sitepopularity;    # list of [site, count] pairs
}
Of course, it depends on how you define a site: should you add a subroutine that uses a regex to reduce each URL to its root domain, instead of counting the individual pages as this one does? But if you debug that, it ought to get you started.
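As a rough illustration of that kind of regex mangling, the sketch below reduces each URL to its lowercased host name before counting. site_of is a hypothetical helper name, the URLs are made up, and it only extracts the host. Finding the true root domain (for example, treating www.foo.com and foo.com as one site) would need extra rules not shown here.

```perl
use strict;
use warnings;

# Hypothetical helper: reduce a URL to its lowercased host name so that
# different pages on one host count as the same site.
sub site_of {
    my ($url) = @_;
    # scheme://[userinfo@]host[:port][/path...]
    if ($url =~ m{^[a-z][a-z0-9+.-]*://(?:[^/?#@]*@)?([^/:?#]+)}i) {
        return lc($1);
    }
    return undef;    # not something that looks like a URL
}

my %sites;
for my $url (qw(
    http://www.foo.com/index.html
    http://www.foo.com/about.html
    http://WWW.A.com:8080/
)) {
    my $site = site_of($url);
    ++$sites{$site} if defined $site;
}
print "$_: $sites{$_}\n" for sort keys %sites;
```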
Check out the docs for push, pop, and sort and, if this was a homework question I helped you with, the caveats offered to new users here who ask others to do their homework for them.
-- Hugh
Thanks a lot for your help. Another useful thing would be to show the number of "hits" (occurrences) the most popular sites have received.
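One way to show the hits might be to print the count stored in the hash next to each site. This is a minimal sketch that reuses the %sites and @popularsites names from the code earlier in the thread. The alphabetical tie-break and the printf layout are my own assumptions.

```perl
use strict;
use warnings;

my @sites = qw|
    http://www.foo.com
    http://www.foo1.com
    http://www.foo2.com
    http://www.foo.com
    http://www.a.com
    http://www.a.com
    http://www.a.com
    http://www.a.com
|;

my %sites;
++$sites{$_} foreach @sites;

# Most hits first; ties are broken alphabetically so the order is stable.
my @popularsites = sort { $sites{$b} <=> $sites{$a} or $a cmp $b } keys %sites;
$#popularsites = 9 if @popularsites > 10;    # Keep only the 10 most popular.

# Print the hit count in front of each site.
printf "%5d  %s\n", $sites{$_}, $_ foreach @popularsites;
```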