As noted before (416363) I'm updating the Phalanx 100. I appreciate all the input. Here's what I've come up with, incorporating a lot of ideas from the previous thread.

The main ideas that I wanted to incorporate:

Anything else I should be looking at? Any snazzy way you can think of to tell what's core vs. what's not? Module::CoreList is based on modules, not distributions.
#!/usr/bin/perl

use strict;
use warnings;

use Data::Hash::Totals;

my %totals;
my %piggy;

my $rx_agent_ignore = qr/
    \. google \. |
    \. yahoo \. |
    \b Seekbot |
    \b MS\ Search \b |
    \b Interarchy \b |
    \b teoma \b
/x;

while (<DATA>) {
    chomp;
    my (undef,$naughty) = split;
    ++$piggy{ $naughty };
}

my %dl;
while (<>) {
    chomp;
    my ($who,undef,$what,$with) = split / /, $_, 4;
    next if $piggy{$who};
    next if $with =~ /$rx_agent_ignore/oi;

    $what =~ s[.+/][];
    $what =~ s/\.(tar\.gz|zip)$// or next; # Only want tarballs
    next if $what =~ /^perl-?5/;
    $what =~ s/(rc|b)\d+$//;   # Handle release candidates and betas
    $what =~ s/-[\d._]+a?$//;  # The "a" is for DateManip
    $what = "lanman" if $what =~ /lanman/;

    # Same person downloads twice, don't count it.
    ++$totals{$what} unless $dl{"$what\t$who"}++;
} # while

print as_table( \%totals );

# List of piggies follows
__END__
61016 77931
53912 70499
34716 66057
33552 10180
31884 12603
31000 12736
30056 79880
29867 15895
17420 71946
16736 18470
16652 81080
15614 26708
12607 53910
11703 37108
11265 54276
10912 44140
10406 24149
9243 41125
8944 20683
8866 11474
8836 15721
8814 22751
8715 59788
8578 50934
8313 13309
8292 13675
8286 78646
8274 70561
8271 71129
8249 51956
8247 56860
8229 45259
8200 17160
8185 26947
8185 29871
8123 81511
7938 83805
7267 58137
7244 53816
6859 49585
5560 75270
5476 21874
4790 27126
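The name normalization inside the second loop is the part most likely to need tweaking, so here is a minimal sketch of it pulled out into a standalone sub. The sub name `normalize_dist` and the sample paths (including their version numbers) are made up for illustration; the substitutions themselves are copied from the script above.

```perl
use strict;
use warnings;

# Hypothetical helper: the same substitutions as the main loop, in one place.
# Returns the bare distribution name, or undef if the path should be skipped.
sub normalize_dist {
    my ($what) = @_;

    $what =~ s[.+/][];                          # Drop any leading directories
    $what =~ s/\.(tar\.gz|zip)$// or return;    # Only want tarballs
    return if $what =~ /^perl-?5/;              # Skip perl itself
    $what =~ s/(rc|b)\d+$//;                    # Release candidates and betas
    $what =~ s/-[\d._]+a?$//;                   # Version; the "a" is for DateManip
    $what = "lanman" if $what =~ /lanman/;

    return $what;
}

# Sample paths are invented for this example.
for my $path (
    'authors/id/T/TI/TIMB/DBI-1.46.tar.gz',
    'DateManip-5.42a.tar.gz',
    'Module-Build-0.26rc1.tar.gz',
    'perl-5.8.6.tar.gz',
    'README',
) {
    my $name = normalize_dist($path);
    printf "%-40s => %s\n", $path, defined $name ? $name : '(skipped)';
}
```

Having it as a sub makes it easy to feed in odd filenames and see what they collapse to before trusting the totals.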
So the list that I'm left with looks like this (top 100 only):

10313 Net_SSLeay.pm
 7848 DBD-mysql
 7542 DBI
 3649 perl-ldap
 3371 Mail-SpamAssassin
 2745 HTML-Parser
 2704 GD
 2474 libwww-perl
 2261 Digest-SHA1
 2258 MIME-Base64
 2185 XML-Parser
 1816 Compress-Zlib
 1804 URI
 1688 Digest-MD5
 1616 DBD-Pg
 1594 Digest
 1588 Time-HiRes
 1297 HTML-Tagset
 1292 Tk
 1276 MIME-tools
 1193 Archive-Tar
 1185 Net-DNS
 1171 libnet
 1163 Test-Simple
 1143 Gtk-Perl
 1088 Archive-Zip
 1072 Digest-HMAC
 1031 MailTools
 1027 HTML-Template
  994 DB_File
  983 Apache-ASP
  977 CGI.pm
  969 DBD-Oracle
  849 DateManip
  833 IO-stringy
  833 Storable
  824 Msql-Mysql-modules
  783 Net-Telnet
  762 XML-Writer
  747 CPAN
  730 Template-Toolkit
  649 AppConfig
  643 Convert-ASN1
  630 TimeDate
  616 MIME-Lite
  607 IO-String
  586 MD5
  574 Crypt-SSLeay
  569 Date-Calc
  566 dmake-4.1pl1-win32
  547 IMAP-Admin
  532 XML-Generator
  525 GDGraph
  511 mod_perl
  503 File-Scan
  501 Net-SNMP
  501 Test-Harness
  498 XML-Simple
  487 TermReadKey
  481 IO-Socket-SSL
  475 PathTools
  456 GDTextUtil
  450 IO-Zlib
  427 Spreadsheet-WriteExcel
  426 Module-Build
  426 SOAP-Lite
  425 Data-Dumper
  410 BerkeleyDB
  376 PodParser
  375 ExtUtils-MakeMaker
  375 Mail-Sendmail
  371 Parse-RecDescent
  369 Authen-SASL
  364 Crypt-DES
  364 File-Tail
  360 Authen-PAM
  360 Bit-Vector
  356 DBD-ODBC
  353 Convert-TNEF
  351 Unix-Syslog
  337 Carp-Clan
  329 Net-Server
  325 OLE-Storage_Lite
  325 PerlMagick
  302 XML-SAX
  299 Event
  298 IPC-Run
  296 Params-Validate
  294 Unicode-String
  294 XML-LibXML
  292 Convert-UUlib
  279 DBD-DB2
  276 File-Temp
  273 XML-DOM
  264 Net-Daemon
  264 XML-NamespaceSupport
  258 Chart
  258 Crypt-CBC
  257 PlRPC
  251 Gtk
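On the core vs. non-core question, one rough approach is to guess a module name from each distribution name and ask Module::CoreList whether that module has ever shipped with perl. This is only a sketch under a big assumption: that turning "-" into "::" gives the distribution's main module. The helper names `dist_to_module` and `looks_core` are invented here. The guess will miss distributions such as PathTools or libnet, whose names are not module names, so it can only flag candidates, not give a definitive answer.

```perl
use strict;
use warnings;
use Module::CoreList;

# Hypothetical helper: naive guess at the main module of a distribution.
sub dist_to_module {
    my ($dist) = @_;
    ( my $module = $dist ) =~ s/-/::/g;
    return $module;
}

# True if the guessed module has ever been released with some version of perl.
sub looks_core {
    my ($dist) = @_;
    my $first = Module::CoreList->first_release( dist_to_module($dist) );
    return defined $first ? 1 : 0;
}

for my $dist (qw( Data-Dumper Test-Simple DBI libwww-perl )) {
    printf "%-15s %s\n", $dist,
        looks_core($dist) ? 'probably core' : 'no core module by that name';
}
```

Note that `first_release` reports whether a module was ever in core, not whether it is in the perl a given downloader is running, so the result depends on which perl versions you care about.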

xoxo,
Andy

Replies are listed 'Best First'.
Re: Updated Phalanx 100 stats
by Zaxo (Archbishop) on Jan 09, 2005 at 09:01 UTC

    Should core modules get a bump for each download of perl?

    After Compline,
    Zaxo

      Similarly, should the AS-core modules get a bump with each download of ActiveState (and how do we figure out, even roughly, how many there are)? I suspect this is one reason that a DBD has more downloads than DBI itself.


      Warning: Unless otherwise stated, code is untested. Do not use without understanding. Code is posted in the hopes it is useful, but without warranty. All copyrights are relinquished into the public domain unless otherwise stated. I am not an angel. I am capable of error, and err on a fairly regular basis. If I made a mistake, please let me know (such as by replying to this node).

        I suspect this is one reason that a DBD has more downloads than DBI itself.
        I was kind of wondering that myself. Can't use DBD::mysql without using DBI...

        thor

        Feel the white light, the light within
        Be your own disciple, fan the sparks of will
        For all of us waiting, your kingdom will come

      Actually, I want core modules removed.

      xoxo,
      Andy

        Second.
Re: Updated Phalanx 100 stats
by jkeenan1 (Deacon) on Jan 09, 2005 at 17:13 UTC
    Andy: I'm attempting to copy-and-paste your code and use it against the file in the tarball. But I'm not clear on what's supposed to go in the DATA handle.

    When calling phalanx.pl ./cpan-gets, I get:

    Name "main::DATA" used only once: possible typo at /Users/jimk/bin/perl/phalanx.pl line 19.
    readline() on unopened filehandle DATA at /Users/jimk/bin/perl/phalanx.pl line 19.

    What am I not getting?

    jimk

    Update: I got it. I realized that the 2 columns of numbers you listed had to go after __END__.