Unless you have a healthy ego, this is not a particularly cool use of Perl. If you do, you may have noticed ActiveState's Mail Archive Leaders feature.

Following a series of events described in my use Perl journal, I wrote a small script that makes the count slightly more accurate. If you're on one of those lists, you might want to try it out. Ah, the good old HTML-scraping days...

CAVEAT: I never got the hang of Perl's format, hence the ugliness near the end of the script. Any pointers to good sources of examples would be much appreciated (the Perl docs didn't help much, for some weird reason).
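For a two-column table like this one, printf may be simpler than format. Below is a minimal sketch. It assumes rows shaped like the [name, count] pairs the script builds, and the names and counts are made up. The * in %-*s and %*d takes each field width from the argument list. A format picture line, by contrast, fixes every field's width when you write it.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical results in the same shape the script builds:
# [ name, post count ], already sorted by descending count.
my @results = (
    [ 'Jane Doe',    82 ],
    [ 'John Smith',  35 ],
    [ 'A. N. Other',  7 ],
);

# Column widths: the longest name and the widest count.
my $name_w  = max map { length $_->[0] } @results;
my $count_w = max map { length $_->[1] } @results;

# Print at most ten rows, however many results there are.
my $last = $#results < 9 ? $#results : 9;
for my $i (0 .. $last) {
    # %02d zero-pads the rank; "*" reads the width from the argument list;
    # "-" left-justifies the name, while %d right-justifies the count.
    printf "%02d. %-*s %*d\n",
        $i + 1, $name_w, $results[$i][0], $count_w, $results[$i][1];
}
```

The single printf line replaces the manual $pad, $ppad and $npad arithmetic, and the $last bound avoids reading past the end of @results on short lists.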

```perl
#!/usr/bin/perl
# getleaders [list-name] -- get the first ten people on the ASPN archives,
#                           with better accuracy
#   getleaders          (defaults to perl-xml)
#   getleaders xml-dev  (gets xml-dev)
use strict;
use warnings;
use vars qw($IN_PERSON %people);
use LWP::Simple  qw();
use HTML::Parser qw();

use constant BASE_URL => 'http://aspn.activestate.com/ASPN/Mail/Leaders/';

my $list = shift || 'perl-xml';
my $url  = BASE_URL . $list . '/';
my $html = LWP::Simple::get($url) or die "Could not get $url\n";

my $p = HTML::Parser->new(
    api_version => 3,
    start_h     => [\&start_handler, 'tagname, attr'],
    text_h      => [\&text_handler,  'dtext'],
);
$p->unbroken_text(1);
$p->parse($html);
$p->eof;

# an author link means the next piece of text is a poster's name
sub start_handler {
    my ($tag, $attr) = @_;
    if ($tag eq 'a' and defined $attr->{title}
        and $attr->{title} =~ m/Click to see postings by this author/) {
        $IN_PERSON = 'person';
    }
}

# first text after the link is the name; then look for "N posts"
sub text_handler {
    my $txt = shift;
    return unless $IN_PERSON;
    $txt =~ s/^\s+//;
    $txt =~ s/\s+$//;
    return if $txt eq '';            # skip whitespace-only text
    if ($IN_PERSON eq 'person') {
        normalize(\$txt);
        $IN_PERSON = $txt;
    }
    elsif ($txt =~ m/(\d+) posts/) {
        $people{$IN_PERSON} += $1;   # add only the captured number
        $IN_PERSON = undef;
    }
}

# this is very ad hoc: merge spellings of the same person's name
sub normalize {
    my $txt = shift;                 # reference to the name, edited in place
    $$txt = 'Ilya Sterin'      if $$txt eq 'Sterin, Ilya';
    $$txt = 'Barrie Slaymaker' if $$txt eq 'barries';
}

# sort and print the result
my @results = map  { [ $_, $people{$_} ] }
              sort { $people{$b} <=> $people{$a} } keys %people;
die "No posters found at $url\n" unless @results;

my $longest = 0;
for my $r (@results) {
    my $len = length $r->[0];
    $longest = $len if $len > $longest;
}
my $nlen = length $results[0]->[1];

# show at most ten rows, even if the list has fewer posters
my $last = $#results < 9 ? $#results : 9;
for my $i (0 .. $last) {
    my $pad = ($i == 9) ? '' : '0';
    print $pad . ($i + 1) . '. ';
    my $ppad = $longest - length $results[$i]->[0];
    my $npad = $nlen    - length $results[$i]->[1];
    print $results[$i]->[0] . ' ' x $ppad . ' ' . ' ' x $npad . $results[$i]->[1] . "\n";
}
```
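normalize() hard-codes each alias as its own if statement. The sketch below does the same merge with a lookup table, so adding an alias means adding one hash entry. %alias, canonical(), and all names and counts are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical alias table: spelling seen in the archive => canonical name.
my %alias = (
    'Doe, Jane' => 'Jane Doe',
    'jdoe'      => 'Jane Doe',
);

# Map a scraped name to its canonical form; unknown names pass through.
sub canonical {
    my $name = shift;
    return exists $alias{$name} ? $alias{$name} : $name;
}

# Hypothetical scraped rows: [ name as shown in the archive, post count ].
my @rows = (
    [ 'Jane Doe',   40 ],
    [ 'Doe, Jane',  12 ],
    [ 'jdoe',       30 ],
    [ 'John Smith', 35 ],
);

# Sum counts under the canonical name, so the aliases merge.
my %people;
$people{ canonical($_->[0]) } += $_->[1] for @rows;

# prints "Jane Doe: 82" then "John Smith: 35"
for my $name (sort { $people{$b} <=> $people{$a} } keys %people) {
    print "$name: $people{$name}\n";
}
```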

-- darobin -- knowscape 2 coming soon --


In reply to A little vanity by darobin