http://qs1969.pair.com?node_id=115671

Unless you have a good ego, this is not such a cool use of Perl. If you do have good ego, then you may have noticed ActiveState's Mail Archive Leaders feature.

Following a series of events described in my use Perl journal, I created a small script that renders the count slightly more accurate. If you're on one of those lists, you might want to try it out. Ah the good old HTML scraping days...

CAVEAT: I never got the hang of format in Perl, hence the ugliness around the end. Any pointers to good sources (the perl docs didn't help much, for some weird reason) of examples are definitely appreciated.

#!/usr/bin/perl # getleaders [list-name] -- get the ten first people on ASPN archives, + with better acuracy # getleaders (defaults to perl-xml) # getleaders xml-dev (gets xml-dev) use strict; use vars qw($IN_PERSON %people); use LWP::Simple qw(); use HTML::Parser qw(); use constant BASE_URL => 'http://aspn.activestate.com/ASPN/Mail/Leader +s/'; my $list = shift || 'perl-xml'; my $url = BASE_URL . $list . '/'; my $html = LWP::Simple::get($url) or die "Could not get $url"; my $p = HTML::Parser->new( api_version => 3, start_h => [\&start_handler, 'tagname, attr'], text_h => [\&text_handler, 'dtext'], ); $p->unbroken_text(1); $p->parse($html); $p->eof; sub start_handler { my $tag = shift; my $attr = shift; if ($tag eq 'a' and $attr->{title} =~ m/Click to see postings by this + author/) { $IN_PERSON = 'person'; } } sub text_handler { my $txt = shift; return unless $IN_PERSON; $txt =~ s/^\s+//; $txt =~ s/\s+$//; if ($IN_PERSON eq 'person') { normalize(\$txt); $IN_PERSON = $txt; } elsif ($txt =~ m/\d+ posts/) { $people{$IN_PERSON} += $txt; # this numifies $IN_PERSON = undef; } } # this is very ad hoc sub normalize { my $txt = shift; $$txt = 'Ilya Sterin' if $$txt eq 'Sterin, Ilya'; $$txt = 'Barrie Slaymaker' if $$txt eq 'barries'; } # sort and print the result my @results = map { [ $_, $people{$_} ] } sort { $people{$b} <=> $peop +le{$a} } keys %people; my $longest = 0; for my $r (@results) { my $len = length $r->[0]; $longest = $len if $len > $longest; } my $nlen = length $results[0]->[1]; for my $i (0..9) { my $pad = ($i == 9) ? '' : '0'; print $pad . ($i + 1) . '. '; my $ppad = $longest - length $results[$i]->[0]; my $npad = $nlen - length $results[$i]->[1]; print $results[$i]->[0] . ' ' x $ppad . ' ' . ' ' x $npad . $resul +ts[$i]->[1] . "\n"; }

-- darobin -- knowscape 2 coming soon --

Replies are listed 'Best First'.
Re: A little (formatted) vanity
by briac (Sexton) on Oct 02, 2001 at 03:59 UTC

    Here are some experiments with format. It behaves in a rather weird fashion when eval'ed. It seems to replace all the 'template variables' in the format by their actual values when the evaluation is called. The (kludgy) workaround I found is to eval on each iteration of the for $i(0..9)

    The use of the eval is to be able to get the longest name of the list and construct the format according to it

    my $i; my $format = "format STDOUT =\n" . '@>>> ' . '@' . '<' x ($longest + 3) . " @<<<<\n" . '$i, $results[$i]->[0], $results[$i]->[1]' . "\n.\n"; print $format; eval $format; die $@ if $@; for $i (0..9) { write; }

    (I think that there should be a better way to do the above though)

    The most straightforward approach of using format in this script would be the following:

    my $i; format STDOUT_TOP= Rank Name Posts -------------------------------- . format STDOUT = @||| @<<<<<<<<<<<<<<<<<<<< @<<<< $i, $results[$i]->[0], $results[$i]->[1] . for $i (0..9) { write; }

    But in a nicer and more obfuscation prone style, one could prefer

    my $picture = '@||| @' . '<' x $longest . "@<<<<\n"; for my $i (0..9) { formline($picture, $i, $results[$i]->[0], $results[$i]->[1]); } print $^A; # use English; print $ACCUMULATOR;

    Eeek!

    Cheers,
    briac