I'm not sure this is a cool use for Perl, but once again, I am astounded by how easy the easy things really are. The script below is one of those where you're almost surprised when you're done writing it. "That's it?", you ask yourself. Yes, that's it.

Here's the background: there's a certain trail race that I'd like to run, but there are always more applicants than slots, so the organizers have resorted to a lottery system to pick entrants this year. Unfortunately, I didn't get in, but I am in the top 25 on the wait list.

The lottery winners have until midnight tonight to pay their entry fee; otherwise, the wait-listed people move into their slots. The lottery page clearly shows who has, and who hasn't, paid yet. Now, I could obsessively sit at my computer, refresh the page every five minutes, and count the "Not Paid" entrants... or I could be obsessive and lazy, and enlist Perl for help.

With just three use directives, I'm in business:

use LWP::UserAgent;
use HTML::TableParser;
use XML::RSS;
And now, in 50 non-optimized lines, I can easily write a script that screen-scrapes the web page (using LWP::UserAgent), counts the people who have and haven't paid (via HTML::TableParser), and then prints a simple RSS file (with XML::RSS) to a web-accessible spot that I've added to my news reader application (Google Reader).
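Since the counting is the only real logic in the script, here's a minimal, core-Perl sketch of that step with the network and HTML parsing stripped out. It assumes, as the script's row callback does, that each table row arrives as an array ref with the payment status in the fourth column (index 3); `count_payments` and the sample rows are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: tally payment statuses from parsed table rows.
# Assumes the status sits in column index 3, mirroring the script's row callback.
sub count_payments {
    my @rows = @_;
    my ($paid, $not_paid) = (0, 0);
    for my $row (@rows) {
        my $status = $row->[3] // '';
        if    ($status eq 'Paid')     { $paid++ }
        elsif ($status eq 'Not Paid') { $not_paid++ }
        # Any other value (blank cell, stray text) is ignored.
    }
    return ($paid, $not_paid);
}

# Made-up sample rows, standing in for what HTML::TableParser would hand over.
my @rows = (
    [ 1, 'Runner A', 'City A', 'Paid' ],
    [ 2, 'Runner B', 'City B', 'Not Paid' ],
    [ 3, 'Runner C', 'City C', 'Paid' ],
);
my ($paid, $not_paid) = count_payments(@rows);
print "$paid have paid, $not_paid haven't\n";
```

Using an explicit `if`/`elsif` rather than a chain of `and`/`or` keeps the two counters independent of each other's return values.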

The script is scheduled via cron. Since I can check my news reader on my phone, I am free to walk around, eat dinner, etc., while tracking something I have absolutely no control over. Perfect!

I've done something like this before to track the waiver wire in a fantasy league. But I am struck by how easy this really was, and it was totally worthwhile, even though I can put this script in the trash after midnight.

It also occurs to me that this basic process (scrape -> parse -> post) could be implemented in thousands of ways using many other tools and technologies. Have other monks done similar things in the past? How would you have approached my problem?
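One way to keep the three stages swappable is to pass each one in as a code reference. This is only a sketch: `run_pipeline` and the stub stages below are hypothetical and use canned text instead of a real page. In the real script, the stages would wrap LWP::UserAgent, HTML::TableParser, and XML::RSS respectively.

```perl
use strict;
use warnings;

# Hypothetical driver: each stage is a code ref, so a different fetcher,
# parser, or output format can be dropped in without touching the others.
sub run_pipeline {
    my ($scrape, $parse, $post) = @_;
    my $raw    = $scrape->();      # e.g. fetch the page with LWP::UserAgent
    my @counts = $parse->($raw);   # e.g. tally rows with HTML::TableParser
    return $post->(@counts);       # e.g. append an item with XML::RSS
}

# Stub stages for illustration only: no network access and no files written.
my $summary = run_pipeline(
    sub { "Paid\nNot Paid\nPaid\n" },
    sub {
        my ($text) = @_;
        my @lines = split /\n/, $text;
        return (scalar(grep { $_ eq 'Paid' } @lines),
                scalar(grep { $_ eq 'Not Paid' } @lines));
    },
    sub { my ($paid, $not_paid) = @_; "$paid have paid, $not_paid haven't" },
);
print "$summary\n";
```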

use strict;
use warnings;
use LWP::UserAgent;
use HTML::TableParser;
use XML::RSS;

use constant ENTRANTS_PAGE => 'http://www.example.com/lotteryentr?eventid=1221';
use constant RSS_FILE      => '/var/www/html/lottery_entrants.xml';

# Scrape: fetch the lottery page, bailing out on any HTTP error.
my $response = LWP::UserAgent->new()->get(ENTRANTS_PAGE);
die $response->status_line() unless $response->is_success();

# Parse: in the table that has a "Paid" column, count "Paid" / "Not Paid"
# in the fourth cell (index 3). The row callback receives
# ($table_id, $line_no, \@cells, $udata), so the cells are in $_[2].
my ($paid, $not_paid) = (0, 0);
my $p = HTML::TableParser->new(
    [ {
        cols => 'Paid',
        row  => sub {
            my $status = $_[2]->[3];
            return unless defined $status;
            if    ($status eq 'Paid')     { $paid++ }
            elsif ($status eq 'Not Paid') { $not_paid++ }
        },
    } ],
    { Decode => 1, Trim => 1, Chomp => 1 },
);
$p->parse($response->decoded_content());
$p->eof;    # flush any text the parser is still buffering

# Post: append an item to the existing feed, or create the feed on the first run.
my $now = localtime;
my $rss;
if (-s RSS_FILE) {
    $rss = XML::RSS->new();
    $rss->parsefile(RSS_FILE);
}
else {
    $rss = XML::RSS->new(version => '2.0');
    $rss->channel(
        title   => 'Lottery Entrants',
        pubDate => $now,
        syn     => {
            updatePeriod    => 'hourly',
            updateFrequency => '3',
            updateBase      => '1901-01-01T00:00+00:00',
        },
    );
}
$rss->add_item(
    title       => "Entrants at $now",
    description => "$paid have paid, $not_paid haven't",
);
$rss->save(RSS_FILE);
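One possible refinement: RSS 2.0 expects RFC 822 dates for `pubDate`, while scalar `localtime` returns a ctime-style string, and a given news reader may or may not tolerate that. Here's a minimal sketch using the core POSIX module, assuming UTC is acceptable and a C/English `LC_TIME` locale (the `%a` and `%b` names from `strftime` are locale-dependent); `rfc822_now` is a hypothetical helper name.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: current time as an RFC 822 date in UTC,
# shaped like "Day, DD Mon YYYY HH:MM:SS GMT".
sub rfc822_now {
    return strftime('%a, %d %b %Y %H:%M:%S GMT', gmtime);
}

my $pub_date = rfc822_now();
print "$pub_date\n";
```

The result could be passed as `pubDate => rfc822_now()` in the `channel` call, leaving `$now` in the item title, where the human-readable form is fine.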

In reply to Laziness through CPAN: Screen-scrape to RSS with 3 Modules by crashtest
