Re: Selected Best Nodes Archive

by blokhead (Monsignor)
on Feb 26, 2004 at 00:04 UTC

in reply to Selected Best Nodes Archive

I thought of doing that myself as well. You could of course set up a cron job to just fetch the Selected Best Nodes to a timestamped file. I've set up the following cron job to archive that page into an SQLite database. It then prints out a reputation-sorted list of what it has already archived. That way, after a while, I'll have my own Top 5000 list.
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract;
use LWP::Simple;
use DBI;

my $db_file = "best_nodes";
my $pm_site = "";    # used below as a sprintf format taking a node id
my $make_table = ! -f $db_file;

my $dbh = DBI->connect("dbi:SQLite:dbname=$db_file", "", "")
    or die "Can't connect to db: $DBI::errstr";

# Create the table on the first run only
$dbh->do( qq[
    create table nodes (
        id      int unique,
        title   varchar(255),
        auth_id int,
        author  varchar(255),
        rep     int
    )
]) if $make_table;

# Fetch the page to archive
my $html = get( sprintf $pm_site, 328478 );

my $te = HTML::TableExtract->new(
    headers   => [ qw/Node Author Rep/ ],
    keep_html => 1
);
$te->parse($html);

foreach my $row ($te->rows) {
    my ($node, $author, $rep) = @$row;

    # Cells keep their HTML, so pull the ids and text out of the links
    my ($id)      = $node   =~ /\?node_id=(\d+)/;
    my ($auth_id) = $author =~ /\?node_id=(\d+)/;
    ($rep)        = $rep    =~ /(\d+)/;
    my ($title)   = $node   =~ m{>(.+?)</a>$};
    ($author)     = $author =~ m{>(.+?)</a>$};

    # Replace any earlier record for this node
    $dbh->do("delete from nodes where id=?", undef, $id);
    $dbh->do("insert into nodes values (?,?,?,?,?)", undef,
        $id, $title, $auth_id, $author, $rep);
}

# Write out everything archived so far, highest reputation first
my $sth = $dbh->prepare( qq[
    select id,title,auth_id,author,rep from nodes
    order by rep desc
]);
$sth->execute;

open my $fh, ">bestnodes.html" or die;
print $fh "<table>\n";
while (my ($id, $title, $auth_id, $author, $rep) = $sth->fetchrow_array) {
    $id      = sprintf $pm_site, $id;
    $auth_id = sprintf $pm_site, $auth_id;
    print $fh qq[
        <tr><td><a href="$id">$title</a></td>
        <td><a href="$auth_id">$author</a></td>
        <td>$rep</td></tr>
    ];
}
print $fh "</table>\n";
Incidentally, this is my first experience with HTML::TableExtract, and it's just perfect for this job. Maybe I'll post the best nodes archive on my homepage once it gets big enough.
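
For anyone following the regexes in the script above: with keep_html => 1 each cell comes back as raw HTML, so the node id and the title both have to be dug out of the anchor tag. Here is a minimal sketch of just that step. The sample cell and the parse_link_cell helper are made up for illustration, and the real page markup may well differ.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical cell content, shaped the way the regexes in the script
# expect it; the actual page markup may differ.
my $cell = '<a href="?node_id=12345">Some Node Title</a>';

# Illustrative helper (not part of the original script): returns the
# numeric id from the link target and the text inside the anchor.
sub parse_link_cell {
    my ($html) = @_;
    my ($id)   = $html =~ /\?node_id=(\d+)/;
    my ($text) = $html =~ m{>(.+?)</a>$};
    return ($id, $text);
}

my ($id, $title) = parse_link_cell($cell);
print "$id: $title\n";
```

If a cell does not match, both values come back undefined, so a stricter version would probably want to skip such rows before touching the database.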


Replies are listed 'Best First'.
Re: Re: Selected Best Nodes Archive
by demerphq (Chancellor) on Feb 26, 2004 at 00:48 UTC

    Believe it or not, I already coded XML support into the patches I wrote for the Best/Selected nodes. The Other Users XML ticker uses it, but the other pages don't. (Yes, internally Other Users are technically "picked_nodes".) The support is there, though, and if the gods are amenable and I get the tuits to add the rest of the code, then you'll be able to get an XML feed of this data instead of scraping the HTML for it.

    I meant to release the final patches for the XML stuff right after the normal HTML changes went live, but I guess I got a bit distracted with other things. Sorry. :-)


      First they ignore you, then they laugh at you, then they fight you, then you win.
      -- Gandhi