My friends and I all like to keep online journals. I would've liked to try something other than Xanga, but all my friends recommended it. The problem is that the "archive" feature is a paid service, so whenever I felt like going back to read some old entries, I had to keep clicking the "Next 5" link. I'm still relatively new to Perl (about 3 weeks into it), but I came up with something I find genuinely useful! Thanks Monks! Hopefully someone else can use this...
#Usage: archive.pl USERNAME
#
#Description: Saves all entries of USERNAME's xanga to "archive.html"
# in the working directory
use LWP::UserAgent;
$end = 'http://www.xanga.com/';
$uid = shift; #username from the command line, if given
unless (defined $uid) {
print "What is your username? ";
$uid = <STDIN>;
chomp $uid; #chomp strips only the trailing newline
}
$first_page = 'http://www.xanga.com/home.aspx?user=' . $uid;
print "Connecting to $uid's Xanga...\n";
grab($first_page);
$next_page = save(); #save() returns the url to Next 5
print "\$next_page is $next_page\n";
#save() returns the bare base URL when no "Next 5" link was found
until ($next_page eq $end) {
grab($next_page);
$next_page = save();
print "\$next_page is $next_page\n";
}
print "\n\n\nCompleted Archiving\n\n\n";
#Usage: grab(url)
#
#Description: Fetches url and writes the returned HTML to "tmp.html"
sub grab{
open TMP, ">tmp.html" or die "Can't write tmp.html: $!";
$url = shift;
print "grabbing $url\n";
$ua = LWP::UserAgent->new;
$ua->agent("MyApp/0.1 ");
# Be nice to Xanga servers ;-)
sleep 5;
# Create a plain GET request (a GET needs no form body)
my $req = HTTP::Request->new(GET => $url);
# Pass request to the user agent and get a response back
my $res = $ua->request($req);
# Check the outcome of the response
if ($res->is_success) {
print TMP $res->content;
print "Successfully grabbed html...\n";
}
else {
print $res->status_line, "\n";
}
close TMP; #close on failure too, leaving an empty tmp.html
}
#Usage: save();
#
#Description: sub save parses "tmp.html" and appends all found entries of that page to
# "archive.html". It also finds the url of the next page to grab
sub save {
open IN, "tmp.html" or die "Can't read tmp.html: $!";
open OUT, ">>archive.html" or die "Can't append to archive.html: $!";
print "Saving...\n";
while ($line = <IN>) {
if ($line =~ /<div class="blogheader">/) { last; }
}
print OUT $line;
print "Wrote out \$line\n";
REST: while($line = <IN>) {
print OUT $line;
last REST if $line =~ /Next 5 >>/;
}
print "Saved\n";
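#The link's href is the last quoted string on the line, so reverse the
#line, take the first quoted string, and reverse it back
$line = reverse($line);
$path = $line =~ /"(.*?)"/ ? reverse($1) : ''; #'' when there was no "Next 5" line
close IN;
close OUT;
return 'http://www.xanga.com/' . $path; #home.aspx?user=....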
}
I know it's a bit crude, but it works! ;-) For now I'm too lazy to clean it up properly, but suggestions would be great! When I feel like it, I think I'd add incremental archiving (instead of going through the entire xanga), a GUI, saving images and comments to the hard drive, etc...
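One cleanup that might be worth trying first: the double reverse in save() is only there to grab the last quoted string on the "Next 5" line, and a greedy match can probably do the same thing in one step. Here is a rough sketch of that idea. The helper name next_link and the sample markup (including the page=2 parameter) are made up for illustration, and I'm assuming the href is double-quoted and is the last quoted value on the line, which is what the reverse trick assumes too.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of archive.pl: returns the last
# double-quoted string on a line, or undef if the line has none.
sub next_link {
    my ($line) = @_;
    # The greedy .* swallows as much as it can, so the capture ends up
    # on the LAST quoted string, same as reversing the line first.
    return $line =~ /.*"([^"]*)"/s ? $1 : undef;
}

# Made-up sample line; the real Xanga markup may well look different.
my $sample = '<a href="home.aspx?user=someone">Previous 5</a> '
           . '<a href="home.aspx?user=someone&page=2">Next 5 >></a>';

my $path = next_link($sample);
print 'http://www.xanga.com/' . $path . "\n" if defined $path;
# prints http://www.xanga.com/home.aspx?user=someone&page=2
```

Returning undef when nothing matches would also give the main loop a cleaner "no more pages" signal than comparing against the base URL, though I haven't tried it against real pages.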