What Are Your Live Journal Friends Interested In?

A friend of mine on LiveJournal noticed that many of his friends listed "writing" as an interest. He wondered how many, so I wrote a script to figure this out. Pass it a Live Journal username and it will print a list of interests, the number of friends of that username having that interest and the percentage of them having that interest.

This could use a bit of work (you might want to retry if a page fetch fails), but it's a decent start. To use it, if your username is "perl_lrep", you would do this:

./interests.pl -u perl_lrep

It's highly dependent on parsing the LJ HTML because their published interface does not provide "interests" lists. Also, note that there are several options available. If you try to run this over a slow connection, pass in a '--verbose 1' or '--verbose 2' option so you can see it processing while it runs, otherwise it might appear to be hung.

And yes, it's a bit of a hack :)

#!/usr/local/bin/perl
use warnings;
use strict;
use WWW::Mechanize;
use URI::Escape;
use HTML::Entities 'decode_entities';
use HTML::TokeParser::Simple;
use Getopt::Long;
                                                                      
+          
no warnings 'defined'; # simple hack to avoid warnings for pages that 
+weren't fetched
                                                                      
+          
$|++; # so we can see the results as they are printed
                                                                      
+          
GetOptions(
    'help|?'      => \&usage,
    'user=s'      => \my $TARGET_USER,
    'verbose=i'   => \my $VERBOSE,
    'syndicated'  => \my $INCLUDE_SYNDICATED,
    'communities' => \my $INCLUDE_COMMUNITIES,
    'minimum=i'   => \my $minimum,
);
                                                                      
+          
$minimum ||= 1;
$VERBOSE ||= 0;
                                                                      
+          
# bad regexes.  Need to improve them!
use constant FRIENDS   => {
    regex    => qr{href='http://www\.livejournal\.com/(?:community|use
+rs)/[^/]+/friends'},
    label    => 'friends',
    instance => qr{^/userinfo.bml\?user=(.*)$},
};
use constant INTERESTS => {
    regex    => qr{href='/interests\.bml'},
    label    => 'interests',
    instance => qr{^/interests.bml\?int=(.*)$},
};
                                                                      
+          
my $MECH = WWW::Mechanize->new;
                                                                      
+          
$TARGET_USER ||= die "You must supply an LJ username";
print "Fetching user info for ($TARGET_USER) ...\n" if $VERBOSE;
my $html = get_user_info($TARGET_USER);
                                                                      
+          
print "Fetching friends list for ($TARGET_USER)...\n" if $VERBOSE;
my $users = get_list($html,FRIENDS);
                                                                      
+          
my $current = 1;
my $count   = @$users;

my %sections;
foreach my $user (@$users) {
    print "Fetching $user:  $current out of $count\n" if $VERBOSE;
    sleep 1; # be nice to their server
    $current++;
    print "Fetching user info for ($user) ...\n" if $VERBOSE > 1;
    my $html = get_user_info($user);
    next unless $html;
    print "Fetching interests for ($user) ...\n" if $VERBOSE > 1;
    my $interests = get_list($html, INTERESTS);
    foreach my $interest (@$interests) {
        $interest = uri_unescape($interest);
        $interest =~ tr/+/ /;
        $sections{$interest}++;
    }
}
                                                                      
+          
my @results =
    sort {
        $b->[1] <=> $a->[1]
            ||
        $a->[0] cmp $b->[0]
    }
    map  { [ $_, $sections{$_}, ($sections{$_} / $count) * 100] }
    grep { $sections{$_} >= $minimum }
        keys %sections;
                                                                      
+          
foreach my $interest (@results) {
    printf "%30s %d %.1d%%\n", @$interest;
}
                                                                      
+          
sub get_list
{
    my ($html,$section) = @_;
    my $parser = HTML::TokeParser::Simple->new(\$html);
    while (my $token = $parser->get_token) {
        next unless $token->as_is =~ /$section->{regex}/;
        last;
    }
    $parser->get_tag('td'); # advance to first td tag
    my @sections;
    while (my $token = $parser->get_token) {
        last if $token->is_end_tag('td'); # we're at the end of the me
+mber
                                          #  table data element
        next unless $token->is_start_tag('a');
        if ($token->return_attr->{href} =~ $section->{instance}) {
            push @sections => decode_entities($1);
        }
    }
    printf("\t%d %s found\n", scalar @sections, $section->{label})
        if $VERBOSE > 1;
    return \@sections;
}
                                                                      
+          
sub get_user_info {
    my $user = shift;     my $info = sprintf "http://www.livejournal.c
+om/userinfo.bml?user=%s&mode=full"
        => uri_escape($user);
    my $page = $MECH->get($info);
    my $html = $MECH->content;
                                                                      
+          
    if ('Error' eq $MECH->title && $html =~ /Unknown user/) {
        # this isn't perfect, but it's reasonable since LJ does
        # not return error codes
        warn "User ($user) not found";
        return;
    }     if ($user ne $TARGET_USER && $MECH->title =~ /Syndicated Acc
+ount/ && ! $INCLUDE_SYNDICATED) {
        print "\tSkipping syndicated account ($user)\n" if $VERBOSE;
        return;
    }     if ($user ne $TARGET_USER && $MECH->title =~ /Community Info
+/ && ! $INCLUDE_COMMUNITIES) {
        print "\tSkipping community ($user)\n" if $VERBOSE;
        return;
    }
    return $html;
}
                                                                      
+          
sub usage
{
    print <<"    END_USAGE";
    $0 will calculate the number of interests that the friends for
    a given user has.
                                                                      
+          
    $0 --user [options]
                                                                      
+          
    --help        Display this information and exit
    --?           Same as '--help'
    --user        Mandatory.  This is the user we will fetch friends f
+or.
    --syndicated  Include syndicated "friends" (e.g., doonesbury)
    --communities Include communities
    --minimum     Takes and integer, \$x.  Does not report on interest
+s shared
                  by only \$x or fewer people.
    --verbose     Takes an integer (0, 1, or 2).  If zero, will print 
+nothing
                  but the interest list (this is the default).  1 and 
+2 will
                  print more and more information.  These are useful t
+o let you
                  know the program has not "hung" if you're working wi
+th a
                  large list or over a slow connection.
                                                                      
+          
    Example:
                                                                      
+          
    $0 --user publius_ovidius --verbose 2 --communities
                                                                      
+          
    That will calculate the common interests for friends of publius_ov
+idius,
    displaying verbose information and includes community interests (b
+ut does
    not include syndicated feed interests).
                                                                      
+          
    Note that arguments may be abbreviated to the first letter.  The a
+bove
    command may be written as:
                                                                      
+          
    $0 -u publius_ovidius -v 2 -c
    END_USAGE
    exit;
}
[download]

Cheers,
Ovid

New address of my CGI Course.

Comment on What Are Your Live Journal Friends Interested In? Download Code

Replies are listed 'Best First'.
Re: What Are Your Live Journal Friends Interested In? by rkg (Hermit) on Jan 30, 2004 at 14:50 UTC
Neat -- a new trick I hadn't seen before -- creating and referencing a "my" variable all at once: `'user=s' => \my $TARGET_USER,` [download] While I have no interest in LiveJournal, I do value folks who post small hiqh-quality scripts in their entirety -- it is nice to learn idioms and approaches from More Perl Savvy Folks than myself. Thanks, ovid.	[reply] [d/l]
Re(2): What Is 'my' Interested In? by Ovid (Cardinal) on Jan 30, 2004 at 15:56 UTC
Thanks for the kind words. The my trick is an old one that relies on a feature of my that is not documented in `perldoc -f my` (though my is so ubiquitous that I suspect many people have not even read that perldoc). You see, my, like most Perl functions, has a return value. It returns variable it's declaring. That allows you to do stuff like this: `chomp(my $data = <FH>); while ((my $foo = some_func()) eq 'bar') { ... }` [download] You can also use it to assign values to more than one variable: `perl -MData::Dumper -e 'my @a = my ($x,$y,$z) = localtime;print Dumper \@a'` In short, were it not for this feature, Perl would lose many nifty timesavers. Cheers, Ovid New address of my CGI Course.	[reply] [d/l] [select]
Re: Re(2): What Is 'my' Interested In? by theorbtwo (Prior) on Jan 30, 2004 at 17:49 UTC
Actualy, the last trick has nothing to do with the return of the my function -- an empty () would work just as well, and not effect what goes in @a at all. Warning: Unless otherwise stated, code is untested. Do not use without understanding. Code is posted in the hopes it is useful, but without warranty. All copyrights are relinquished into the public domain unless otherwise stated. I am not an angel. I am capable of error, and err on a fairly regular basis. If I made a mistake, please let me know (such as by replying to this node).	[reply]
Re: Re: Re(2): What Is 'my' Interested In? by ysth (Canon) on Jan 30, 2004 at 18:43 UTC
Re: Re(2): What Is 'my' Interested In? by benizi (Hermit) on Feb 02, 2004 at 20:24 UTC
For Getopt::Long, I like to go one step further and have the defaults 'inlined'. Where Ovid wrote: `GetOptions( ... 'verbose=i' => \my $VERBOSE, ... 'minimum=i' => \my $minimum, ); + $minimum \|\|= 1; $VERBOSE \|\|= 0;` [download] I like to write: `GetOptions( ... 'verbose=i' => \(my $VERBOSE = 0), ... 'minimum=i' => \(my $minimum = 1), );` [download]	[reply] [d/l] [select]


Problems? Is your data what you think it is?
	PerlMonks