Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?
 
PerlMonks  

What Are Your Live Journal Friends Interested In?

by Ovid (Cardinal)
on Jan 29, 2004 at 23:55 UTC ( [id://325089]=CUFP: print w/replies, xml ) Need Help??

A friend of mine on LiveJournal noticed that many of his friends listed "writing" as an interest. He wondered how many, so I wrote a script to figure this out. Pass it a Live Journal username and it will print a list of interests, the number of friends of that username having that interest and the percentage of them having that interest.

This could use a bit of work (you might want to retry if a page fetch fails), but it's a decent start. To use it, if your username is "perl_lrep", you would do this:

./interests.pl -u perl_lrep

It's highly dependent on parsing the LJ HTML because their published interface does not provide "interests" lists. Also, note that there are several options available. If you try to run this over a slow connection, pass in a '--verbose 1' or '--verbose 2' option so you can see it processing while it runs, otherwise it might appear to be hung.

And yes, it's a bit of a hack :)

#!/usr/local/bin/perl use warnings; use strict; use WWW::Mechanize; use URI::Escape; use HTML::Entities 'decode_entities'; use HTML::TokeParser::Simple; use Getopt::Long; + no warnings 'defined'; # simple hack to avoid warnings for pages that +weren't fetched + $|++; # so we can see the results as they are printed + GetOptions( 'help|?' => \&usage, 'user=s' => \my $TARGET_USER, 'verbose=i' => \my $VERBOSE, 'syndicated' => \my $INCLUDE_SYNDICATED, 'communities' => \my $INCLUDE_COMMUNITIES, 'minimum=i' => \my $minimum, ); + $minimum ||= 1; $VERBOSE ||= 0; + # bad regexes. Need to improve them! use constant FRIENDS => { regex => qr{href='http://www\.livejournal\.com/(?:community|use +rs)/[^/]+/friends'}, label => 'friends', instance => qr{^/userinfo.bml\?user=(.*)$}, }; use constant INTERESTS => { regex => qr{href='/interests\.bml'}, label => 'interests', instance => qr{^/interests.bml\?int=(.*)$}, }; + my $MECH = WWW::Mechanize->new; + $TARGET_USER ||= die "You must supply an LJ username"; print "Fetching user info for ($TARGET_USER) ...\n" if $VERBOSE; my $html = get_user_info($TARGET_USER); + print "Fetching friends list for ($TARGET_USER)...\n" if $VERBOSE; my $users = get_list($html,FRIENDS); + my $current = 1; my $count = @$users; my %sections; foreach my $user (@$users) { print "Fetching $user: $current out of $count\n" if $VERBOSE; sleep 1; # be nice to their server $current++; print "Fetching user info for ($user) ...\n" if $VERBOSE > 1; my $html = get_user_info($user); next unless $html; print "Fetching interests for ($user) ...\n" if $VERBOSE > 1; my $interests = get_list($html, INTERESTS); foreach my $interest (@$interests) { $interest = uri_unescape($interest); $interest =~ tr/+/ /; $sections{$interest}++; } } + my @results = sort { $b->[1] <=> $a->[1] || $a->[0] cmp $b->[0] } map { [ $_, $sections{$_}, ($sections{$_} / $count) * 100] } grep { $sections{$_} >= $minimum } keys %sections; + foreach my $interest (@results) { printf "%30s %d %.1d%%\n", @$interest; } + sub get_list { my ($html,$section) = @_; my $parser = HTML::TokeParser::Simple->new(\$html); while (my $token = $parser->get_token) { next unless $token->as_is =~ /$section->{regex}/; last; } $parser->get_tag('td'); # advance to first td tag my @sections; while (my $token = $parser->get_token) { last if $token->is_end_tag('td'); # we're at the end of the me +mber # table data element next unless $token->is_start_tag('a'); if ($token->return_attr->{href} =~ $section->{instance}) { push @sections => decode_entities($1); } } printf("\t%d %s found\n", scalar @sections, $section->{label}) if $VERBOSE > 1; return \@sections; } + sub get_user_info { my $user = shift; my $info = sprintf "http://www.livejournal.c +om/userinfo.bml?user=%s&mode=full" => uri_escape($user); my $page = $MECH->get($info); my $html = $MECH->content; + if ('Error' eq $MECH->title && $html =~ /Unknown user/) { # this isn't perfect, but it's reasonable since LJ does # not return error codes warn "User ($user) not found"; return; } if ($user ne $TARGET_USER && $MECH->title =~ /Syndicated Acc +ount/ && ! $INCLUDE_SYNDICATED) { print "\tSkipping syndicated account ($user)\n" if $VERBOSE; return; } if ($user ne $TARGET_USER && $MECH->title =~ /Community Info +/ && ! $INCLUDE_COMMUNITIES) { print "\tSkipping community ($user)\n" if $VERBOSE; return; } return $html; } + sub usage { print <<" END_USAGE"; $0 will calculate the number of interests that the friends for a given user has. + $0 --user [options] + --help Display this information and exit --? Same as '--help' --user Mandatory. This is the user we will fetch friends f +or. --syndicated Include syndicated "friends" (e.g., doonesbury) --communities Include communities --minimum Takes and integer, \$x. Does not report on interest +s shared by only \$x or fewer people. --verbose Takes an integer (0, 1, or 2). If zero, will print +nothing but the interest list (this is the default). 1 and +2 will print more and more information. These are useful t +o let you know the program has not "hung" if you're working wi +th a large list or over a slow connection. + Example: + $0 --user publius_ovidius --verbose 2 --communities + That will calculate the common interests for friends of publius_ov +idius, displaying verbose information and includes community interests (b +ut does not include syndicated feed interests). + Note that arguments may be abbreviated to the first letter. The a +bove command may be written as: + $0 -u publius_ovidius -v 2 -c END_USAGE exit; }

Cheers,
Ovid

New address of my CGI Course.

Replies are listed 'Best First'.
Re: What Are Your Live Journal Friends Interested In?
by rkg (Hermit) on Jan 30, 2004 at 14:50 UTC
    Neat -- a new trick I hadn't seen before -- creating and referencing a "my" variable all at once:
    'user=s' => \my $TARGET_USER,

    While I have no interest in LiveJournal, I do value folks who post small hiqh-quality scripts in their entirety -- it is nice to learn idioms and approaches from More Perl Savvy Folks than myself.

    Thanks, ovid.

      Thanks for the kind words.

      The my trick is an old one that relies on a feature of my that is not documented in perldoc -f my (though my is so ubiquitous that I suspect many people have not even read that perldoc). You see, my, like most Perl functions, has a return value. It returns variable it's declaring. That allows you to do stuff like this:

      chomp(my $data = <FH>); while ((my $foo = some_func()) eq 'bar') { ... }

      You can also use it to assign values to more than one variable:

      perl -MData::Dumper -e 'my @a = my ($x,$y,$z) = localtime;print Dumper \@a'

      In short, were it not for this feature, Perl would lose many nifty timesavers.

      Cheers,
      Ovid

      New address of my CGI Course.

        Actualy, the last trick has nothing to do with the return of the my function -- an empty () would work just as well, and not effect what goes in @a at all.


        Warning: Unless otherwise stated, code is untested. Do not use without understanding. Code is posted in the hopes it is useful, but without warranty. All copyrights are relinquished into the public domain unless otherwise stated. I am not an angel. I am capable of error, and err on a fairly regular basis. If I made a mistake, please let me know (such as by replying to this node).

        For Getopt::Long, I like to go one step further and have the defaults 'inlined'. Where Ovid wrote:

        GetOptions( ... 'verbose=i' => \my $VERBOSE, ... 'minimum=i' => \my $minimum, ); + $minimum ||= 1; $VERBOSE ||= 0;

        I like to write:

        GetOptions( ... 'verbose=i' => \(my $VERBOSE = 0), ... 'minimum=i' => \(my $minimum = 1), );

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: CUFP [id://325089]
Approved by footpad
Front-paged by footpad
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others chanting in the Monastery: (5)
As of 2024-04-18 17:34 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found