nrbrtkls has asked for the wisdom of the Perl Monks concerning the following question:

Fellow Monks, I seek your wisdom concerning the following: I use the IMDB::Film module and would like to print the recommendations for a movie. The object method "recommendation_movies()" returns a list of recommended movies for a specified one as a hash, where each key is a movie ID in IMDB and each value is a movie's title. Could someone please show me how to extract this info from the hash? What I have tried, and what didn't work, looked like this:

    use IMDB::Film;

    my $imdbObj = new IMDB::Film(crit => 'Monk');

    if($imdbObj->status) {
        print "Title: ".$imdbObj->title()."\n";
        print "Year: ".$imdbObj->year()."\n";
        print "Plot Summary: ".$imdbObj->plot()."\n";
        print "Rating: ".$imdbObj->rating()."\n";

        my $recommendations = $imdbObj->recommendation_movies();

        #<your_wisdom_here>
        while ( my ($key, $value) = each(%$recommendations) ) {
            print "$key => $value\n";
        }
        #</your_wisdom_here>
    } else {
        print "Something wrong: ".$imdbObj->error."\n";
    }

Thanks in advance!

Replies are listed 'Best First'.
Re: help a Dutchman with hash
by Limbic~Region (Chancellor) on May 29, 2011 at 20:03 UTC
    nrbrtkls,
    I am pretty sure that using IMDB::Film is a violation of IMDB's terms of service:

    Robots and Screen Scraping: You may not use data mining, robots, screen scraping, or similar data gathering and extraction tools on this site, except with our express written consent as noted below.

    Additionally, if you view their http://www.imdb.com/robots.txt file, just about everything has been disallowed. Now I want to give Michael Stepanov the benefit of the doubt and assume that he got permission, but then I question why he used LWP::Simple instead of WWW::Mechanize (the former doesn't respect robots.txt while the latter does).

    It also seems pretty obvious to me that IMDB does not want people scraping their recommendations (potentially to reverse engineer the algorithm they developed). Read below for why I came to this conclusion which I admit is a pure guess.

    Assuming I am wrong about the TOS, I recommend you open a bug report. I checked the RT queue but did not see this particular one. Since it seemed like an interesting challenge, I decided to set out solving the problem by using the "view source" feature of Firefox and saving a local copy of a handful of pages. The first thing I noticed is that the recommendations seen on the page are not in the source. Well, of course they are, but not in the straightforward way you might think. The second thing I noticed is that if you click on "See more Recommendations", the original ones are not also listed.

    Please do not run the following code in violation of the TOS. As I said above, I developed it using a handful of pages downloaded from Firefox's "view source" to local files. This is also terribly ugly and prone to much breakage - I just wanted to see how to do it. I have emailed the author a pointer to this thread.

        #!/usr/bin/perl
        use strict;
        use warnings;
        use IMDB::Film;
        use LWP::Simple 'get';

        my $imdb = new IMDB::Film(crit => '0442933');
        die "Something went wrong: " . $imdb->error . "\n" if ! $imdb->status;

        for my $info (qw/title year plot rating/) {
            print ucfirst($info), ": ", scalar $imdb->$info, "\n";
        }

        print "Recommendations:\n";
        my $recs = fetch_recommendations($imdb);
        while (my ($id, $title) = each %$recs) {
            print "$id: $title\n";
        }

        sub fetch_recommendations {
            my ($imdb) = @_;
            my $url = 'http://www.imdb.com/title/tt' . $imdb->id . '/recommendations';
            my $content = get($url) || '';
            my ($extract) = $content =~ /by the database(.*?)if you want to see if a movie /s;
            $extract = '' if ! defined $extract;
            my %rec;
            while ($extract =~ m|href="/title/tt(\d+)/">([^<]+)|g) {
                my ($id, $title) = ($1, $2);
                $rec{$id} = $title;
            }
            return \%rec;
        }
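The extraction step in the script above can be tried entirely offline, in the same spirit as the locally saved pages. The following is only a sketch: the markup in `$html` is invented for illustration and is not IMDB's actual page source, and `extract_recommendations` is a hypothetical helper name that isolates the two regular expressions from the script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical markup, made up for this sketch; the real page differs.
my $html = <<'HTML';
<p>recommended by the database</p>
<a href="/title/tt0172495/">Gladiator</a>
<a href="/title/tt0416449/">300</a>
<p>if you want to see if a movie is any good</p>
<a href="/title/tt9999999/">Outside the extracted region</a>
HTML

# Hypothetical helper: same two regexes as the script above, no network access.
sub extract_recommendations {
    my ($content) = @_;

    # Keep only the text between the two marker phrases.
    my ($extract) = $content =~ /by the database(.*?)if you want to see if a movie /s;
    return {} if !defined $extract;

    # Collect ID => title pairs from the title links inside that region.
    my %rec;
    while ($extract =~ m|href="/title/tt(\d+)/">([^<]+)|g) {
        $rec{$1} = $2;
    }
    return \%rec;
}

my $recs = extract_recommendations($html);
print "$_: $recs->{$_}\n" for sort keys %$recs;
```

Because the link after the closing marker phrase falls outside the captured region, it is not picked up. Any change to either marker phrase in the page would make the first match fail, which is one reason this approach is so fragile.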

    Cheers - L~R

      Thanks for raising the terms of use of IMDB.com. I agree with you, but first of all, it is each person's own decision whether to grab info from IMDB or not. The module itself cannot do that :) Secondly, if IMDB.com provided a web service, even a paid one, nobody would need to grab info from their web site. Personally, I don't care how people use IMDB::Film. It is just code. Nothing more.

      ---
      Michael Stepanov aka nite_man

      It's only my opinion and it doesn't have pretensions of absoluteness!

        nite_man,
        I believe it is irresponsible not to even mention in the documentation that by using your module, a user will be violating the TOS which in turn may be breaking the law. I found this article which seemed enlightening. There are also a number of sites that are providing a web API such as this one. It is unclear to me if they are downloading the text files that IMDB makes available or if they are in turn scraping IMDB. For the record, IMDB does provide a web service API for a fee (minimum of 15,000 USD) for commercial purposes and also indicates how to obtain written permission for personal screen scraping in their terms of service.

        Cheers - L~R

      To all who have responded to a monk in distress: Thank you very much! I will keep an eye out and see if there are others that I can help in return for your kindness. Thanks again for all your effort.

Re: help a Dutchman with hash
by bluescreen (Friar) on May 29, 2011 at 15:41 UTC

    Dude, you have a typo in my $recommendatons (you're missing an "i"). To prevent that from happening, please use strict and use warnings.
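As a small illustration of why strict helps with this kind of typo, the sketch below compiles a snippet with a string eval so that the failure can be reported instead of aborting the script. The snippet and its sample hash entry are made up for the demonstration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The variable is declared as $recommendations but read as
# $recommendatons (missing "i"), as in the typo discussed above.
my $snippet = <<'CODE';
use strict;
my $recommendations = { '0172495' => 'Gladiator' };
my $count = keys %$recommendatons;
CODE

# String eval compiles the snippet; under strict, the undeclared
# variable is a compile-time error and eval returns undef.
my $ok    = eval "$snippet; 1";
my $error = $ok ? '' : $@;

print $ok ? "compiled fine\n" : "strict caught it: $error";
```

Without strict, the misspelled name would silently refer to an empty package variable, and the loop over it would simply print nothing.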

    Aside from that, I think the HTML parser in IMDB::Film is not working; this may be due to a change in the page layout. Yes, it isn't using any API, just bare-bones requests to IMDB.com (this is very fragile, and if it is not broken today, it eventually will be).

      Thanks for your reply! I have corrected the typo in my post. However, this typo was just a mistake I made while posting the question and editing it a bit to make it more readable; it wasn't in the original code. So this module just parses the HTML? Hmmm... a pity. I was looking forward to solving a problem with this module. I still would like to know the solution if there is one, just because I have been looking very hard to find out how to get this right. Thanks for your help.

Re: help a Dutchman with hash (IMDB::Film)
by luis.roca (Deacon) on May 29, 2011 at 17:55 UTC
    I gave this a try but had no luck. Here's what I did:
        #!/usr/bin/perl
        use strict;
        use warnings;
        use diagnostics;
        use Data::Dumper;
        use IMDB::Film;

        my $imdbObj = new IMDB::Film(crit => 'Casablanca');
        my $recommendations = $imdbObj->recommendation_movies();

        # Attempt to print out values from recommendation_movies()
        my %recommendations;
        print map { "$_ => $recommendations{$_}\n" } values %recommendations;
        ## Doesn't return anything ##

        # Debugging to see what's in recommendation_movies()
        $imdbObj->status
            ? print "$recommendations\n"
                  . Dumper($recommendations)
                  . "$imdbObj->recommendation_movies()\n"
                  . Dumper($imdbObj->recommendation_movies())
            : print "ERROR: $imdbObj->error";

        __END__
        RETURNS:
        HASH(0x100a385a8)
        $VAR1 = {};
        IMDB::Film=HASH(0x100805140)->recommendation_movies()
        $VAR1 = {};
    I checked the IMDB::Film source code for recommendation_movies()
        recommendation_movies()
            Return a list of recommended movies for specified one as a hash
            where each key is a movie ID in IMDB and value - movie's title:

                $recommendation_movies = $film->recommendation_movies();

            For example, the list of recommended movies for Troy will be
            similar to that:

                __DATA__
                $VAR1 = {
                    '0416449' => '300',
                    '0167260' => 'The Lord of the Rings: The Return of the King',
                    '0442933' => 'Beowulf',
                    '0320661' => 'Kingdom of Heaven',
                    '0172495' => 'Gladiator'
                };
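For reference, if the method did return a populated hash reference shaped like the documented example, printing it could look roughly like the sketch below. The data is copied from the documentation excerpt above rather than fetched; `format_recommendations` is a hypothetical helper name, and sorting by title is just one possible choice.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample data copied from the documentation excerpt; nothing is fetched.
my $recommendation_movies = {
    '0416449' => '300',
    '0167260' => 'The Lord of the Rings: The Return of the King',
    '0442933' => 'Beowulf',
    '0320661' => 'Kingdom of Heaven',
    '0172495' => 'Gladiator',
};

# Hypothetical helper: turn the hash ref into printable lines, sorted by title.
sub format_recommendations {
    my ($recs) = @_;

    # An undefined or empty hash ref (like the $VAR1 = {} seen above)
    # gets an explicit message instead of silent empty output.
    return ('(no recommendations returned)') if !$recs || !%$recs;

    return map  { "$_ => $recs->{$_}" }
           sort { $recs->{$a} cmp $recs->{$b} }
           keys %$recs;
}

print "$_\n" for format_recommendations($recommendation_movies);
print "$_\n" for format_recommendations({});
```

The `%$recs` dereference is the key step: the method hands back a scalar holding a reference, so the hash has to be reached through it.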

    I tried printing $recommendation_movies but got an 'ERROR: Can't call method "$recommendation_movies" on an undefined value'.

    Sorry but that's as far as I got. Hopefully some of this information will help you (or help someone help you) get a little closer to solving the problem.
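Two Perl details are worth keeping in mind when reading the debugging script in this reply: a scalar `$recommendations` and a hash `%recommendations` are separate variables, and method calls are not interpolated inside double-quoted strings. The sketch below shows both with a stand-in class (`Local::FakeFilm`, invented here so that nothing touches IMDB::Film or the network).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in class, hypothetical, so the sketch runs without IMDB::Film.
package Local::FakeFilm;
sub new                   { return bless {}, shift }
sub recommendation_movies { return { '0172495' => 'Gladiator' } }

package main;

my $film            = Local::FakeFilm->new;
my $recommendations = $film->recommendation_movies();   # scalar holding a hash ref
my %recommendations;                                     # a different, empty hash

my $via_hash = join ',', values %recommendations;        # reads the empty hash
my $via_ref  = join ',', values %$recommendations;       # dereferences the scalar

# Only $film is interpolated here; the rest stays literal text.
my $interpolated = "$film->recommendation_movies()";

# Calling the method outside the string gives the real data.
my $count  = scalar keys %{ $film->recommendation_movies() };
my $called = "count: $count";

print "via hash: '$via_hash'\n";
print "via ref:  '$via_ref'\n";
print "$interpolated\n";
print "$called\n";
```

This matches the `IMDB::Film=HASH(...)->recommendation_movies()` line in the output above, which is the stringified object followed by literal text. The empty `$VAR1 = {}` from Dumper is a separate matter: it suggests the module itself returned no recommendations.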

    *Quick side note: You may want to change the title of your post to something like: "Print Hash Values From IMDB::Film". Although your title is funny, it doesn't really do very much for your question. You'll have more luck getting monks to view your post with something that clearly and succinctly describes your problem.

    Good luck! It's a fun module ( for me at least :) ) thanks for posting the question.
    Luis

    "...the adversities born of well-placed thoughts should be considered mercies rather than misfortunes." — Don Quixote