in reply to help a Dutchman with hash

nrbrtkls,
I am pretty sure that using IMDB::Film is a violation of IMDB's terms of service:

Robots and Screen Scraping: You may not use data mining, robots, screen scraping, or similar data gathering and extraction tools on this site, except with our express written consent as noted below.

Additionally, if you view their http://www.imdb.com/robots.txt file, just about everything has been disallowed. Now, I want to give Michael Stepanov the benefit of the doubt and assume that he got permission, but then I question why he used LWP::Simple instead of WWW::Mechanize (the former doesn't respect robots.txt while the latter does).

It also seems pretty obvious to me that IMDB does not want people scraping their recommendations (potentially to reverse engineer the algorithm they developed). Read below for why I came to this conclusion, which I admit is a pure guess.

Assuming I am wrong about the TOS, I recommend you open a bug report. I checked the RT queue but did not see this particular one. Since it seemed like an interesting challenge, I decided to set about solving the problem by using the "view source" feature of Firefox and saving a local copy of a handful of pages. The first thing I noticed is that the recommendations seen on the page are not in the source. Well, of course they are, but not in the straightforward way you might think. The second thing I noticed is that if you click on "See more Recommendations", the original ones are not listed again.

Please do not run the following code in violation of the TOS. As I said above, I developed it using a handful of pages downloaded from Firefox's "view source" to local files. This is also terribly ugly and prone to much breakage - I just wanted to see how to do it. I have emailed the author a pointer to this thread.

#!/usr/bin/perl
use strict;
use warnings;
use IMDB::Film;
use LWP::Simple 'get';

my $imdb = IMDB::Film->new(crit => '0442933');
die "Something went wrong: " . $imdb->error . "\n" if ! $imdb->status;

for my $info (qw/title year plot rating/) {
    print ucfirst($info), ": ", scalar $imdb->$info, "\n";
}

print "Recommendations:\n";
my $recs = fetch_recommendations($imdb);
while (my ($id, $title) = each %$recs) {
    print "$id: $title\n";
}

sub fetch_recommendations {
    my ($imdb) = @_;
    my $url = 'http://www.imdb.com/title/tt' . $imdb->id . '/recommendations';
    my $content = get($url) || '';

    # Keep only the region between the two marker phrases
    my ($extract) = $content =~ /by the database(.*?)if you want to see if a movie /s;
    $extract = '' if ! defined $extract;

    # Map each linked title id to its title
    my %rec;
    while ($extract =~ m|href="/title/tt(\d+)/">([^<]+)|g) {
        my ($id, $title) = ($1, $2);
        $rec{$id} = $title;
    }
    return \%rec;
}
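
Since the code above was developed against locally saved pages, here is a minimal sketch of the same extraction run on a string instead of a live fetch. The helper name extract_recommendations and the sample markup are made up for illustration. The real page source is not reproduced here, so treat the sample as an assumption about its general shape only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: the same two-step extraction as fetch_recommendations,
# but it takes the page source as a string, so nothing is fetched.
sub extract_recommendations {
    my ($content) = @_;
    $content = '' if ! defined $content;

    # Step 1: keep only the region between the two marker phrases
    my ($extract) = $content =~ /by the database(.*?)if you want to see if a movie /s;
    $extract = '' if ! defined $extract;

    # Step 2: collect id => title for every title link inside that region
    my %rec;
    while ($extract =~ m|href="/title/tt(\d+)/">([^<]+)|g) {
        $rec{$1} = $2;
    }
    return \%rec;
}

# Made-up sample markup (an assumption, not real IMDB source)
my $sample = <<'HTML';
<p>Suggested by the database</p>
<a href="/title/tt0000001/">First Example Film</a>
<a href="/title/tt0000002/">Second Example Film</a>
<p>if you want to see if a movie is right for you</p>
<a href="/title/tt0000003/">Outside The Marked Region</a>
HTML

my $recs = extract_recommendations($sample);
for my $id (sort keys %$recs) {
    print "$id: $recs->{$id}\n";
}
```

Only the two links between the marker phrases are picked up. The third link falls outside the region and is ignored. To use it on a saved page, read the file into a string and pass that string in.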

Cheers - L~R

Replies are listed 'Best First'.
Re^2: help a Dutchman with hash
by nite_man (Deacon) on May 30, 2011 at 07:28 UTC
    Thanks for bringing up IMDB.com's terms of use. I agree with you, but first of all, it is each person's own decision whether to grab info from IMDB or not. The module itself cannot do that :) Secondly, if IMDB.com provided some web service, even a paid one, nobody would need to grab info from their web site. Personally, I don't care how people use IMDB::Film. It is just code. Nothing more.

    ---
    Michael Stepanov aka nite_man

    It's only my opinion and it doesn't have pretensions of absoluteness!

      nite_man,
      I believe it is irresponsible not to even mention in the documentation that by using your module, a user will be violating the TOS which in turn may be breaking the law. I found this article which seemed enlightening. There are also a number of sites that are providing a web API such as this one. It is unclear to me if they are downloading the text files that IMDB makes available or if they are in turn scraping IMDB. For the record, IMDB does provide a web service API for a fee (minimum of 15,000 USD) for commercial purposes and also indicates how to obtain written permission for personal screen scraping in their terms of service.

      Cheers - L~R

Re^2: help a Dutchman with hash
by nrbrtkls (Initiate) on May 30, 2011 at 06:48 UTC

    To all who have responded to a monk in distress: Thank you very much! I will keep an eye out and see if there are others that I can help in return for your kindness. Thanks again for all your effort.