in reply to Re^3: Trouble with some of IDDB Public Methods
in thread Trouble with some of IMDB Public Methods
The OP seems to have found what he wanted, so I thought I might use the opportunity to ask marto (or anyone else who can bake from scratch with mojo) to further explore the script he posted in Re^5: polishing up a json fetching script for weather data. It might lead to an improvement on a script that marto characterized as suboptimal. I certainly hope that we don't optimize away the comments, and that we break up the logic rather than having just a train of arrows, as online sources often do, with words whose provenance is unknown, like top in this example:
    # JSON POST (application/json) with TLS certificate authentication
    my $tx = $ua->cert('tls.crt')->key('tls.key')->post('https://example.com' => json => {top => 'secret'});
or json. Nothing makes the keywords stand out, so where does one go to determine their provenance? How exactly are you going to disambiguate 'json'? The above came from the Mojo::UserAgent documentation. I understand that examples are selected for brevity. I would love to see a cache of them with many authors.
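On the provenance question, a minimal sketch that may help. As far as I can tell, json in that call is the name of a content generator that Mojo::UserAgent::Transactor registers (alongside form and multipart), while top is just a hash key in the example data. build_tx constructs the transaction without sending it, so the request can be inspected offline. The URL and payload are the ones from the documentation example, minus the TLS certificate part; nothing here talks to a real server.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use 5.016;
use Mojo::UserAgent;

my $ua = Mojo::UserAgent->new;

# build_tx only constructs the transaction, it does not send anything
my $tx = $ua->build_tx(
    POST => 'https://example.com' => json => { top => 'secret' } );

# the json generator serializes the hash and sets the content type
my $content_type = $tx->req->headers->content_type;
my $body         = $tx->req->body;

say 'method:       ', $tx->req->method;
say "content type: $content_type";
say "body:         $body";
```

Swapping json for form in the same call should show the other generator at work, which is one way to see that these words are generator names rather than arbitrary keywords.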
It seemed to me that having to hardcode the movie title like this is an area that could be improved.
    my $imdburl = 'http://www.imdb.com/search/title?title=Caddyshack';

I couldn't get titles with multiple words to work at all. The search replaces spaces with plusses in the url, but interpolation with a lexical variable is just beneath mojo, even if it worked, which it doesn't. What I want is a script that shows me what's at this site from a mojo point of view, and this does so naively:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Mojo::URL;
    use Mojo::Util qw(dumper);
    use Mojo::UserAgent;
    use Data::Dump;
    use Log::Log4perl;
    use 5.016;
    use Mojo::DOM;

    my $log_conf3 = "/home/hogan/Documents/hogan/logs/conf_files/3.conf";
    my $log_conf4 = "/home/hogan/Documents/hogan/logs/conf_files/4.conf";

    #Log::Log4perl::init($log_conf3);    #debug
    Log::Log4perl::init($log_conf4);     #info
    my $logger = Log::Log4perl->get_logger();
    $logger->info("$0");

    # pretend to be a browser
    my $uaname =
    'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.93 Safari/537.36';
    my $ua = Mojo::UserAgent->new;
    $ua->max_redirects(5)->connect_timeout(20)->request_timeout(20);
    $ua->transactor->name($uaname);
    my $first_title = 'Virgin+River';
    my $imdburl     = "http://www.imdb.com/search/title?title=$first_title";
    say "imdburl is $imdburl";

    # find search results
    my $dom   = $ua->get($imdburl)->res->dom;
    my @nodes = @$dom;

    # c-style for is good for array output with index
    for ( my $i = 0 ; $i < @nodes ; $i++ ) {
        $logger->info("i is $i ==============");
        $logger->info("$nodes[$i]");
    }
    sleep 2;    #good hygiene
    __END__
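On multi-word titles: a minimal sketch, assuming the same search URL as in the script above, that lets Mojo::URL do the escaping instead of hand-writing 'Virgin+River'. The query method takes name/value pairs, and the space is encoded when the URL is stringified. Whether IMDb still answers at this address is not something this snippet checks.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use 5.016;
use Mojo::URL;

# title as a human would type it, spaces and all
my $title = 'Virgin River';

# query() replaces the query string with the given pairs and escapes them
my $imdburl =
  Mojo::URL->new('http://www.imdb.com/search/title')->query( title => $title );

# Mojo::URL objects stringify to the full URL
say "imdburl is $imdburl";
```

The rest of the script above would be unchanged, since $ua->get accepts a Mojo::URL object as well as a string; only the way $imdburl is built differs.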
What does it show?
    2020/12/31 13:53:39 INFO i is 1 ==============
    2020/12/31 13:53:39 INFO <!DOCTYPE html>
    2020/12/31 13:53:39 INFO i is 2 ==============
    2020/12/31 13:53:39 INFO
First looks right...second is empty...
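An empty entry like that is probably not a failure. @$dom is an alias for child_nodes, and the newline between the doctype and the html tag is a text node of its own. A small offline sketch of that reading, using a made-up miniature page (not IMDb's markup), that prints each top-level node's type and size instead of its content:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use 5.016;
use Mojo::DOM;

# Hypothetical miniature page, only the overall shape matters
my $html = "<!DOCTYPE html>\n"
  . "<html><head><title>t</title></head><body><p>Hi</p></body></html>\n";

my $dom = Mojo::DOM->new($html);

# top-level nodes, the same list that @$dom gives
my @nodes = $dom->child_nodes->each;
my @types = map { $_->type } @nodes;

# report type and rendered length rather than dumping the node itself
for my $i ( 0 .. $#nodes ) {
    printf "%d %-8s %d chars\n", $i, $types[$i], length "$nodes[$i]";
}
```

Logging the type next to the index in the loop above would make it easier to tell a whitespace text node from a genuinely empty response.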
The 3rd contains 61 k of javascript hell. The 4th and last was empty. Javascript isn't meant for human eyes, or, to be specific, I find it illegible, so I used the browser tools to look closer. I realize that I simply don't understand the javascript, and that's not mojo's fault. Inspecting the search box with the browser tools (right click inside it) gives me this:
    <input type="text" value="" autocomplete="off" aria-autocomplete="list" aria-controls="react-autowhatever-1" class="imdb-header-search__input GVtrp0cCs2HZCo7E2L5UU react-autosuggest__input" id="suggestion-search" name="q" placeholder="Search IMDb" autocapitalize="none" autocorrect="off"
Then I remembered that you can use mojo to do this instead:
    $ mojo get https://www.imdb.com/ '*' attr id >1.txt
    $ grep search 1.txt
    navSearch-searchState
    suggestion-search-container
    nav-search-form
    navbar-search-category-select
    navbar-search-category-select-contents
    suggestion-search
    suggestion-search-button
    imdbHeader-searchClose
    imdbHeader-searchOpen
    $
Now I thought I was really in hot pursuit. I thought, "aha, I can find this id and post to it." So I went looking for find in Mojo::DOM, and I don't really understand the examples until I can work them myself and see them run:
    $ ./1.dom.pl
    ./1.dom.pl
    123
    Test
    123
    a
    b
    b
    a
    a:Test
    b:123
    <p id="a">Test</p><p id="b">123</p><p id="d">789</p><p id="c">456</p>
    $ cat 1.dom.pl
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Mojo::URL;
    use Mojo::Util qw(dumper);
    use Mojo::UserAgent;
    use Data::Dump;
    use Log::Log4perl;
    use 5.016;
    use Mojo::DOM;

    my $log_conf3 = "/home/hogan/Documents/hogan/logs/conf_files/3.conf";
    my $log_conf4 = "/home/hogan/Documents/hogan/logs/conf_files/4.conf";

    #Log::Log4perl::init($log_conf3);    #debug
    Log::Log4perl::init($log_conf4);     #info
    my $logger = Log::Log4perl->get_logger();
    $logger->info("$0");

    # pretend to be a browser
    my $uaname =
    'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.93 Safari/537.36';
    my $ua = Mojo::UserAgent->new;
    $ua->max_redirects(5)->connect_timeout(20)->request_timeout(20);
    $ua->transactor->name($uaname);

    ## example from https://docs.mojolicious.org/Mojo/DOM
    #use Mojo::DOM;

    # Parse
    my $dom = Mojo::DOM->new('<div><p id="a">Test</p><p id="b">123</p></div>');

    # Find
    say $dom->at('#b')->text;
    say $dom->find('p')->map('text')->join("\n");
    say $dom->find('[id]')->map( attr => 'id' )->join("\n");

    # Iterate
    $dom->find('p[id]')->reverse->each( sub { say $_->{id} } );

    # Loop
    for my $e ( $dom->find('p[id]')->each ) {
        say $e->{id}, ':', $e->text;
    }

    # Modify
    $dom->find('div p')->last->append('<p id="c">456</p>');
    $dom->at('#c')->prepend( $dom->new_tag( 'p', id => 'd', '789' ) );
    $dom->find(':not(p)')->map('strip');

    # Render
    say "$dom";
    __END__
    $ ./4.dom.pl
    ./4.dom.pl
    <h1>Test</h1>
    bar
    bar
    foo
    baz
    =====
    comment
    doctype
    pi
    text
    root
    tag
    text
    $ cat 4.dom.pl
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Mojo::URL;
    use Mojo::Util qw(dumper);
    use Mojo::UserAgent;
    use Data::Dump;
    use Log::Log4perl;
    use 5.016;
    use Mojo::DOM;

    my $log_conf3 = "/home/hogan/Documents/hogan/logs/conf_files/3.conf";
    my $log_conf4 = "/home/hogan/Documents/hogan/logs/conf_files/4.conf";

    #Log::Log4perl::init($log_conf3);    #debug
    Log::Log4perl::init($log_conf4);     #info
    my $logger = Log::Log4perl->get_logger();
    $logger->info("$0");

    ## examples from https://docs.mojolicious.org/Mojo/DOM
    my $dom7 = Mojo::DOM->new();
    my $str7 =
      $dom7->parse('<div><h1>Test</h1><h2>123</h2></div>')->at('h2')->previous;
    $logger->info($str7);

    # "bar"
    my $dom8 = Mojo::DOM->new();
    my $str8 = $dom8->parse("<div>foo<p>bar</p>baz</div>")->at('p')->text;
    say "$str8";
    $logger->info($str8);

    # "foo\nbaz\n"
    my $dom9 = Mojo::DOM->new();
    my $str9 = $dom9->parse("<div>foo\n<p>bar</p>baz\n</div>")->at('div')->text;
    $logger->info($str9);
    $logger->info('=====');
    my $dom1 = Mojo::DOM->new();
    my $str1 = $dom1->parse('<!-- Test -->')->child_nodes->first->type;
    $logger->info($str1);

    # "doctype"
    $str1 = $dom1->parse('<!DOCTYPE html>')->child_nodes->first->type;
    $logger->info($str1);

    # "pi"
    $str1 = $dom1->parse('<?xml version="1.0"?>')->child_nodes->first->type;
    $logger->info($str1);
    $str1 =
      $dom1->parse('<title>Test</title>')->at('title')->child_nodes->first->type;
    $logger->info($str1);
    $str1 = $dom1->parse('<p>Test</p>')->type;
    $logger->info($str1);
    $str1 = $dom1->parse('<p>Test</p>')->at('p')->type;
    $logger->info($str1);
    $str1 = $dom1->parse('<p>Test</p>')->at('p')->child_nodes->first->type;
    $logger->info($str1);
    __END__
    $
Finally, I got a usage for find that worked:
    $ ./2.dom.pl
    ./2.dom.pl
    ads_tarnhelm ads_doWithAds ads_monitoring_setup ads_safeframe_setup ads_general_setup IMDbHomepageSiteReactViews imdbHeader nblogin imdbHeader-navDrawerOpen imdbHeader-navDrawerOpen--desktop imdbHeader-navDrawer nav-link-categories-mov nav-link-categories-tvshows nav-link-categories-video nav-link-categories-awards nav-link-categories-celebs nav-link-categories-comm home_img_holder home_img navSearch-searchState suggestion-search-container nav-search-form navbar-search-category-select navbar-search-category-select-contents suggestion-search suggestion-search-button imdbHeader-searchClose imdbHeader-searchOpen ipc-svg-gradient-tv-logo-t ipc-svg-gradient-tv-logo-v ipc-wrap-background-id inline20_wrapper placeholderPattern b a b a b a b a b a b a b a inline40_wrapper placeholderPattern from-your-watchlist fan-picks teconsent ftr__a ftr__c ftr__e ftr__g ftr__i ftr__k ftr__m ftr__o ftr__q ftr__s ftr__u ftr__w ftr__y ftr__A ftr__C ftr__E ftr__G ftr__b ftr__d ftr__f ftr__h ftr__j ftr__l ftr__n ftr__p ftr__r ftr__t ftr__v ftr__x ftr__z ftr__B ftr__D ftr__F ftr__H ipc-svg-gradient-tv-logo-t ipc-svg-gradient-tv-logo-v ipc-svg-gradient-tv-logo-t ipc-svg-gradient-tv-logo-v be
    $ cat 2.dom.pl
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Log::Log4perl;
    use 5.016;
    use Mojo::DOM;
    use Mojo::UserAgent;

    my $log_conf3 = "/home/hogan/Documents/hogan/logs/conf_files/3.conf";
    my $log_conf4 = "/home/hogan/Documents/hogan/logs/conf_files/4.conf";

    #Log::Log4perl::init($log_conf3);    #debug
    Log::Log4perl::init($log_conf4);     #info
    my $logger = Log::Log4perl->get_logger();
    $logger->info("$0");

    # represent $0 as browser to server
    my $uaname =
    'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.93 Safari/537.36';
    my $ua = Mojo::UserAgent->new;
    $ua->max_redirects(5)->connect_timeout(20)->request_timeout(20);
    $ua->transactor->name($uaname);

    ## main page of imdb contains search box
    my $imdburl = "http://www.imdb.com/";

    ## example from https://docs.mojolicious.org/Mojo/DOM
    my $dom = $ua->get($imdburl)->res->dom;

    # say "$dom"; works
    #
    my @ids = $dom->find('[id]')->map( attr => 'id' )->each;
    $logger->info("@ids");
    __END__
    $
Anyways, this was my final push and I seem to come up short:
    $ ./2.1.dom.pl
    ./2.1.dom.pl
    navSearch-searchState suggestion-search-container nav-search-form navbar-search-category-select navbar-search-category-select-contents suggestion-search suggestion-search-button imdbHeader-searchClose imdbHeader-searchOpen
    Can't locate object method "find" via package "Mojo::UserAgent" at ./2.1.dom.pl line 48.
    $ cat 2.1.dom.pl
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Log::Log4perl;
    use 5.016;
    use Mojo::DOM;
    use Mojo::UserAgent;
    use Mojo::URL;
    use Mojo::Util qw(trim);

    my $log_conf3 = "/home/hogan/Documents/hogan/logs/conf_files/3.conf";
    my $log_conf4 = "/home/hogan/Documents/hogan/logs/conf_files/4.conf";

    #Log::Log4perl::init($log_conf3);    #debug
    Log::Log4perl::init($log_conf4);     #info
    my $logger = Log::Log4perl->get_logger();
    $logger->info("$0");

    # represent $0 as browser to server
    my $uaname =
    'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.93 Safari/537.36';
    my $ua = Mojo::UserAgent->new;
    $ua->max_redirects(5)->connect_timeout(20)->request_timeout(20);
    $ua->transactor->name($uaname);

    ## main page of imdb contains search box
    my $imdburl = "http://www.imdb.com/";

    ## example from https://docs.mojolicious.org/Mojo/DOM
    my $dom = $ua->get($imdburl)->res->dom;

    # say "$dom"; works
    #
    my @ids = $dom->find('[id]')->map( attr => 'id' )->each;

    #$logger->info("@ids");
    my @matches = grep { /search/ } @ids;
    $logger->info("@matches");
    my $vid = 'Virgin River';
    $ua->post( $imdburl => form => { 'suggestion-search' => $vid } );

    # assume first match
    my $filmurl = $ua->find('a[href^=/title]')->first->attr('href');

    # extract film id
    my $filmid = Mojo::URL->new($filmurl)->path->parts->[-1];

    # get details of film
    $dom = $ua->get("https://www.imdb.com/title/$filmid/")->res->dom;

    # print film details
    say trim( $dom->at('div.title_wrapper > h1')->text ) . ' ('
      . trim( $dom->at('#titleYear > a')->text ) . ')';

    # print actor/character names
    foreach my $cast ( $dom->find('table.cast_list > tr:not(:first-child)')->each ) {
        say trim ( $cast->at('td:nth-of-type(2) > a')->text ) . ' as '
          . trim( $cast->at('td.character')->all_text );
    }
    __END__
    $
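The error above looks like it comes from calling find on the user agent. find and at are Mojo::DOM methods, so they would be called on the DOM of a response ($tx->res->dom), not on $ua. Also, as far as I know, a form submits its fields by name rather than by id, and the inspected input has name="q". Here is a minimal offline sketch of reading that off the markup. The form's action and method below are made up for illustration, and whether nav-search-form really is the enclosing form element is an assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use 5.016;
use Mojo::DOM;

# Hypothetical form markup: the ids and name="q" come from the inspection
# above, but the action and method are assumptions, not IMDb's real values
my $dom = Mojo::DOM->new(<<'HTML');
<form id="nav-search-form" action="/find" method="get">
  <input type="text" id="suggestion-search" name="q">
</form>
HTML

# locate the input by id, then walk up to the form that contains it
my $input = $dom->at('#suggestion-search');
my $form  = $input->ancestors('form')->first;

# the field is submitted under its name attribute, not its id
my $field  = $input->attr('name');
my $action = $form->attr('action');
my $method = uc( $form->attr('method') // 'GET' );

say "$method $action with field $field";

# With a live page the request would then look something like this
# ($url built from the form's real action, which has to be looked up):
#   my $tx      = $ua->get( $url => form => { $field => 'Virgin River' } );
#   my $results = $tx->res->dom;    # find/at are called here, not on $ua
```

Since the search box is driven by javascript, the real page may not submit a plain form at all, so this is only a way to check what the markup claims before posting anything.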
These are resources I drew from:
Thanks for comments,
Re^5: Trouble with some of IDDB Public Methods
by marto (Cardinal) on Jan 01, 2021 at 09:54 UTC