Aldebaran has asked for the wisdom of the Perl Monks concerning the following question:
Hello Monks,
I thought I would try something easy, namely using Perl to download a file, and I have yet to achieve it. The page in question is on Google's public download site, so permissions are not an issue: useful utilities.
I've tried it a few different ways, and even with another syntax, and what I get with the download is the HTML for the page itself. I won't show it, but it is definitely the same as what you see when you go to the above site and select "view page source."
    use LWP::Simple;
    my $url  = 'https://code.google.com/archive/p/dotnetperls-controls/downloads/enable1.txt';
    my $file = 'a.txt';
    getstore($url, $file);
Fishing for tips,
Re: downloading a file on a page with javascript
by choroba (Cardinal) on Mar 30, 2020 at 21:46 UTC
Using this URL instead of the one you used also stores a list of words to the output file, which I guess is the output you had expected. Getting this URL from the Archive page without JavaScript is hard. Search the Monastery for related questions.
map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
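For readers following along: the symptom in the question (an HTML page saved where a word list was expected) can be caught at download time. What follows is only a minimal sketch, not code from either poster. It assumes LWP::UserAgent is installed (plus LWP::Protocol::https for https URLs), and the helper names `looks_like_html` and `fetch_to_file` are made up for illustration. The direct file URL is not reproduced in this thread, so it is taken from the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Return 1 when a Content-Type header value indicates an HTML page
# rather than the plain-text file we were hoping for, else 0.
sub looks_like_html {
    my ($content_type) = @_;
    return 0 unless defined $content_type;
    return ($content_type =~ m{^\s*(?:text/html|application/xhtml\+xml)\b}i) ? 1 : 0;
}

# Fetch $url into $file; die on HTTP errors or when HTML comes back.
# Note that the file has already been written by the time we check the type.
sub fetch_to_file {
    my ($url, $file) = @_;
    my $ua  = LWP::UserAgent->new(timeout => 30);
    my $res = $ua->get($url, ':content_file' => $file);
    die "Download failed: ", $res->status_line, "\n" unless $res->is_success;
    die "Got an HTML page, not the file itself; check the URL\n"
        if looks_like_html(scalar $res->header('Content-Type'));
    return $file;
}

# Hypothetical usage: perl fetch.pl <direct-file-url> <output-file>
if (@ARGV == 2) {
    fetch_to_file(@ARGV);
    print "Saved $ARGV[1]\n";
}
```

Checking the Content-Type is a heuristic: a server may label a text file incorrectly, so treat the warning as a hint to inspect the saved file, not as proof.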
by Aldebaran (Curate) on Apr 06, 2020 at 22:38 UTC
I cobbled it together from the base url and the file I wanted. If I point my mouse at the file and save the link, I get the same thing. What I realize from your and bliako's posts is that I underused the power of the browser to figure this out.
"Using this URL instead of the one you used also stores a list of words to the output file, which I guess is the output you had expected."
Thx, choroba, that is indeed what I seek for my wordgames. With the correct url, my script gets the English dictionary. I decided to try it out with an older source post of yours: Re^7: Words in Words. "Correct" entries are words that have a properly-encompassing word. A hybrid is this: Source:
Logophiles like me play gladly with such output. I speak English natively, so I'm rarely challenged by English vocabulary. The resulting list is fascinating:
Who knew that there were 4 different consciouses? I couldn't find an example that failed to have a larger including word. Anyways, thanks for your comment that got me on the right track and also for the fun of replicating your "words within words" script.
"Perl scripting: great for pandemics...."
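The listings from this post did not survive in this copy of the thread. As a rough illustration of the "words within words" idea only, here is a naive sketch. It is not choroba's script or Aldebaran's hybrid; the sub name `words_in_words` and the sample list are invented, and the quadratic loop would be far too slow for a full dictionary.

```perl
use strict;
use warnings;

# For each word, collect the longer words that contain it as a substring.
# Returns a hash reference: inner word => [ encompassing words ].
sub words_in_words {
    my @words = @_;
    my %container;
    for my $inner (@words) {
        for my $outer (@words) {
            # A "properly-encompassing" word must be strictly longer.
            next if length($outer) <= length($inner);
            push @{ $container{$inner} }, $outer
                if index($outer, $inner) >= 0;
        }
    }
    return \%container;
}

# Made-up sample data standing in for the downloaded word list.
my @sample = qw(art cart carton on ton zebra);
my $found  = words_in_words(@sample);

for my $inner (sort keys %$found) {
    print "$inner: @{ $found->{$inner} }\n";
}
```

Words with no longer encompassing word (here `carton` and `zebra`) simply do not appear as keys in the result.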
Re: downloading a file on a page with javascript
by bliako (Abbot) on Mar 30, 2020 at 22:21 UTC
There are at least two ways to approach this.
The first is to use WWW::Mechanize::Chrome, which is like running a browser without the GUI (headless) from inside your script. With it you will be able to dive into the fetched page's DOM and extract anything you like from it, including those divs that you don't see with view-page-source because they are fetched later via JavaScript/AJAX.
The second is to open the site with your browser and open the developer tools (Firefox, but other browsers have similar functionality). Go to the network tab, select XHR and reload the page. You will see all the data fetched via AJAX, and you will see where that data comes from: URLs just like the one you tried to download. Copy that URL as cURL (it's on the right-click menu somewhere) and you can see exactly what the URL is and what its parameters are. Now note the URL, its parameters, whether it is a POST or a GET, and what request headers it has. It's easy to translate those into LWP::UserAgent.
Edit: converting a beast of a curl command line to LWP::UserAgent can be done easily by using Corion's curl2lwp (see http://blogs.perl.org/users/max_maischein/2018/11/curl2lwp---convert-curl-command-line-arguments-to-lwp-mechanize-perl-code.html)
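To make the second approach concrete, here is a minimal hand-written sketch of turning the pieces noted in the network tab (method, URL, request headers) into an LWP::UserAgent request. It is not output from curl2lwp. The URL and header values are placeholders, and `build_request` is a hypothetical helper; substitute whatever your own "copy as cURL" shows.

```perl
use strict;
use warnings;
use HTTP::Request;
use LWP::UserAgent;

# Build an HTTP::Request from the details seen in the browser's network tab.
sub build_request {
    my (%spec) = @_;
    my $req     = HTTP::Request->new($spec{method} => $spec{url});
    my %headers = %{ $spec{headers} || {} };
    $req->header($_ => $headers{$_}) for sort keys %headers;
    $req->content($spec{body}) if defined $spec{body};   # only for POST etc.
    return $req;
}

# Placeholder values, not the real request from this thread.
my $req = build_request(
    method  => 'GET',
    url     => 'https://example.com/path/to/file.txt',
    headers => {
        'User-Agent' => 'Mozilla/5.0',
        'Accept'     => 'text/plain,*/*',
    },
);

# Only touch the network when asked to:  perl request.pl --send
if (@ARGV && $ARGV[0] eq '--send') {
    my $res = LWP::UserAgent->new->request($req);
    die $res->status_line, "\n" unless $res->is_success;
    print $res->decoded_content;
}
else {
    # Otherwise just show the request that would be sent.
    print $req->as_string;
}
```

Printing the request first makes it easy to compare it header by header against the curl command line before sending anything.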
by Aldebaran (Curate) on Apr 06, 2020 at 22:33 UTC
I was particularly pleased to see this response from bliako, whose pm posts are at a level where I can, about half the time, stretch my game to replicate, understand, and incorporate into "my game," whatever that is. I was thinking there should be several ways that Perl could do this, whether natively, by wrapping C, or with modules. Getting the url right needs to be a part of any solution.
"The first is to use WWW::Mechanize::Chrome"
I had trouble installing WWW::Mechanize::Chrome, but it was all of the variety where I needed only to make better web searches for prereqs. The first "problem" was getting WWW::Mechanize::Chrome to install on Ubuntu. I lacked 2 things at the beginning: a chrome executable, and headers for png.h. For Ubuntu, a good command line install for chrome is here. Since being able to save a screenshot as a png is necessary, I also needed:
    sudo apt-get install libpng-dev
This is as far as I got along this prong. Output, then source:
Aspects of downloads are yet to be implemented according to the 35:06 mark here: corion's presentation from 2017
Q1) How do I bridge the gap from $mech->follow_link to populating @words ?
"The second is to open the site with your browser, open the developer tools (firefox, but also other will have similar functionality). Go to the network tab, select XHR and reload the page. You will see all the data fetched via ajax. And you will see where does that data come from, it comes from urls just like the one you tried to download. Copy that url as CURL (its on the right-click menu somewhere) and you can see exactly what the url is, what its parameters are. Now, note the url, its parameters and whether it is a POST or a GET and what request-headers it has. It's easy to translate those into LWP::UserAgent."
I did something close to this dozens of different ways. What ended up working for me was left-clicking on the link while the developer tools, including the network tab, are open, and then finding the "copy to curl" entry on the right-click menu as one hovers over the request in the tools. This yields:
Then I turned to Corion's curl2lwp converter. I'm super pleased by this:
This represents a huge learning curve partially-ascended for me, including considering the Bigger picture with introduction to DOM. I have one more question at this point, regarding the practice scripts at examples, all of which use Log::Log4perl. If I have:
, and this successfully logs events and errors:
Q2) How do I log using this scheme? For example, do I go from
to:
Again, thanks all for comments, which seem to be the "service work" that most of us can do in these unusual times of "social distancing." Stay healthy!
2020-04-07 Athanasius fixed formatting of over-long code line.