PerlMonks  

how do I scrape this web page

by Anonymous Monk
on Mar 04, 2020 at 18:58 UTC ( [id://11113780]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

This node falls below the community's threshold of quality. You may see it by logging in.

Replies are listed 'Best First'.
Re: how do I scrape this web page
by Your Mother (Archbishop) on Mar 04, 2020 at 19:07 UTC

    Ensure you aren’t breaking the Terms of Service for the site, then show the code you wrote to try to do it and we’ll help you with it. Suggested modules would chiefly include WWW::Mechanize and Mojo::UserAgent. Or post it as a job on https://jobs.perl.org/ and pay someone to do it for you.

Re: how do I scrape this web page
by Marshall (Canon) on Mar 05, 2020 at 00:13 UTC
    I don't see any scenario that results in a "little command line script" to do what you want.
    I've only written a few web scrapers, but I have some advice for you. I think you are in a pretty complicated situation. I looked at the page source of the web page you specified. What I see is a whole mess of computer-generated JavaScript code. The number you want, the price of gold, isn't on that page. The browser runs some JavaScript on that page in order to get the price of gold. Perl cannot run JavaScript on its own; that takes a browser. There is a module, WWW::Mechanize::Chrome, which allows Perl to control a Chrome browser. But this gets complicated.
    Updated: I made a mistake. The price does indeed appear to be within the generated HTML. That means that JavaScript is not necessary to see the actual price. Look at the page source yourself to see how I could have missed it! See the posts below. Also see another post suggesting a Google search for APIs.

    My advice is to look somewhere else on the Web for the information you desire. Ideally one of the commodity markets has an API to get what you want without involving a browser at all. I am basically assuming that there has got to be a better way to get gold/silver prices than just this one website. I would give you this same advice even if you were paying me to do this for you. Go look some more for other ways. Good luck!

      FWIW, the prices are in the raw HTML, no JS needed. They’re just buried in noise. Raw text stripped out a little:

      Gold Price Today Gold Price $1,638.17
        Wow, then my mistake. I stand corrected. Thanks! I just saw so much JS noise, and I searched on "638" in an attempt to find the actual price without a result, but that probably means I made a search mistake. Oops! This does make the job quite a bit easier. It also brings back the "terms of use" question and whether using this page is "legal" for the described usage. That I don't know.

        I found this in the page source (geez, this page's code is a mess!):

        <div class="metal-title"> Gold Price </div> <div class="nfprice">&#36;1,638.93</div> <div class="table-variations"> <div class="single-variation-currency">
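        As a rough illustration of how that fragment could be picked apart, here is a minimal sketch using only core Perl. The helper name `extract_price` is made up for this example, and the pattern assumes the markup looks exactly like the fragment quoted above. A real HTML parser (for instance the DOM support that ships with Mojo::UserAgent) would likely cope better if the page layout changes, and the site's Terms of Service still apply.

```perl
use strict;
use warnings;

# Fragment copied from the post above; a real script would fetch the full page first.
my $html = '<div class="metal-title"> Gold Price </div> <div class="nfprice">&#36;1,638.93</div>';

# Hypothetical helper: return the numeric price that follows a given
# metal title, or undef if the expected markup is not found.
sub extract_price {
    my ($html, $metal) = @_;
    my ($raw) = $html =~ m{
        <div\s+class="metal-title">\s*\Q$metal\E\s+Price\s*</div>\s*
        <div\s+class="nfprice">\s*([^<]+?)\s*</div>
    }xi or return;
    $raw =~ s/&\#36;|\$//g;   # drop the dollar sign, encoded or literal
    $raw =~ s/,//g;           # drop thousands separators
    return $raw =~ /\A\d+(?:\.\d+)?\z/ ? $raw + 0 : undef;
}

my $price = extract_price($html, 'Gold');
print defined $price ? "Gold: $price\n" : "Gold price not found\n";
```

        The function returns undef instead of dying when the markup is missing, so a caller can tell "layout changed" apart from a real price.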
        Update: I followed my own advice and googled "commodity api data". There appear to be lots of options. I haven't investigated any of the sources, nor do I endorse them. But one says: "XXXX offers commodity prices data for almost 100 commodities, including gold prices, silver prices and oil prices from multiple sources. XXXX's simple API gives access to daily spot prices and historical commodity prices." That or something similar sounds promising.

        The API for XXXX says a free user gets: "Authenticated users have a limit of 300 calls per 10 seconds, 2,000 calls per 10 minutes and a limit of 50,000 calls per day." Paying users can go faster. This is much better than fiddling around with a web page with fancy graphics. The data is returned in a format that is easy for computers to understand. Well geez, as it should be if the burst "throttle" on a free account works out to 30 requests per second!

        Additional Update: https://blog.quandl.com/getting-started-with-the-quandl-api shows how to get the data you want as JSON or CSV files. The way to use Perl is to get this JSON data and do what you want with it. Look at https://docs.quandl.com/docs/in-depth-usage for some examples. Scraping a web page meant for human users is not the right way to get this info. Get the right API for the data that you need and then use Perl to just go crazy with the JSON or CSV data. Although Your Mother found the HTML representation of the gold price on the initial page, and yes, parsing that page can get the number, it is not the "right way". Using an API to get the data you want is the "right way", and these APIs are designed to be very performant. I mean geez, this API is designed so that you can hit it 50k times per day without even paying anything! If you need this data more often than that, you are into something much more advanced than your question indicates!
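        To give a feel for the "get the JSON and do what you want with it" step, here is a small sketch that decodes a response body with the core JSON::PP module. The payload shape (`dataset`, `column_names`, `data`), the sample values and the helper name `latest_price` are all assumptions made up for illustration, not the documented schema of any particular API; check the provider's documentation for the real field names. The body is hard-coded so the sketch runs offline; in practice it would come from an HTTP client such as the core HTTP::Tiny module.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Hypothetical response body; field names and values are invented for this example.
my $body = '{"dataset":{"column_names":["Date","USD"],'
         . '"data":[["2020-03-04",1641.85],["2020-03-03",1615.50]]}}';

# Hypothetical helper: return (date, price) from the first data row,
# assuming the API lists the newest row first.
sub latest_price {
    my ($json_text) = @_;
    my $doc  = decode_json($json_text);
    my $rows = $doc->{dataset}{data};
    die "no data rows in response\n"
        unless ref $rows eq 'ARRAY' && @$rows;
    my ($date, $price) = @{ $rows->[0] };
    return ($date, $price);
}

my ($date, $price) = latest_price($body);
printf "%s gold: %.2f USD\n", $date, $price;
```

        Compared with the HTML approach, there is no pattern to maintain here: if the provider keeps its field names stable, the script keeps working.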

Re: Trying to scrape webpage
by Discipulus (Canon) on Mar 05, 2020 at 08:55 UTC
    A reply falls below the community's threshold of quality. You may see it by logging in.
