Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks! I have been searching the Internet for a long time, trying to find code that I can use to bulk-download web pages from a list of URLs. Unfortunately wget does not work, because many of the elements are generated by JavaScript (or something similar) and do not get downloaded. Essentially I need something that resembles the browser's 'Save page as' functionality, but automated, without me doing it by hand.
Something like this, which for some reason does not work for me (it does not save the pages the way the demo shows):
https://github.com/abiyani/automate-save-page-as

Example webpages that I am trying to download (>1000 of them):
https://opm.phar.umich.edu/proteins/7839 https://opm.phar.umich.edu/proteins/4676

where I change the number at the end.
Do any of you have code that you have used for a similar task, or at least some pointers on where to begin? Any help/advice would be very welcome!
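
For illustration, something along these lines is roughly what I have in mind (a rough, untested sketch; I am assuming WWW::Mechanize::Chrome driving a locally installed headless Chrome/Chromium, and the five-second wait and the output file names are just placeholders):

    use strict;
    use warnings;
    use Log::Log4perl qw(:easy);
    use WWW::Mechanize::Chrome;

    Log::Log4perl->easy_init($ERROR);   # WWW::Mechanize::Chrome logs via Log::Log4perl

    my $mech = WWW::Mechanize::Chrome->new( headless => 1 );

    for my $id ( 7839, 4676 ) {         # extend to the full list of >1000 IDs
        $mech->get("https://opm.phar.umich.edu/proteins/$id");
        $mech->sleep(5);                # crude fixed wait for the JavaScript to finish rendering

        open my $fh, '>:encoding(UTF-8)', "protein_$id.html"
            or die "Cannot write protein_$id.html: $!";
        print {$fh} $mech->content;     # the rendered DOM, not the raw server response
        close $fh;
    }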

Re: Code for 'Save html page' that contains dynamic content?
by marto (Cardinal) on Jun 28, 2023 at 08:07 UTC

    While not a direct answer to your request for bulk downloads, are you aware of BioPerl? From the Introduction: "If you’re a molecular biologist it’s likely that you’re interested in gene and protein sequences, and you study them in some way on a regular basis. Perhaps you’d like to try your hand at automating some of these tasks, or you’re just curious about learning more about the programming side of bioinformatics. In this HOWTO you’ll see discussions of some of the common uses of Bioperl, like sequence analysis with BLAST and retrieving sequences from public databases. You’ll also see how to write Bioperl scripts that chain these tasks together, that’s how you’ll be able to do really powerful things with Bioperl." Perhaps worth investigating; while I'm not a bioinformatician, I know of people who have used BioPerl to work with proteins and the like. I mention this because, if the data is available this way, it may be a better option than having to write code to parse various sites/pages to get the data you want.
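
    For example, fetching a protein record straight from UniProt/Swiss-Prot takes only a few lines (an untested sketch; the accession 'P05067' is just a placeholder, substitute whatever you are actually after):

        use strict;
        use warnings;
        use Bio::DB::SwissProt;
        use Bio::SeqIO;

        # Fetch one protein record from UniProt/Swiss-Prot by accession
        my $db  = Bio::DB::SwissProt->new;
        my $seq = $db->get_Seq_by_acc('P05067');    # placeholder accession

        # Write it out as FASTA
        my $out = Bio::SeqIO->new( -file => '>P05067.fasta', -format => 'fasta' );
        $out->write_seq($seq);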

Re: Code for 'Save html page' that contains dynamic content?
by tobyink (Canon) on Jun 28, 2023 at 07:11 UTC

      Yes, the API is at https://opm.phar.umich.edu/download. If the OP is serious about this, they should start there and build their crawler on top of it. That said, BioPerl was already mentioned, and AFAIR R also has packages for downloading some types of bio-data (I'm not sure which).
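
      To sketch the sort of loop I mean (untested, and note that the endpoint below is purely hypothetical; check the download page for the real routes and response format):

          use strict;
          use warnings;
          use HTTP::Tiny;
          use JSON::PP qw(decode_json);

          my $http = HTTP::Tiny->new( timeout => 30 );

          for my $id ( 7839, 4676 ) {   # the OP's example protein IDs
              # NOTE: hypothetical endpoint, look up the real one on the download page
              my $url = "https://opm.phar.umich.edu/api/proteins/$id";
              my $res = $http->get($url);
              unless ( $res->{success} ) {
                  warn "Failed to fetch $url: $res->{status} $res->{reason}\n";
                  next;
              }
              my $data = decode_json( $res->{content} );
              # ... pull out whatever fields are of interest here ...
              print "Fetched record $id\n";
          }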

      Thanks! I had not seen that. But this was just one example; I have other sites as well, like:
      http://pdbtm.enzim.hu/?_=/pdbtm/1a0t

      where I want to check the coloured letters.
      In any case, if you have any suggestions (or code) that could be used for such tasks, that would be great :)
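
      To make it concrete: once a rendered page has been saved, I imagine pulling out the coloured letters with something like this untested sketch (the selector is only a guess at how that page marks them up, and '1a0t.html' is a file I would have saved beforehand):

          use strict;
          use warnings;
          use Mojo::DOM;
          use Mojo::File qw(path);

          # Parse a previously saved, fully rendered page
          my $html = path('1a0t.html')->slurp;
          my $dom  = Mojo::DOM->new($html);

          # The selector is a guess; inspect the real markup to find the right one
          $dom->find('span[style*="color"]')->each( sub {
              my $span = shift;
              print $span->text, "\t", ( $span->attr('style') // '' ), "\n";
          } );
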
Re: Code for 'Save html page' that contains dynamic content?
by cavac (Prior) on Jun 28, 2023 at 12:37 UTC

    Click on the "Contact Us" link and ask them nicely whether they can provide a download link for the whole dataset. There's a chance nobody has asked them yet...
