Re: Question regarding web scraping

An alternative would be to request a JSON object rather than the rendered web page. You can do this by appending .json to the end of your URL, like so:

https://www.reddit.com/r/unitedkingdom/comments/58m2hs/i_daniel_blake_is_released_today/.json

I've no idea what tool you're using further down the line for analysis, but HTML seems like an odd format to store such data in. Here is a short example that simply prints the name of each poster and their comment:

#!/usr/bin/perl

use strict;
use warnings;
use Mojo::UserAgent;

my $url = 'https://www.reddit.com/r/unitedkingdom/comments/58m2hs/i_daniel_blake_is_released_today/.json';

my $ua = Mojo::UserAgent->new;
my $data = $ua->get( $url )->res->json;

foreach my $comment ( @{$data} ) {
    foreach my $child ( @{ $comment->{'data'}->{'children'} } ) {
        my $item = $child->{'data'};
        # Skip entries with no author, such as "load more" placeholders
        next unless defined $item->{'author'};
        print $item->{'author'} . " posted:\n";
        print $item->{'body'} . "\n" if $item->{'body'};
    }
}

You'll need the Mojo::UserAgent module:

#install via cpan
cpan Mojo::UserAgent
#or cpanm
cpanm Mojo::UserAgent

From the brief example above you can see how to get just what you want, or add some other bells and whistles. The example isn't particularly pretty in its output; I'll leave that as an exercise for you. You can examine the JSON in a browser (some plugins exist to prettify the content), or you can use something like json_pp to print it from the command line.
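
If you would rather inspect the structure from within Perl, here is a minimal sketch using the core JSON::PP module (the module that json_pp ships with). The sample string is made up and only mimics the shape of the response; the helper name pretty is my own. In practice you would pass in the decoded data from the request instead.

```perl
#!/usr/bin/perl

use strict;
use warnings;
use JSON::PP;

# Made-up sample in roughly the same shape as the response (not real data).
my $raw = '[{"data":{"children":[{"data":{"author":"example_user","body":"Hello"}}]}}]';

# Return an indented, key-sorted rendering of a decoded JSON structure.
# (Hypothetical helper, named here only for illustration.)
sub pretty {
    my ($data) = @_;
    return JSON::PP->new->pretty->canonical->encode($data);
}

my $data = decode_json($raw);
print pretty($data);
```

Sorting the keys (canonical) makes it easier to compare two dumps side by side.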

Update: I read some of the other comments you made. If you're trying to do this for various subreddits, you can easily adapt the above example to:

For each subreddit URL (append .json)
Get each thread
Follow the existing code to print comments (or save to a file)
sleep for a few seconds...
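
The steps above can be sketched roughly as follows. Treat it as an outline under a few assumptions rather than a finished script: the listing URL pattern (https://www.reddit.com/r/NAME/.json), the permalink field on each thread, the three second pause, and the helper names (listing_url, thread_urls, fetch_json, print_comments) are all my own choices for illustration. Subreddit names are taken from the command line.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Assumed base URL and pause length; adjust to taste.
my $base  = 'https://www.reddit.com';
my $pause = 3;

# Build the JSON listing URL for a subreddit name (hypothetical helper).
sub listing_url {
    my ($subreddit) = @_;
    return "$base/r/$subreddit/.json";
}

# Collect the JSON URL of every thread in a decoded subreddit listing.
# Assumes each child carries a 'permalink' field that ends in a slash.
sub thread_urls {
    my ($listing) = @_;
    return unless ref($listing) eq 'HASH';
    my @urls;
    foreach my $child ( @{ $listing->{'data'}->{'children'} || [] } ) {
        my $permalink = $child->{'data'}->{'permalink'} or next;
        push @urls, $base . $permalink . '.json';
    }
    return @urls;
}

# Fetch a URL and decode the JSON body; Mojo::UserAgent is loaded on demand.
sub fetch_json {
    my ($url) = @_;
    require Mojo::UserAgent;
    my $ua = Mojo::UserAgent->new;
    return $ua->get($url)->res->json;
}

# Print author and body for each top-level entry, as in the example above.
sub print_comments {
    my ($thread) = @_;
    return unless ref($thread) eq 'ARRAY';
    foreach my $listing ( @{$thread} ) {
        foreach my $child ( @{ $listing->{'data'}->{'children'} || [] } ) {
            my $item = $child->{'data'};
            next unless defined $item->{'author'};
            print $item->{'author'} . " posted:\n";
            print $item->{'body'} . "\n" if $item->{'body'};
        }
    }
}

# Usage: perl script.pl unitedkingdom some_other_subreddit
foreach my $subreddit (@ARGV) {
    my $listing = fetch_json( listing_url($subreddit) ) or next;
    foreach my $url ( thread_urls($listing) ) {
        print_comments( fetch_json($url) );
        sleep $pause;    # pause between requests
    }
}
```

To save to a file instead of printing, swap the print calls in print_comments for writes to a filehandle you open per subreddit or per thread.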
