in reply to Question regarding web scraping

The following is a malformed regular expression:

while ($CONTENT =~ <div class=\"usertext-body may-blank-within md-container \"><div class=\"md\">(.+?)<\/div><\/div><\/form><ul class=\"flat-list buttons\"> //gs )

At the very least, it is missing the leading s/.
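Even with s/ added, s///g in a while condition performs every substitution during the first test, so the loop body runs at most once and $1 only holds the last capture. Assuming the intent was to collect each comment body, a match (m//gs) loop is what steps through the string one match at a time. This is a minimal sketch using only core Perl; the sample markup is made up for illustration. Using m{...} as the delimiter means neither the slashes nor the quotes need escaping.

```perl
use strict;
use warnings;

# Hypothetical sample markup standing in for the fetched page ($CONTENT in the question).
my $CONTENT = join '',
    '<div class="usertext-body may-blank-within md-container "><div class="md">first comment</div></div></form><ul class="flat-list buttons">',
    '<div class="usertext-body may-blank-within md-container "><div class="md">second comment</div></div></form><ul class="flat-list buttons">';

# m{...}gs in scalar (while) context finds one match per iteration and sets $1 each time;
# /s lets (.+?) span newlines, the ? keeps it from running past the first closing sequence.
my @comments;
while ($CONTENT =~ m{<div class="usertext-body may-blank-within md-container "><div class="md">(.+?)</div></div></form><ul class="flat-list buttons">}gs) {
    push @comments, $1;
}

print "$_\n" for @comments;
```

This still ties the extraction to Reddit's exact markup, which is why a parser-based approach is more robust.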

Personally, I suggest that you do the content extraction using HTML::TreeBuilder and XPath or CSS selectors (via HTML::TreeBuilder::XPath and HTML::Selector::XPath).
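A rough sketch of that approach, assuming HTML::TreeBuilder::XPath and HTML::Selector::XPath are installed from CPAN. The HTML below is a hypothetical stand-in for a fetched page. selector_to_xpath turns the CSS selector div.md into an XPath expression that matches the class even when the element carries several classes.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;
use HTML::Selector::XPath qw(selector_to_xpath);

# Hypothetical sample markup; in practice this would be the downloaded page.
my $html = <<'HTML';
<html><body>
<div class="usertext-body may-blank-within md-container "><div class="md"><p>first comment</p></div></div>
<div class="usertext-body may-blank-within md-container "><div class="md"><p>second comment</p></div></div>
</body></html>
HTML

# Parse the whole document into a tree (new_from_content is inherited from HTML::TreeBuilder).
my $tree = HTML::TreeBuilder::XPath->new_from_content($html);

# Convert the CSS selector "div.md" into an equivalent XPath expression.
my $xpath = selector_to_xpath('div.md');

# findnodes in list context returns the matching elements; as_text gives their text content.
my @comments = map { $_->as_text } $tree->findnodes($xpath);

$tree->delete;    # release the tree's circular references

print "$_\n" for @comments;
```

If you prefer writing XPath directly, pass an expression such as //div[@class="md"] to findnodes instead of the converted selector.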

Also note that Reddit has an API, so you may not need to scrape at all; you can get the comments directly in a machine-readable format.
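Once the comments arrive as JSON, extracting them is a data-structure walk rather than HTML parsing. The sketch below uses the core JSON::PP module on a hypothetical, heavily trimmed response. The shape it assumes (a two-element array of "Listing" objects, with comments as entries of kind "t1" carrying a "body" field) is an assumption to verify against Reddit's API documentation, as is how you fetch it.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Hypothetical response body; the structure is ASSUMED, not taken from Reddit's docs.
my $json = <<'JSON';
[
  {"kind": "Listing", "data": {"children": [{"kind": "t3", "data": {"title": "A post"}}]}},
  {"kind": "Listing", "data": {"children": [
    {"kind": "t1", "data": {"body": "first comment"}},
    {"kind": "t1", "data": {"body": "second comment"}}
  ]}}
]
JSON

my $listings = decode_json($json);

# Assumed layout: top-level comments sit in the second listing; keep only "t1" entries.
my @comments = map  { $_->{data}{body} }
               grep { $_->{kind} eq 't1' }
               @{ $listings->[1]{data}{children} };

print "$_\n" for @comments;
```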

There are also many Reddit modules on CPAN; Reddit::Client, for example, appears to use the Reddit API.

Re^2: Question regarding web scraping
by Gangabass (Vicar) on Oct 23, 2016 at 05:42 UTC
Re^2: Question regarding web scraping
by Lisa1993 (Acolyte) on Oct 22, 2016 at 15:30 UTC
    Thank you very much! I will look into these alternatives. Thanks again for your suggestions.