in reply to Question regarding web scraping
The following is a malformed regular expression:
while ($CONTENT =~ <div class=\"usertext-body may-blank-within md-container \"><div class=\"md\">(.+?)<\/div><\/div><\/form><ul class=\"flat-list buttons\"> //gs )
At the very least, it is missing the opening s/.
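For illustration, here is a minimal sketch of one well-formed version. The loop appears to iterate over matches and capture each comment body, so the sketch assumes a match with m{...}gs was intended rather than a substitution; if a substitution was meant, the opening would be s/ instead. The sample $CONTENT string and the @comments array are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical sample input standing in for the fetched page.
my $CONTENT = join '',
    '<div class="usertext-body may-blank-within md-container ">',
    '<div class="md"><p>First comment</p></div></div></form>',
    '<ul class="flat-list buttons"></ul>',
    '<div class="usertext-body may-blank-within md-container ">',
    '<div class="md"><p>Second comment</p></div></div></form>',
    '<ul class="flat-list buttons"></ul>';

my @comments;

# m{...} as delimiters means neither the quotes nor the slashes
# in the closing tags need to be escaped.
while ( $CONTENT =~ m{<div class="usertext-body may-blank-within md-container "><div class="md">(.+?)</div></div></form><ul class="flat-list buttons">}gs ) {
    push @comments, $1;    # $1 holds the captured comment markup
}

print "$_\n" for @comments;
```

Note that this still depends on the exact markup, including the trailing space inside the class attribute, so it breaks as soon as the page layout changes.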
Personally, I suggest extracting the content with HTML::TreeBuilder and XPath or CSS selectors (via HTML::TreeBuilder::XPath and HTML::Selector::XPath).
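A rough sketch of the parser-based approach follows. It assumes the comment bodies sit in div.md elements inside div.usertext-body, as the regular expression above suggests; the sample HTML is invented for the example. The CSS selector is translated to XPath with selector_to_xpath from HTML::Selector::XPath and then handed to findnodes from HTML::TreeBuilder::XPath. Both modules need to be installed from CPAN.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;
use HTML::Selector::XPath qw(selector_to_xpath);

# Hypothetical sample markup; the real page will contain much more.
my $html = <<'HTML';
<html><body>
<form>
<div class="usertext-body may-blank-within md-container ">
<div class="md"><p>First comment</p></div>
</div>
</form>
<form>
<div class="usertext-body may-blank-within md-container ">
<div class="md"><p>Second comment</p></div>
</div>
</form>
</body></html>
HTML

my $tree = HTML::TreeBuilder::XPath->new_from_content($html);

# Translate a CSS selector into an equivalent XPath expression.
my $xpath = selector_to_xpath('div.usertext-body div.md');

# In list context findnodes returns the matching element nodes.
my @texts = map { $_->as_text } $tree->findnodes($xpath);

print "$_\n" for @texts;

# Free the tree explicitly once done with it.
$tree->delete;
```

Unlike the regular expression, this does not care about attribute order, extra classes, or whitespace between the tags.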
Also note that Reddit has an API, so you may not need to scrape at all and can get the comments in a machine-readable format directly.
There are also many Reddit modules on CPAN, and Reddit::Client appears to use the Reddit API.
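To give an idea of what working with the machine-readable format could look like, here is a small sketch that decodes a comment listing with the core module JSON::PP. The JSON below is a hand-written, heavily trimmed sample and its shape is an assumption: two listings, the post first and the comments second, with each comment of kind "t1" carrying its text in "body". Check the real response before relying on this. Fetching the data is left out on purpose.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Hypothetical, trimmed sample; the layout is an assumption about
# the listing format and not copied from a real response.
my $json = <<'JSON';
[
  { "kind": "Listing", "data": { "children": [
      { "kind": "t3", "data": { "title": "Example post" } }
  ] } },
  { "kind": "Listing", "data": { "children": [
      { "kind": "t1", "data": { "author": "alice", "body": "First comment" } },
      { "kind": "t1", "data": { "author": "bob",   "body": "Second comment" } }
  ] } }
]
JSON

my $listings = decode_json($json);

# Take the second listing and keep the text of each comment entry.
my @bodies =
    map  { $_->{data}{body} }
    grep { $_->{kind} eq 't1' }
    @{ $listings->[1]{data}{children} };

print "$_\n" for @bodies;
```

This only looks at top-level comments; nested replies would need a recursive walk. A module such as Reddit::Client may well hide these details for you.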
Re^2: Question regarding web scraping
by Gangabass (Vicar) on Oct 23, 2016 at 05:42 UTC

Re^2: Question regarding web scraping
by Lisa1993 (Acolyte) on Oct 22, 2016 at 15:30 UTC