mikorym has asked for the wisdom of the Perl Monks concerning the following question:

The question is perhaps too general, but let me try to be specific. Suppose I fetch the content of a website from a URL, such as:

use HTTP::Tiny;
my $response = HTTP::Tiny->new->get($url);

Suppose further that, for whatever reason, this is a malicious URL. What are the internal workings of HTTP::Tiny, and would some of the more common malware one encounters exhibit its malicious activity through such a scraping call?

My intuition tells me that if malware targets scrapers specifically, there would be a risk. For the other types of malware that execute in browsers, would the script return the source without triggering the malicious code on the website?

Replies are listed 'Best First'.
Re: Malware on Webpages Visited by Crawlers
by LanX (Saint) on Mar 19, 2019 at 10:23 UTC
    > What are the internal workings of HTTP::Tiny

    I looked at HTTP::Tiny and it doesn't seem to execute JS.
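
    To illustrate: the fetch returns nothing but a hashref of plain data, and the page body is an uninterpreted string in $response->{content}. A minimal sketch (the URL and file name are made up for the example):

        use strict;
        use warnings;
        use HTTP::Tiny;

        my $url      = 'https://example.com/';   # hypothetical URL
        my $response = HTTP::Tiny->new->get($url);

        die "Fetch failed: $response->{status} $response->{reason}\n"
            unless $response->{success};

        # The hashref holds success, url, status, reason, headers and
        # content; the body is just bytes, with no HTML parsing and no
        # JavaScript engine anywhere in the pipeline.
        printf "Got %d bytes from %s\n",
            length $response->{content}, $response->{url};

        # Writing it to disk is equally inert; the bytes are never run.
        open my $fh, '>', 'page.html' or die "open: $!";
        print {$fh} $response->{content};
        close $fh;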

    > My intuition tells me that if malware targets scrapers specifically, there would be a risk.

    Yes, you are right: if a piece of software has any potential to execute injected code², an attacker could try to target it.

    I strongly doubt that this is the case here; it should be as safe as storing the HTML on disk.°

    Unless of course it contains the satanic bible encoded in reversed UTF666 ...

    Cheers Rolf
    (addicted to the Perl Programming Language :)

    °) well ... maybe it's possible to run a DoS attack with cleverly circular redirections, but such a page would be a time trap for every browser too, and it can be countered with a timeout mechanism.
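
    In fact HTTP::Tiny caps redirect chains at 5 by default, and both that cap and the timeout are tunable constructor attributes. A minimal sketch (the values are arbitrary examples, not recommendations):

        use HTTP::Tiny;

        my $http = HTTP::Tiny->new(
            timeout      => 10,   # seconds before giving up on a stalled server
            max_redirect => 3,    # cap redirect chains (default is 5)
        );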

    ²) you could try to parse the code and all its dependencies and investigate every string eval statement; a crude sketch follows below.
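
    Purely illustrative: a regex cannot reliably tell string eval from block eval, so a real audit would want a proper parser such as PPI. The helper name here is made up for the example:

        use strict;
        use warnings;

        # Hypothetical helper: naively flag lines where eval is followed
        # by a quote, a variable, or a q// construct (a string eval)
        # rather than a block.
        sub flag_string_evals {
            my ($file) = @_;
            open my $fh, '<', $file or die "open $file: $!";
            while (my $line = <$fh>) {
                print "$file:$.: $line"
                    if $line =~ /\beval\s*(?:["'\$]|q[qw]?\b)/;
            }
            close $fh;
        }

        flag_string_evals($_) for @ARGV;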