mikorym has asked for the wisdom of the Perl Monks concerning the following question:

The question is perhaps too general, but let me try to be specific. Suppose I fetch the content of a website from a URL, such as:

use HTTP::Tiny;
my $response = HTTP::Tiny->new->get($url);

Suppose further that, for whatever reason, this is a malicious URL. What are the internal workings of HTTP::Tiny, and would some of the more common malware one encounters exhibit its malicious activity through such a scraping call?

My intuition tells me that if malware targets scrapers specifically, there would be a risk. For the other types of malware that execute in browsers, would the script return the source without triggering the malicious code on the website?

Replies are listed 'Best First'.
Re: Malware on Webpages Visited by Crawlers
by LanX (Saint) on Mar 19, 2019 at 10:23 UTC
    > What are the internal workings of HTTP::Tiny

    I looked at HTTP::Tiny and it doesn't seem to execute JS.
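
    To illustrate: the fetch returns nothing but a hashref of plain data, and the page body is an uninterpreted string in $response->{content}. A minimal sketch (the URL and file name are made up for the example):

        use strict;
        use warnings;
        use HTTP::Tiny;

        my $url      = 'https://example.com/';   # hypothetical URL
        my $response = HTTP::Tiny->new->get($url);

        die "Fetch failed: $response->{status} $response->{reason}\n"
            unless $response->{success};

        # The hashref holds success, url, status, reason, headers and
        # content; the body is just bytes, with no HTML parsing and no
        # JavaScript engine anywhere in the pipeline.
        printf "Got %d bytes from %s\n",
            length $response->{content}, $response->{url};

        # Writing it to disk is equally inert; the bytes are never run.
        open my $fh, '>', 'page.html' or die "open: $!";
        print {$fh} $response->{content};
        close $fh;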

    > My intuition tells me that if malware targets scrapers specifically, there would be a risk.

    Yes, you are right: if a piece of software has any potential to execute injected code², an attacker could try to target it.

    I strongly doubt that this is the case here; it should be as safe as storing the HTML on disk.°

    Unless of course it contains the satanic bible encoded in reversed UTF666 ...

    Cheers Rolf
    (addicted to the Perl Programming Language :)

    °) well ... maybe it's possible to run a DoS attack with cleverly circular redirections, but such a page would be a time trap for every browser too, and it can be countered with a timeout mechanism.
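
    In fact HTTP::Tiny caps redirect chains at 5 by default, and both that cap and the timeout are tunable constructor attributes. A minimal sketch (the values are arbitrary examples, not recommendations):

        use HTTP::Tiny;

        my $http = HTTP::Tiny->new(
            timeout      => 10,   # seconds before giving up on a stalled server
            max_redirect => 3,    # cap redirect chains (default is 5)
        );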

    ²) you could try to parse the code and all its dependencies and investigate every string eval statement; a crude sketch follows below.
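
    Purely illustrative: a regex cannot reliably tell string eval from block eval, so a real audit would want a proper parser such as PPI. The helper name here is made up for the example:

        use strict;
        use warnings;

        # Hypothetical helper: naively flag lines where eval is followed
        # by a quote, a variable, or a q// construct (a string eval)
        # rather than a block.
        sub flag_string_evals {
            my ($file) = @_;
            open my $fh, '<', $file or die "open $file: $!";
            while (my $line = <$fh>) {
                print "$file:$.: $line"
                    if $line =~ /\beval\s*(?:["'\$]|q[qw]?\b)/;
            }
            close $fh;
        }

        flag_string_evals($_) for @ARGV;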