Exporting Curl content to html

aelmore has asked for the wisdom of the Perl Monks concerning the following question:

Hi all: I've inherited this project and haven't used Perl or CGI in a decade, and even then my knowledge was rudimentary.

This script is supposed to run two curl commands:

1) First, pull all page IDs and split the comma-separated list.

2) Take the ID array, feed each ID into $id, run a curl command to pull that page's content, and then export it to an individual HTML file.

It's doing the first part, but it's only dumping the page IDs into the HTML files. Where is it breaking?

The code with variables redacted:

my $base_url = 'https://
my $user =
my $pass = ;
#folder change
my $out_dir = 'pages';
#format change
my $format = 'html';


#pulls all ID's
my $out = `curl -u $user:$pass -i $base_url/file/root/tree?format=ids`;

#regex: capture the trailing run of digits and commas (the ID list)
$out =~ /.+\s*([\d,]+)$/;

#separate each id by comma
my @ids = split /,/, $1;

#pull pageid array, create separate HTML file with page contents
foreach my $id (@ids) {
    print "$id\n";

    #passes array into $id and makes curl command to get HTML content.
    my $json = `curl -u $user:$pass -i $base_url/files/$id/contents?format=$format`;

    $json =~ /^.+\r\n\r\n:?(.+)$/s;
    my $contents = $1;

    open(FILE, ">$out_dir/$id.$format") or die "can't open file for $id: $!";
    print FILE "$contents\n";
    close FILE;
}

I'm sure it's an easy fix, but my Perl skills are lacking. Thanks in advance.


Re: Exporting Curl content to html by choroba (Cardinal) on Jan 26, 2016 at 16:08 UTC
Hi aelmore, welcome to the Monastery! Without seeing the input, we can only guess. You can try adding `print "<$json>\n";` right after the line that assigns `$json`, to check that the input is what you expect (item #2 in the Basic debugging checklist).
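Since line endings turn out to matter in this thread, note that a plain `print` cannot show the difference between `\r\n` and `\n`. A minimal sketch using the core Data::Dumper module: with `$Data::Dumper::Useqq = 1` it prints control characters as escapes, so carriage returns appear as a literal `\r`. The sample `$json` string is made-up test data standing in for the curl output, not what the poster's server returns.

```perl
use strict;
use warnings;
use Data::Dumper;

# Useqq makes Dumper emit double-quoted strings with escapes such as \r and \n,
# so invisible line-ending characters become visible.
$Data::Dumper::Useqq = 1;
# Terse drops the "$VAR1 = " prefix for a cleaner one-value dump.
$Data::Dumper::Terse = 1;

# Made-up stand-in for the raw output of `curl -i ...` (header + body).
my $json = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<p>page</p>";

my $dump = Dumper($json);
print $dump;
```

If the dump shows `\n\n` between the header and the body rather than `\r\n\r\n`, a pattern that requires `\r\n\r\n` will not match.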
Re^2: Exporting Curl content to html by aelmore (Initiate) on Jan 26, 2016 at 18:25 UTC
I've added the input variables to the script but removed their values, since this is for private work. In case it's part of the issue: I'm running ActivePerl from the command line on Win7.
Re: Exporting Curl content to html by aelmore (Initiate) on Jan 26, 2016 at 18:37 UTC
UPDATE: This may not be a coding issue. I spoke to the original author, and he ran it as is on his console and got the expected results.
Perl version: perl 5, version 20, subversion 2 (v5.20.2) built for MSWin32-x64-multi-t
Here's what I installed to get Perl running on Win7:
1. Install Perl: http://www.activestate.com/activeperl?gclid=CP6J0LzNx8oCFQovHwod280O_A
2. Install the C++ redistributable: https://www.microsoft.com/en-us/download/details.aspx?id=48145
3. Install curl: http://curl.haxx.se/download.html
So am I running it in the right environment?
Re^2: Exporting Curl content to html by poj (Abbot) on Jan 26, 2016 at 20:47 UTC
Did the author run it on Unix? I'm guessing the line `$json =~ /^.+\r\n\r\n:?(.+)$/s;` is there to remove the header, but on Windows it should be `\n\n`. However, if you remove the `-i`, curl doesn't include the header in the download. Using `-o` you can write the result to a file. Try this revised script (untested):

#!perl
use strict;
use warnings;

my $base_url = 'https://';
my $user     = '';
my $pass     = '';
my $out_dir  = 'pages';
my $format   = 'html';

my $out = `curl -u $user:$pass -i $base_url/file/root/tree?format=ids`;

if ($out =~ /.+\s*([\d,]+)$/) {
    my @ids = split /,/, $1;
    foreach my $id (@ids) {
        print "$id\n";
        my $cmd = "curl -u $user:$pass $base_url/files/$id/contents?format=$format -o $out_dir/$id.$format";
        my $status = system($cmd);
        if ($status) { die "system error: $?" }
    }
}
else {
    print "Error - No ids found\n";
}

poj
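If the guess about `\r\n\r\n` versus `\n\n` is right, one way to stop depending on the platform is a pattern that accepts either line ending. A minimal sketch, assuming a single header block (with `-i`, redirects or `100 Continue` responses would add more) and using a hypothetical helper name, `split_response`. The non-greedy `.*?` splits at the first blank line, whereas the original greedy `.+` with `/s` splits at the last `\r\n\r\n` in the response, which can fall inside the HTML body.

```perl
use strict;
use warnings;

# Hypothetical helper: split a raw `curl -i` response into (headers, body).
# Accepts CRLF line endings (raw HTTP) or LF-only ones (CRLF already
# translated, e.g. by Perl's default :crlf layer on Windows).
# Returns an empty list if no blank line separates header from body.
sub split_response {
    my ($raw) = @_;
    # \A and \z anchor the whole string; /s lets . match newlines;
    # the non-greedy .*? stops at the FIRST blank line, so a blank line
    # inside the HTML body is not mistaken for the end of the header.
    return $raw =~ /\A(.*?)\r?\n\r?\n(.*)\z/s ? ($1, $2) : ();
}

# Made-up sample responses, one with CRLF and one with LF line endings.
my @samples = (
    "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<p>a</p>\r\n\r\n<p>b</p>",
    "HTTP/1.1 200 OK\nContent-Type: text/html\n\n<p>a</p>",
);

for my $raw (@samples) {
    my ($head, $body) = split_response($raw);
    unless (defined $body) {
        warn "no header/body separator found\n";
        next;
    }
    print "body: $body\n";
}
```

Checking the return value before using the body is the important part: it turns a silent mismatch into a visible warning.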
Re: Exporting Curl content to html by Anonymous Monk on Jan 26, 2016 at 19:42 UTC
Regarding `my $contents = $1;`: does `$contents` have anything in it? Anyway, try putting `use open IO => ":raw";` at the top of the script.
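On whether `$contents` has anything: Perl does not reset `$1` when a match fails. Inside the loop, a failed header match leaves `$1` holding the capture from the last successful match in an enclosing scope, which in the posted script is the ID-list match. That is consistent with the files containing page IDs instead of page content. The sketch below uses made-up LF-only responses (roughly what backticks typically return on Windows, where the default `:crlf` layer translates `\r\n` to `\n`, and presumably why `use open IO => ":raw"` could help) to reproduce the effect, then shows the usual guard of testing the match before reading `$1`.

```perl
use strict;
use warnings;

# Made-up stand-ins for the two curl responses, with LF-only line endings.
my $out  = "HTTP/1.1 200 OK\nContent-Type: text/plain\n\n101,102,103";
my $json = "HTTP/1.1 200 OK\nContent-Type: text/html\n\n<p>page</p>";

$out =~ /.+\s*([\d,]+)$/;    # succeeds: $1 is "101,102,103"
my @ids = split /,/, $1;

# Unchecked, as in the original script.
my @stale;
for my $id (@ids) {
    # This pattern requires \r\n\r\n, so it FAILS on LF-only input...
    $json =~ /^.+\r\n\r\n:?(.+)$/s;
    # ...and $1 still holds the ID list from the outer successful match.
    push @stale, $1;
}
print "unchecked: $stale[0]\n";

# Guarded: only read $1 when the match succeeded; accept CRLF or LF.
my @checked;
for my $id (@ids) {
    if ($json =~ /\A.*?\r?\n\r?\n:?(.*)\z/s) {
        push @checked, $1;
    }
    else {
        warn "no page body found for id $id\n";
    }
}
print "checked: $checked[0]\n";
```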