Dear Monks

I have a Perl script that downloads data from an HTTPS site. I was using Crypt::SSLeay and the script works fine: it downloads the full data (a CSV file) from the server.

I thought I would give LWP's built-in IO::Socket::SSL support a try.

I am actually using WWW::Mechanize in my script. The script failed at the $mech->response()->decoded_content() step. Debugging further, I found that it could not decompress the gzip-compressed data sent by the server.

Surprised, I debugged further and disabled compression using $mech->add_header('Accept-Encoding' => '');

Now I can see data coming from the server, but it is not complete; I see only the first few bytes. Examining the HTTP::Response headers, I find:

'client-transfer-encoding' => [ 'chunked' ]
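
A minimal sketch of how the response headers could be checked for signs of an interrupted read. As far as I know, LWP adds X-Died and Client-Aborted headers when reading the body fails partway through. The helper name truncation_hints, the plain-hash input, and the sample header values are assumptions made for illustration; they are not output from the scripts in this post.

```perl
use strict;
use warnings;

# Return a list of reasons why a response body may be incomplete.
# $headers : plain hash of lower-cased header name => value (hypothetical input shape)
# $received: number of raw (not decoded) body bytes actually obtained
sub truncation_hints {
    my ($headers, $received) = @_;
    my @hints;

    # Headers LWP is expected to add when the body read dies midway
    push @hints, "X-Died: $headers->{'x-died'}"
        if defined $headers->{'x-died'};
    push @hints, "Client-Aborted: $headers->{'client-aborted'}"
        if defined $headers->{'client-aborted'};

    # Content-Length is usually absent for chunked responses; when present
    # it counts the bytes on the wire, so compare against the raw body.
    my $expected = $headers->{'content-length'};
    if (defined $expected && $expected =~ /^\d+$/ && $received < $expected) {
        push @hints, "got $received of $expected bytes";
    }
    return @hints;
}

# Hypothetical header values, for illustration only
my %sample = (
    'client-transfer-encoding' => 'chunked',
    'x-died'                   => 'read failed',
    'client-aborted'           => 'die',
);
print "$_\n" for truncation_hints(\%sample, 6828);
```

With WWW::Mechanize the values could be collected from $mech->response()->header('X-Died') and similar calls, and the raw byte count from length($mech->response()->content()).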


It looks like the server is sending chunked data to me. My guess is that LWP with IO::Socket::SSL cannot handle the "chunked" transfer, so only part of the body arrives and the gzip content decoding fails.
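
For readers unfamiliar with the "chunked" transfer coding, here is a small sketch of what decoding it involves: each chunk is a hexadecimal size line, the data, and a CRLF, and a zero-size chunk ends the body. The function name decode_chunked and the sample bytes are assumptions for illustration only. As far as I understand, LWP itself does this decoding in Net::HTTP, above the SSL socket class, so this is not how either script in this post handles it.

```perl
use strict;
use warnings;

# Decode an HTTP/1.1 chunked body held entirely in memory.
# Chunk extensions after ';' and trailer headers are ignored.
sub decode_chunked {
    my ($raw) = @_;
    my $body = '';
    my $pos  = 0;
    while (1) {
        # Locate the end of the chunk-size line
        my $eol = index($raw, "\r\n", $pos);
        die "truncated: missing chunk-size line\n" if $eol < 0;

        my $size_line = substr($raw, $pos, $eol - $pos);
        my ($hex) = $size_line =~ /^([0-9A-Fa-f]+)/
            or die "bad chunk size: '$size_line'\n";
        my $size = hex $hex;
        $pos = $eol + 2;

        last if $size == 0;    # zero-size chunk marks the end of the body

        # A short read shows up here: fewer bytes than the chunk announced
        die "truncated: chunk shorter than announced\n"
            if $pos + $size + 2 > length $raw;

        $body .= substr($raw, $pos, $size);
        $pos  += $size + 2;    # skip the data plus its trailing CRLF
    }
    return $body;
}

# Hypothetical wire data: two chunks followed by the terminating chunk
my $wire = "5\r\nHello\r\n7\r\n, world\r\n0\r\n\r\n";
print decode_chunked($wire), "\n";
```

If a connection drops after the first chunk, a decoder like this has nothing more to read, which would match the "only the first few bytes" symptom described above.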

When I force the use of Crypt::SSLeay as below,

    use Crypt::SSLeay;
    use Net::SSL;
    use WWW::Mechanize;
    ....
    $ENV{PERL_NET_HTTPS_SSL_SOCKET_CLASS} = "Net::SSL";
    $mech = WWW::Mechanize->new(autocheck => 1, noproxy => 1, ssl_opts => { 'verify_hostname' => 0 });
    ...


I see the full data come from the server. I still see the "chunked" header, but it is handled properly by Net::SSL / Crypt::SSLeay.

Q: Has anyone faced this issue? Can Perl LWP handle "chunked" data transfer over SSL? Thanks for your time.

Update: Added 2 test scripts to demonstrate the problem

One uses Net::SSL and downloads the data properly from the server.
The other uses IO::Socket::SSL, downloads only the first chunk (I think) from the server, and quits.

To show the difference between the two downloads, I print the MD5 sum and file size for each.
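
The size and MD5 comparison can also be done without reading the whole file into memory. This is a minimal sketch using only core modules; the helper name file_fingerprint and the command-line usage are assumptions for illustration and are not part of the two test scripts.

```perl
use strict;
use warnings;
use Digest::MD5;

# Return (size in bytes, MD5 hex digest) for a file, read in binary mode.
sub file_fingerprint {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!\n";
    binmode $fh;                      # important on Windows for binary files
    my $size = (stat $fh)[7];         # element 7 of stat is the size in bytes
    my $md5  = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return ($size, $md5);
}

# Print a fingerprint for every file named on the command line
for my $path (@ARGV) {
    my ($size, $md5) = file_fingerprint($path);
    printf "%s  %d bytes  %s\n", $path, $size, $md5;
}
```

Running it on the file saved by each script gives two lines that can be compared directly.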

My environment:
OS: Windows 7, x86_64
Perl: ActivePerl, perl 5, version 20, subversion 1 (v5.20.1) built for MSWin32-x86-multi-thread-64int

Note: I saw the same behavior in ActivePerl 5.10, 5.14, 5.16 and 5.18.

Script 1 - Using Net::SSL and Crypt::SSLeay - Working

    # WORKING HTTPS DOWNLOAD using Net::SSL in Windows + Active Perl
    use strict;
    use warnings;
    use Crypt::SSLeay;
    use Net::SSL;
    use WWW::Mechanize;
    use HTTP::Cookies;
    use HTTP::Message;
    use Digest::MD5;
    use File::Slurp;
    use Data::Dumper;

    # Globals
    $| = 1;

    # Force LWP to use Net::SSL instead of IO::Socket::SSL
    $ENV{PERL_NET_HTTPS_SSL_SOCKET_CLASS} = "Net::SSL";
    $ENV{PERL_LWP_SSL_VERIFY_HOSTNAME} = 0;
    delete $ENV{https_proxy} if exists $ENV{https_proxy};
    delete $ENV{http_proxy} if exists $ENV{http_proxy};

    # Variables
    my $browser = "";
    my $url = 'https://developer.apple.com/standards/qtff-2001.pdf';
    my $pageContent = '';
    my $fileName = '';
    my $md5Obj = Digest::MD5->new();

    print "\n USING Net::SSL";

    # Init Mechanize
    $browser = WWW::Mechanize->new(autocheck => 1, noproxy => 1, ssl_opts => { 'verify_hostname' => 0 });

    # Add cookie jar
    $browser->cookie_jar(HTTP::Cookies->new());
    $browser->agent_alias('Linux Mozilla');
    $browser->add_header('Accept-Encoding' => scalar HTTP::Message::decodable());
    $browser->timeout(120);

    # Get URL
    $browser->get($url);
    if ($browser->success())
    {
        print "\n INFO: Got URL: $url";
        $fileName = $browser->response()->filename();
        print "\n INFO: Save in File: $fileName";
        $browser->save_content($fileName);

        # Calculate MD5 sum
        $pageContent = read_file($fileName, binmode => ':raw');
        print "\n INFO: $fileName Size: ", length($pageContent)/1024, " KB";
        $md5Obj->add($pageContent);
        print "\n INFO: $fileName MD5 Sum: ", $md5Obj->hexdigest();
        undef $md5Obj;
    }
    else
    {
        print "\n ERROR: Can't get URL $url ", $browser->status();
    }

    print "\n\n INFO: ********************* DUMP ********************";
    print "\n", Dumper(\$browser);
    print "\n INFO: ********************* DUMP ********************";
    exit 0;

Output1:


  USING Net::SSL
 INFO: Got URL: https://developer.apple.com/standards/qtff-2001.pdf
 INFO: Save in File: qtff-2001.pdf
 INFO: qtff-2001.pdf Size: 5465.48046875 KB
 INFO: qtff-2001.pdf MD5 Sum: d1aee95cc06d529e67b707257a5cf3eb

Script 2 - Using IO::Socket::SSL - Not Working. Only part of the PDF file is downloaded

    # NOT WORKING HTTPS DOWNLOAD using IO::Socket::SSL in Windows + Active Perl
    use strict;
    use warnings;
    use IO::Socket::SSL;
    use WWW::Mechanize;
    use HTTP::Cookies;
    use HTTP::Message;
    use Digest::MD5;
    use File::Slurp;
    use Data::Dumper;

    # Globals
    $| = 1;
    $ENV{PERL_LWP_SSL_VERIFY_HOSTNAME} = 0;

    # Variables
    my $browser = "";
    my $url = 'https://developer.apple.com/standards/qtff-2001.pdf';
    my $pageContent = '';
    my $fileName = '';
    my $md5Obj = Digest::MD5->new();

    print "\n USING IO::Socket::SSL";

    # Init Mechanize
    $browser = WWW::Mechanize->new(autocheck => 1, noproxy => 1, ssl_opts => { 'verify_hostname' => 0 });

    # Add cookie jar
    $browser->cookie_jar(HTTP::Cookies->new());
    $browser->agent_alias('Linux Mozilla');
    $browser->add_header('Accept-Encoding' => scalar HTTP::Message::decodable());
    $browser->timeout(120);

    # Get URL
    $browser->get($url);
    if ($browser->success())
    {
        print "\n INFO: Got URL: $url";
        $fileName = $browser->response()->filename();
        print "\n INFO: Save in File: $fileName";
        $browser->save_content($fileName);

        # Calculate MD5 sum
        $pageContent = read_file($fileName, binmode => ':raw');
        print "\n INFO: $fileName Size: ", length($pageContent)/1024, " KB";
        $md5Obj->add($pageContent);
        print "\n INFO: $fileName MD5 Sum: ", $md5Obj->hexdigest();
        undef $md5Obj;
    }
    else
    {
        print "\n ERROR: Can't get URL $url ", $browser->status();
    }

    print "\n\n INFO: ********************* DUMP ********************";
    print "\n", Dumper(\$browser);
    print "\n INFO: ********************* DUMP ********************";
    exit 0;

Output2:


  USING IO::Socket::SSL
 INFO: Got URL: https://developer.apple.com/standards/qtff-2001.pdf
 INFO: Save in File: qtff-2001.pdf
 INFO: qtff-2001.pdf Size: 6.66796875 KB
 INFO: qtff-2001.pdf MD5 Sum: 4049c364f7332790c3abe548d6a4297c

I did not paste the Dumper output because it is huge and did not copy properly into the browser because of its binary content.

Please help me understand why the two scripts behave differently. I was thinking it is a chunking issue ...

Thanks & Regards,
Bakkiaraj M
My Perl Gtk2 technology demo project - http://code.google.com/p/saaral-soft-search-spider/ , contributions are welcome.


In reply to Perl LWP Can handle client-transfer-encoding = chunked encoding? by sam_bakki
