Beavis has asked for the wisdom of the Perl Monks concerning the following question:

Humble greetings, Monks of the Perl.

After silently reading and learning for a couple of weeks I could use some help, where google and such couldn't help me. I hope that someone can give me a nudge in the right direction.


What I am trying to do:
---
My Perl script is supposed to upload some large files (around 100-500 MiB) in parallel to different services (the code below is of course more compact, for simplicity). I have to do this via HTTP POST.
I noticed that files uploaded via WWW::Mechanize or LWP::UserAgent are loaded into memory completely while they are being sent to the server.
To circumvent this I tried setting "$HTTP::Request::Common::DYNAMIC_FILE_UPLOAD = 1", and it worked.
But at what a loss of speed: before changing the variable 5+mb/s, afterwards ~100kb/s. Then I tried to find out how/where I could set the buffer/chunk size that is used ... nowhere :-|

So finally I abandoned all these nice modules and sat down with IO::Socket (sidenote: the speed is ideal).


My problem:
---
On every request I try I get a "400 Bad Request" error. I assume the calculated Content-Length is wrong, but I am clueless as to why.
I have even tried a "brute force attack" with varying Content-Lengths to find a correct length to go from there (^^)... But no luck :-|
How can I do it right? Or better: What am I doing wrong? ;>


My code:
(you might need to remove line 19, the system("cls") call, as I have to operate under Windows)
#!/usr/bin/perl
use warnings;
use strict;
use Socket;

my $buffersize = 2 * 1024 * 1024;

my $host = "127.0.0.1";
my $path = "/debug.php";
my $url = "http://".$host.$path;
my $local_file = "smallishtestfile.tar.gz";

my $user = "cardman";
my $pass = "yvan eht nioj";
my $serverid = 666;

my $upfile = shift || 'F:\\skript\\'.$local_file;

system("cls");
print STDOUT "Starting upload to ".$url."\n\n";
$| = 1;

my ($iaddr, $paddr, $proto);
$iaddr = inet_aton($host);
$paddr = sockaddr_in(80, $iaddr);
$proto = getprotobyname('tcp');

unless(socket(SOCK, PF_INET, SOCK_STREAM, $proto)) { die "Couldn't init socket: $!"; }
unless(connect(SOCK, $paddr)) { die "Couldn't connect: $!\n"; }

my $boundary = "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz";

my @data = (
    "--".$boundary."",
    "Content-Disposition: form-data; name=\"username\"",
    "",
    "".$user."",
    "--".$boundary."",
    "Content-Disposition: form-data; name=\"password\"",
    "",
    "".$pass."",
    "--".$boundary."",
    "Content-Disposition: form-data; name=\"serverid\"",
    "",
    "".$serverid."",
    "--".$boundary."",
    "Content-Disposition: form-data; name=\"file\"; filename=\"".$local_file."\"",
    "Content-Type: application/octet-stream",
    "",
);

open (FH,"< $upfile") or die "$!\n";
binmode FH;

my $data = join("\r\n", @data);

my $length = 0;
$length += length($data);     # length of the data to be POST'ed
$length += -s FH;             # filesize
$length += length($boundary); # boundary is added once more at the end of all the file-chunks

my @head = (
    "POST ".$path." HTTP/1.1",
    "Host: ".$host."",
    "Content-Length: $length",
    "Connection: keep-alive",
    "Content-Type: multipart/form-data; boundary=".$boundary."",
    "",
);

my $header = join("\r\n", @head).$data;

select SOCK;
$| = 1;
binmode SOCK;

print SOCK $header;

while(sysread(FH, my $buf, $buffersize)) {
    if(length($buf) < $buffersize) {
        $buf = $buf."\r\n--".$boundary."--";
        syswrite SOCK, $buf, length($buf);
    } else {
        syswrite SOCK, $buf, $buffersize;
    }
}
close FH;

my @response = (<SOCK>);
shutdown SOCK, 1;

print STDOUT "Result:\n-------\n @response";

close SOCK;


Error message:
HTTP/1.1 400 Bad Request
Date: Tue, 21 Jun 2011 21:01:32 GMT
Server: Apache
Content-Length: 408
Connection: close
Content-Type: text/html; charset=iso-8859-1

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>400 Bad Request</title>
</head><body>
<h1>Bad Request</h1>
<p>Your browser sent a request that this server could not understand.<br />
Request header field is missing ':' separator.<br />
<pre>
--zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz</pre>
</p>
<hr>
<address>Apache Server at localhost Port 80</address>
</body></html>


Thanks in advance,
Kay

Replies are listed 'Best First'.
Re: HTTP-POST with IO::Socket -- Header problem
by Anonymous Monk on Jun 21, 2011 at 21:58 UTC
    The important part of the error message seems to be:
    Request header field is missing ':' separator
    I suggest using wireshark to ensure you're posting what you think you are.

    Since upload speed is your main concern, I also suggest checking out Furl and the various interfaces to Curl.

      Wireshark, or just print to STDERR what you are printing to SOCK. Upon first glance it appears you did not add a blank line (\r\n) in between the headers and the data portion, so the server sees the data and thinks it's a header.
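A minimal sketch of the "print to STDERR" idea, assuming a hypothetical helper named show_wire (not part of the original script) that makes CR and LF visible, so a missing blank line between the headers and the body stands out. The request string here is a shortened stand-in, not the real upload:

```perl
use strict;
use warnings;

# Hypothetical helper: render CR and LF as visible escapes, one wire line per row.
sub show_wire {
    my ($bytes) = @_;
    (my $out = $bytes) =~ s/\r/\\r/g;   # CR becomes the two characters \r
    $out =~ s/\n/\\n\n/g;               # LF becomes \n followed by a real newline
    return $out;
}

# Shortened stand-in for what gets printed to SOCK: the boundary line
# follows the last header directly, with no empty line in between.
my $request = "POST /debug.php HTTP/1.1\r\n"
            . "Host: 127.0.0.1\r\n"
            . "--zzzz\r\n";

print STDERR show_wire($request);
```

In the dump, a correct request would show a row containing only \r\n before the first boundary line; here that row is absent.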
SOLVED: HTTP-POST with IO::Socket -- Header problem
by Beavis (Initiate) on Jun 22, 2011 at 12:07 UTC
    I had already dumped the simulated POST request into a file for debugging and forgot to tell you (yesterday I was very exhausted after 20 hours of coding ;-).

    Instead of noticing the missing "\r\n", I only noticed the differing lengths:
    Between the stated "Content-Length" and the real length of the content there was always a discrepancy of some bytes - the bigger the file, the bigger the difference in length.

    So after I added the "\r\n" between header and body I used a static header with a manually calculated (correct) length. And it worked.

    Then I got the idea that the line breaks in the script itself and/or the line breaks in the file could be the problem.

    So I converted all "\r\n" in the script into "\n" and printed everything to a file, and the byte discrepancies didn't change a bit.

    Next I noticed that I had forgotten to use "binmode" on the output file that I used for debugging.
    Then I added "binmode" and SHAZAM: there was still a difference of some bytes between the lengths, but for files of different sizes it always stayed a constant 4 bytes.
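The effect of a missing "binmode" can be reproduced on any platform with an explicit ":crlf" layer, which is roughly what a text-mode handle does on Windows. This is a small sketch, with written_size as a hypothetical helper name and a tiny made-up payload; it only illustrates why the discrepancy grew with the file size (one extra byte per LF in the data):

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write the same bytes through a given PerlIO layer and return the on-disk size.
sub written_size {
    my ($layer, $bytes) = @_;
    my ($fh, $name) = tempfile(UNLINK => 1);
    close $fh;
    open(my $out, ">$layer", $name) or die "open: $!";
    print {$out} $bytes;
    close $out or die "close: $!";
    return -s $name;
}

my $payload = "a\nb\nc\n";   # 6 bytes, 3 of them LF

printf "raw:  %d bytes\n", written_size(':raw',  $payload);
printf "crlf: %d bytes\n", written_size(':crlf', $payload);
```

With ":raw" (the equivalent of binmode) the file has the same 6 bytes as the payload; with ":crlf" every LF is written as CR LF, so the file is 3 bytes longer.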


    I'd still like to know if someone can figure out why there is this 4-byte discrepancy.
    But if no one knows/cares to share I can live with this dirty fix ;-)
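One possible accounting, offered as a sketch rather than a diagnosis: the length calculation adds only length($boundary) for the end of the body, while the upload loop writes "\r\n--", the boundary, and a final "--". The leading CRLF and the two dashes would explain 4 bytes; by the same count the trailing "--" adds 2 more, which the test server may simply be tolerating (an assumption, not something verified here). The helper name and the short boundary below are made up for illustration:

```perl
use strict;
use warnings;

# The bytes the upload loop appends after the last file chunk.
sub closing_delimiter {
    my ($boundary) = @_;
    return "\r\n--" . $boundary . "--";
}

my $boundary = "zzzz";                    # illustrative; any length gives the same difference
my $written  = length closing_delimiter($boundary);
my $counted  = length($boundary) + 4;     # what the script counts, including the 4-byte fix

printf "written: %d, counted: %d, difference: %d\n",
    $written, $counted, $written - $counted;
```

If this accounting holds, counting length("\r\n--" . $boundary . "--") directly would make the fixed constant unnecessary.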


    At last, my working code:
    #!/usr/bin/perl
    use warnings;
    use strict;
    use Socket;

    my $buffersize = 2 * 1024 * 1024;

    my $host = "127.0.0.1";
    my $path = "/debug.php";
    my $url = "http://".$host.$path;
    my $local_file = "testfile.rar";
    my $local_path = "F:\\skript\\";

    my $user = "cardman";
    my $pass = "yvan eht nioj";
    my $serverid = 666;

    my $upfile = shift || $local_path.$local_file;

    system("cls");
    print STDOUT "Starting upload to ".$url."\r\n\r\n";
    $| = 1;

    my ($iaddr, $paddr, $proto);
    $iaddr = inet_aton($host);
    $paddr = sockaddr_in(80, $iaddr);
    $proto = getprotobyname('tcp');

    unless(socket(SOCK, PF_INET, SOCK_STREAM, $proto)) { die "Couldn't init socket: $!"; }
    unless(connect(SOCK, $paddr)) { die "Couldn't connect: $!\r\n"; }

    my $boundary = "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz";

    my @data = (
        "--".$boundary."",
        "Content-Disposition: form-data; name=\"username\"",
        "",
        "".$user."",
        "--".$boundary."",
        "Content-Disposition: form-data; name=\"password\"",
        "",
        "".$pass."",
        "--".$boundary."",
        "Content-Disposition: form-data; name=\"serverid\"",
        "",
        "".$serverid."",
        "--".$boundary."",
        "Content-Disposition: form-data; name=\"file\"; filename=\"".$local_file."\"",
        "Content-Type: application/octet-stream",
        "",
        "",
    );

    open (FILE,"< $upfile") or die "$!\n";
    binmode FILE;

    my $data = join("\r\n", @data);

    my $length = 0;
    $length += length($data);     # length of the data to be POST'ed
    $length += -s FILE;           # filesize
    $length += length($boundary); # boundary is added once more at the end of all the file-chunks
    $length += 4;                 # adding 4 bytes (no idea as to why, but it works -- tested with 4 rng-files: 5 byte, 2mb, 15mb and 100mb)

    my @head = (
        "POST ".$path." HTTP/1.1",
        "Host: ".$host."",
        "Content-Length: $length",
        "Connection: close",
        "Content-Type: multipart/form-data; boundary=".$boundary."",
        "",
        "",
    );

    my $header = join("\r\n", @head).$data;

    # FOR DEBUGGING
    # open (FILE2,"< $upfile") or die "$!\n"; binmode FILE;
    # open(LOG, ">".$local_path."headers.txt"); binmode LOG;
    # print LOG $header;
    # while(sysread(FILE2, my $buf, 8)) { print LOG $buf; }
    # print LOG "\r\n--".$boundary."--";
    # close LOG; close FILE2;

    select SOCK;
    $| = 1;
    binmode SOCK;

    print SOCK $header;

    while(sysread(FILE, my $buf, $buffersize)) {
        if(length($buf) < $buffersize) {
            $buf = $buf."\r\n--".$boundary."--";
            syswrite SOCK, $buf, length($buf);
        } else {
            syswrite SOCK, $buf, $buffersize;
        }
    }
    close FILE;

    my @response = (<SOCK>);
    shutdown SOCK, 1;

    print STDOUT "Result:\n-------\n @response";

    close SOCK;
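A note on the upload loop, with a hedged sketch: the closing delimiter is appended only when a read returns fewer bytes than $buffersize, so a file whose size is an exact multiple of the buffer size would apparently never get one. One way around that is to write the delimiter once, after the loop. The function name stream_with_trailer is hypothetical, and the sketch uses read/print instead of sysread/syswrite only so that it can be tried with in-memory filehandles instead of a socket:

```perl
use strict;
use warnings;

# Copy $in to $out in chunks of $chunk bytes, then write the closing
# delimiter exactly once. Returns the number of body bytes written.
sub stream_with_trailer {
    my ($in, $out, $boundary, $chunk) = @_;
    my $sent = 0;
    while (1) {
        my $n = read($in, my $buf, $chunk);
        die "read failed: $!" unless defined $n;
        last if $n == 0;                          # end of file
        print {$out} $buf or die "write failed: $!";
        $sent += $n;
    }
    my $trailer = "\r\n--" . $boundary . "--";
    print {$out} $trailer or die "write failed: $!";
    return $sent + length $trailer;
}

# Demo: 8 bytes of "file" data, chunk size 4, so the size is an exact multiple.
my $file_bytes = "x" x 8;
my $wire       = '';
open(my $in,  '<', \$file_bytes) or die "open: $!";
open(my $out, '>', \$wire)       or die "open: $!";
my $total = stream_with_trailer($in, $out, "zzzz", 4);
close $out;

print $wire =~ /--zzzz--\z/ ? "trailer sent\n" : "trailer missing\n";
```

The return value doubles as the number of bytes that belong in Content-Length after the part headers.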


    Thanks to you both for helping me so quickly :-)
    Kay
        No, because when I merge the header-array with the data via

        join("\r\n", @head).$data

        the $data variable has no "\r\n" at the beginning, so the header-array needs to end with 2 line breaks so that the packet as a whole looks like this:
        [...]
        Content-Type: multipart/form-data; boundary=".$boundary."

        --".$boundary."
        Content-Disposition: form-data; name=\"username\"
        [...]
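The effect of the trailing empty elements can be checked in isolation. This is a small sketch with made-up header values and a hypothetical build_request helper: join() puts "\r\n" only between elements, so one trailing "" ends the last header line, and a second one is needed to produce the empty line before the body.

```perl
use strict;
use warnings;

# Join the header lines with CRLF and append the body as-is.
sub build_request {
    my ($head_lines, $data) = @_;
    return join("\r\n", @$head_lines) . $data;
}

my $data = "--boundary\r\n";   # made-up start of the multipart body

my $one_empty = build_request(["Host: 127.0.0.1", "Content-Length: 123", ""],     $data);
my $two_empty = build_request(["Host: 127.0.0.1", "Content-Length: 123", "", ""], $data);

for my $req ($one_empty, $two_empty) {
    print index($req, "\r\n\r\n--boundary") >= 0
        ? "blank line before body\n"
        : "body glued to headers\n";
}
```

Only the second variant contains "\r\n\r\n" in front of the first boundary line.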

        PS: If you had read my first post you would know that all these nice comfortable modules have the problem that they copy the file completely into RAM while uploading it.
        You can circumvent that by setting "$HTTP::Request::Common::DYNAMIC_FILE_UPLOAD = 1", but then the speed drops drastically - from 5mb/s to about 100kb/s. And you can't set the buffersize anywhere to increase that speed.

        Best regards,
        Kay