On-disk multipart/form-data part extraction

Conquistadog has asked for the wisdom of the Perl Monks concerning the following question:

Update:

Anyone using perl from nginx, keep HTTP::Body handy! (and client_body_in_file_only on; in your nginx config)

use HTTP::Body;
use IO::File;

my $file = $r->request_body_file();
my $len  = $r->header_in('Content-Length');
my $body = HTTP::Body->new($r->header_in('Content-Type'), $len);
my $fh = IO::File->new($file, 'r') or die "can't open $file: $!";
binmode $fh;
# Feed the body to the parser in chunks of at most 8 KB.
while ($len && 0 < $len) {
  $fh->read(my $buf, ($len < 8192 ? $len : 8192)) or last;  # stop on EOF or read error
  $len -= length($buf);
  $body->add($buf);
}

After that point, $body has all the stuff and info needed.
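For anyone wondering what "all the stuff" looks like in practice, here is a minimal, self-contained sketch of turning the parsed uploads into files of your own. It assumes HTTP::Body is installed from CPAN. The boundary string, field names, sample contents, and the destination naming scheme are all made up for illustration, and the temp file merely stands in for whatever `$r->request_body_file` would return under nginx.

```perl
use strict;
use warnings;
use File::Copy qw(move);
use File::Temp qw(tempdir tempfile);
use HTTP::Body;

# Hypothetical request body: one plain field and one uploaded file.
my $boundary = 'xYzZY';
my $raw = join "\r\n",
    "--$boundary",
    'Content-Disposition: form-data; name="title"',
    '',
    'hello',
    "--$boundary",
    'Content-Disposition: form-data; name="doc"; filename="a.txt"',
    'Content-Type: text/plain',
    '',
    'file contents',
    "--$boundary--",
    '';

# Stand-in for the body file that nginx would have written for us.
my ($tmp_fh, $body_file) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} $raw;
close $tmp_fh or die "close: $!";

# Feed the file to HTTP::Body in chunks; upload parts are spooled to
# temp files by the parser rather than being held in memory.
my $body = HTTP::Body->new("multipart/form-data; boundary=$boundary", length $raw);
open my $fh, '<:raw', $body_file or die "can't open $body_file: $!";
while (read($fh, my $buf, 8192)) {
    $body->add($buf);
}
close $fh;

# Move each spooled upload to a destination of our choosing.
my $outdir = tempdir(CLEANUP => 1);
my %saved;    # destination path => size in bytes
for my $name (sort keys %{ $body->upload }) {
    my $entry = $body->upload->{$name};
    my $i = 0;
    # A field name that occurs more than once yields an array ref.
    for my $up (ref $entry eq 'ARRAY' ? @$entry : $entry) {
        my $dest = "$outdir/$name." . $i++;
        move($up->{tempname}, $dest) or die "move failed: $!";
        $saved{$dest} = $up->{size};
    }
}

print "title = $body->param->{title}\n";
printf "%s (%d bytes)\n", $_, $saved{$_} for sort keys %saved;
```

Ordinary form fields come back through `$body->param`, while each upload entry carries `tempname`, `filename`, `size`, and `headers`, so the handler never has to touch the part contents directly.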

Thanks for the help, everyone!

Greetings fellow faithfuls!

Today my quest is for an approach to extract body "parts" from an HTTP request body of type multipart/form-data -- but to do it all on-disk and in-place without loading any entire "part" into memory at any time.

Rationale: I am using nginx, which politely stores the request body in a file for me before invoking my perl handler module. In the present case, incoming HTTP requests sometimes contain the contents of files being uploaded, as part(s) of a multipart/form-data message body. Consequently, the request and its parts are often very large. My last approach, using HTTP::Request (i.e. HTTP::Message and friends), fails in the large-body-part case, presumably because it requires (or results in) the entire request body being held in a scalar (and indeed more than once, in some cases).

Regardless of the approach, the need is ultimately this:

Given the name of a file containing the HTTP request body, I need to produce (or have produced for me) an appropriate number of files corresponding to and containing the extracted multipart/form-data parts.

Has anyone done something like this, and/or know of a workable approach, library, or tool to use?

Many thanks,
Conquistadog


Replies are listed 'Best First'.
Re: On-disk multipart/form-data part extraction by blue_cowdawg (Monsignor) on Dec 24, 2012 at 16:47 UTC
> Today my quest is for an approach to extract body "parts" from an HTTP request body of type multipart/form-data -- but to do it all on-disk and in-place without loading any entire "part" into memory at any time.

Eh? Not too sure what you mean by that. As soon as you read from any file, a piece of it is in memory...

What have you tried? Is this part of an email? Is it CGI input? Is it alpaca wool? Have you looked at CGI, which is pretty much standard stuff?

Peter L. Berghold -- Unix Professional
Peter -at- Berghold -dot- Net; AOL IM redcowdawg Yahoo IM: blue_cowdawg
Re^2: On-disk multipart/form-data part extraction by Conquistadog (Novice) on Dec 24, 2012 at 17:06 UTC
I suppose I should have been clearer! Of course it's fine to have pieces of things in memory during processing. I expected that to be taken as given. We just can't hold the entirety of any given "part" in memory, since the parts can be (and regularly are) too large for that. Clearer? Thanks!
Re^2: On-disk multipart/form-data part extraction by Conquistadog (Novice) on Dec 24, 2012 at 17:13 UTC
Also, since you helpfully mentioned it, regarding CGI: there is seemingly no success for me there, either. You see, CGI can parse multipart bodies with its upload() method, but seemingly only when they come from STDIN or via CGI::Fast -- neither of which is available to me (nor desirable to me, indeed) as an nginx-embedded perl module. Hence my search for another way. The perl "staples" do not seem to cover my use case.
Re^3: On-disk multipart/form-data part extraction by blue_cowdawg (Monsignor) on Dec 25, 2012 at 03:06 UTC
> STDIN or via CGI::Fast -- neither of which are available to me

STDIN not available? EH?? I've never seen an environment where STDIN wasn't available. Even if you had to reopen STDIN directly, as such:

`close STDIN; open STDIN, "< myfile.txt" or die "myfile.txt: $!"; ... do stuff`

Is this a CGI script, or what are you up to here?

Peter L. Berghold -- Unix Professional
Peter -at- Berghold -dot- Net; AOL IM redcowdawg Yahoo IM: blue_cowdawg
Re^3: On-disk multipart/form-data part extraction by Anonymous Monk on Dec 24, 2012 at 17:42 UTC
> nginx-embedded perl module

Never heard of it :) So http://wiki.nginx.org/HttpPerlModule? In that case, ;P

~~`local STDIN; open STDIN, '<', $r->request_body_file or die $!; binmode STDIN; my $q = CGI->new( \STDIN );`~~

:)

my $body    = Nginx_Body( $r );
my $uploads = $body->upload;    # hashref

## cleanup temp files
undef $uploads;
undef $body;

sub Nginx_Body {
    my $r      = shift;
    my $tmpdir = shift;
    my $content_type   = $r->header_in('Content-Type');
    my $content_length = $r->header_in('Content-Length');
    my $body = HTTP::Body->new( $content_type, $content_length );
    $body->tmpdir( $tmpdir ) if $tmpdir;
    my $length = $content_length;
    open my($bodyfh), '<:raw', $r->request_body_file or die $!;
    while ( $length ) {
        my $read = read( $bodyfh, my $buffer, ( $length < 8192 ) ? $length : 8192 );
        my $bufferlength = length($buffer);
        die "IMPOSSIBLE $read != $bufferlength " if $read != $bufferlength;
        $length -= $bufferlength;
        $body->add( $buffer );
    }
    return $body;
}

Although, after reading a bit from "Nginx - full-featured perl support for nginx", nginx might have this feature already, or at least it should :)
Re^4: On-disk multipart/form-data part extraction by Conquistadog (Novice) on Dec 24, 2012 at 18:44 UTC
Re: On-disk multipart/form-data part extraction by Anonymous Monk on Dec 24, 2012 at 17:02 UTC
Try HTTP::Body::MultiPart, HTTP::Body::MultiPart::Extend, MIME::Tools/mimeexplode/MIME::Explode
Re^2: On-disk multipart/form-data part extraction by Conquistadog (Novice) on Dec 24, 2012 at 17:20 UTC
> Try HTTP::Body::MultiPart, HTTP::Body::MultiPart::Extend, MIME::Tools/mimeexplode/MIME::Explode

Great suggestions, thanks! `mimeexplode` (from MIME::Tools) in particular looks very promising. Will give it a go and report back.
Re: On-disk multipart/form-data part extraction (Nginx Plack) by Anonymous Monk on Dec 24, 2012 at 18:02 UTC
FWIW, Plack will run under Nginx and, like CGI.pm, will keep uploads/multipart parts in tempfiles for the duration of the request.
Re^2: On-disk multipart/form-data part extraction (Nginx Plack) by Conquistadog (Novice) on Dec 24, 2012 at 18:53 UTC
Goodness... Plack seems to be a whole "framework" that I'd have to migrate my handler to. Am I wrong in that? I would rather invoke the operations in my current handler than convert to another "framework" (a word I use with some disdain).
Re^3: On-disk multipart/form-data part extraction (Nginx Plack) by Anonymous Monk on Dec 25, 2012 at 05:29 UTC
If you migrate, it will run on everything, not just nginx, but yeah, it is PSGI.
Re^2: On-disk multipart/form-data part extraction (Nginx Plack) by Anonymous Monk on Dec 24, 2012 at 18:13 UTC
It's written in C :) http://wiki.nginx.org/HttpUploadModule
Re^3: On-disk multipart/form-data part extraction (Nginx Plack) by Conquistadog (Novice) on Dec 24, 2012 at 18:50 UTC
I did give `HttpUploadModule` a try, but I had difficulty getting it to send my handler an upload-free but otherwise complete body (i.e. with the other form fields that were present in the posted body). I may revisit that, but it seems to me that using perl to explode the parts out of the original raw body file would be more workable.