Conquistadog has asked for the wisdom of the Perl Monks concerning the following question:

Update:

Anyone using perl from nginx, keep HTTP::Body handy! (and client_body_in_file_only on; in your nginx config)

    use HTTP::Body;
    use IO::File;

    my $file = $r->request_body_file();
    my $len  = $r->header_in('Content-Length');
    my $body = HTTP::Body->new($r->header_in('Content-Type'), $len);
    my $fh   = IO::File->new($file, '<') or die "can't open $file: $!";
    binmode $fh;

    # Feed the body to the parser in chunks of at most 8 KB.
    while ($len && 0 < $len) {
        $fh->read(my $buf, ($len < 8192 ? $len : 8192)) or last;  # stop on EOF or read error
        $len -= length($buf);
        $body->add($buf);
    }

After that point, $body has all the stuff and info needed.
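
For anyone landing here later, this is a minimal sketch of what "all the stuff and info" can look like in practice, assuming HTTP::Body is installed. The helper name save_uploads, the part-NNN naming scheme, and the tiny hand-built request body are illustrative assumptions, not part of the handler above. It relies on HTTP::Body's upload() hashref, whose entries carry a tempname pointing at the temp file that already holds each uploaded part, so the part is moved on disk rather than read into memory.

```perl
use strict;
use warnings;
use File::Copy qw(move);
use File::Spec;
use File::Temp qw(tempdir);
use HTTP::Body;

# Hypothetical helper: move every uploaded part of a parsed HTTP::Body
# object into $dest_dir and return the list of paths written.
sub save_uploads {
    my ($body, $dest_dir) = @_;
    my @saved;
    my $n = 0;
    for my $field (sort keys %{ $body->upload }) {
        my $entry = $body->upload->{$field};
        # A field that occurs more than once is stored as an array ref.
        for my $up (ref $entry eq 'ARRAY' ? @$entry : $entry) {
            # Name the target by counter, not by the client-supplied filename.
            my $target = File::Spec->catfile($dest_dir, sprintf('part-%03d', ++$n));
            move($up->{tempname}, $target)
                or die "can't move $up->{tempname}: $!";
            push @saved, $target;
        }
    }
    return @saved;
}

# Tiny hand-built multipart body (an assumption standing in for the
# nginx request body file): one plain field and one file upload.
my $boundary = 'xYzZY';
my $raw = join "\r\n",
    "--$boundary",
    'Content-Disposition: form-data; name="title"',
    '',
    'hello',
    "--$boundary",
    'Content-Disposition: form-data; name="doc"; filename="a.txt"',
    'Content-Type: text/plain',
    '',
    'file contents',
    "--$boundary--",
    '';

my $body = HTTP::Body->new("multipart/form-data; boundary=$boundary", length($raw));
$body->add($raw);

my $dest  = tempdir(CLEANUP => 1);
my @saved = save_uploads($body, $dest);

print "field title = ", $body->param->{title}, "\n";
print "saved $_ (", (-s $_), " bytes)\n" for @saved;
```

Ordinary form fields stay available through $body->param, so the non-upload part of the form does not need separate handling.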

Thanks for the help, everyone!


Greetings fellow faithfuls!

Today my quest is for an approach to extract body "parts" from an HTTP request body of type multipart/form-data -- but to do it all on-disk and in-place without loading any entire "part" into memory at any time.

Rationale: I am using nginx, which politely stores the request body in a file for me before invoking my perl handler module. In the present case, incoming HTTP requests sometimes contain the contents of files being uploaded, as part(s) of a multipart/form-data message body. Consequently, the request and its parts are often very large. My last approach, using HTTP::Request (i.e. HTTP::Message and friends), fails in the large-body-part case, presumably because it requires (or results in) the entire request body being held in a scalar (and in some cases more than once).

Regardless of the approach, the need is ultimately this:

Given the name of a file containing the HTTP request body, I need to produce (or have produced for me) an appropriate number of files corresponding to and containing the extracted multipart/form-data parts.

Has anyone done something like this, and/or know of a workable approach, library, or tool to use?

Many thanks,
Conquistadog

Replies are listed 'Best First'.
Re: On-disk multipart/form-data part extraction
by blue_cowdawg (Monsignor) on Dec 24, 2012 at 16:47 UTC
        Today my quest is for an approach to extract body "parts" from an HTTP request body of type multipart/form-data -- but to do it all on-disk and in-place without loading any entire "part" into memory at any time.

    eh? Not too sure what you mean by that. As soon as you read from any file a piece of it is in memory...

    What have you tried? Is this part of an email? Is it CGI input? Is it alpaca wool?

    Have you looked at CGI which is pretty much standard stuff?


    Peter L. Berghold -- Unix Professional
    Peter -at- Berghold -dot- Net; AOL IM redcowdawg Yahoo IM: blue_cowdawg

      I suppose I should have been clearer!

      Of course it's fine to have pieces of things in memory during processing. I expected that to be taken as given.

      We just can't hold the entirety of any given "part" in memory, since the parts can be (and regularly are) too large for that.

      Clearer? Thanks!

      Also, since you helpfully mentioned it, regarding CGI:: there is seemingly no success for me, either.

      You see, CGI:: can parse multipart bodies with its upload() member, but seemingly only when they come from STDIN or via CGI::Fast -- neither of which are available to me (nor desirable to me, indeed) as an nginx-embedded perl module.

      Hence my search for another way. The perl "staples" do not seem to cover my use case.

            STDIN or via CGI::Fast -- neither of which are available to me

        STDIN not available? EH?? I've never seen an environment where stdin wasn't available. Even if you opened STDIN directly

            close STDIN;
            open STDIN, '<', 'myfile.txt' or die "myfile.txt: $!";
            ... do stuff
        as such. Is this a CGI script or what are you up to here?


        Peter L. Berghold -- Unix Professional
        Peter -at- Berghold -dot- Net; AOL IM redcowdawg Yahoo IM: blue_cowdawg

        nginx-embedded perl module

        never heard of it :)

        So http://wiki.nginx.org/HttpPerlModule? In that case,

        ;P

            local *STDIN;
            open STDIN, '<', $r->request_body_file or die $!;
            binmode STDIN;
            my $q = CGI->new( \*STDIN );

        :)

            my $body    = Nginx_Body( $r );
            my $uploads = $body->upload;    # hashref

            ## cleanup temp files
            ## (HTTP::Body only unlinks them on destruction if $body->cleanup(1) was set)
            undef $uploads;
            undef $body;

            sub Nginx_Body {
                my $r      = shift;
                my $tmpdir = shift;
                # The nginx perl API method is header_in (singular).
                my $content_type   = $r->header_in('Content-Type');
                my $content_length = $r->header_in('Content-Length');
                my $body = HTTP::Body->new( $content_type, $content_length );
                $body->tmpdir( $tmpdir ) if $tmpdir;
                my $length = $content_length;
                open my($bodyfh), '<:raw', $r->request_body_file or die $!;
                while ( $length ) {
                    my $read = read( $bodyfh, my $buffer, ( $length < 8192 ) ? $length : 8192 );
                    # Avoid looping forever on EOF or a read error.
                    die "short read on request body: $!" unless $read;
                    my $bufferlength = length($buffer);
                    die "IMPOSSIBLE $read != $bufferlength " if $read != $bufferlength;
                    $length -= $bufferlength;
                    $body->add( $buffer );
                }
                return $body;
            }

        Although, after reading a bit of "Nginx - full-featured perl support for nginx", nginx might have this feature already, or at least it should :)

Re: On-disk multipart/form-data part extraction
by Anonymous Monk on Dec 24, 2012 at 17:02 UTC
      Try HTTP::Body::MultiPart , HTTP::Body::MultiPart::Extend, MIME::Tools/mimeexplode/MIME::Explode

      Great suggestions, thanks! The mimeexplode script from MIME::Tools in particular looks very promising. Will give it a go and report back.

Re: On-disk multipart/form-data part extraction (Nginx Plack)
by Anonymous Monk on Dec 24, 2012 at 18:02 UTC
    FWIW, Plack will run under Nginx, and like CGI.pm will keep uploads/multipart in tempfiles for the duration of the request

      Goodness... Plack seems to be a whole "framework" that I'd have to migrate my handler to. Am I wrong in that? I would rather invoke the operations in my current handler than convert to another "framework" (a word I use with some disdain).

        If you migrate, it will run on everything, not just nginx, but yeah, it is PSGI

        I did give HttpUploadModule a try, but I had difficulty getting it to send to my handler an upload-free but otherwise complete body (i.e. with the other form fields that were present in the posted body). I may revisit that, but it seems to me that using perl to explode them out of the original raw body file would be more workable.