fert has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

I have no experience parsing MIME-encoded messages, so forgive my noobness here. I'm trying to find out whether this is expected behavior given the input; hopefully somebody with more experience can help out.

First up, this is the input data. I've taken out most of the context-sensitive stuff, but basically this is a response I get from an API:

--MIMEBoundaryurn_uuid_5922B37D7F43649CB512578951566415053
Content-Type: application/xop+xml; charset=utf-8; type="text/xml"
Content-Transfer-Encoding: binary
Content-ID: <0.urn:uuid:5922B37D7F43649CB512578951566415054>

<XML RESPONSE GOES HERE>
--MIMEBoundaryurn_uuid_5922B37D7F43649CB512578951566415053
Content-Type: application/zip
Content-Transfer-Encoding: binary
Content-ID: <urn:uuid:A86D19E1CDE87B2FD01257895160515>

^_~K^H^H­>üJ^@^Ctest^@^C^@^@^@^@^@^@^@^@^@
--MIMEBoundaryurn_uuid_5922B37D7F43649CB512578951566415053--

So basically this message is supposed to contain an XML response section followed by a binary file, in this case a gzipped file. I replaced the gzip file with an empty gzip file to save space and removed the actual XML response, but I think the problem might just be how the MIME is structured?

Not knowing much else, this is the code I have tried to use to parse out the two entities, but all I get is a single text/plain entity whose body text contains everything from the start of the XML down (including all the MIME headers and boundary lines below the XML). Any ideas on how to slice this?

Code so far

use MIME::Parser;

my $parser = MIME::Parser->new;
# $response_file holds the path to the saved API response
my $ent = $parser->parse_open($response_file);
$ent->dump_skeleton;
print "~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n";
print $ent->stringify_body;

And the output:

Content-type: text/plain
Effective-type: text/plain
Body-file: ./msg-11891-1.txt
--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<XML RESPONSE GOES HERE>
--MIMEBoundaryurn_uuid_5922B37D7F43649CB512578951566415053
Content-Type: application/zip
Content-Transfer-Encoding: binary
Content-ID: <urn:uuid:A86D19E1CDE87B2FD01257895160515>

^_~K^H^H­>üJ^@^Ctest^@^C^@^@^@^@^@^@^@^@^@
--MIMEBoundaryurn_uuid_5922B37D7F43649CB512578951566415053--

So if I understand this right, it did not read in two entities but just one. Again, I wonder whether the MIME is correct.

Replies are listed 'Best First'.
Re: Help on Parsing MIME
by almut (Canon) on Nov 12, 2009 at 17:33 UTC

    Do you have a line specifying the boundary, like this

    Content-Type: multipart/mixed; boundary="MIMEBoundaryurn_uuid_5922B37D7F43649CB512578951566415053"

    at the beginning of the input data? (There should be; see also RFC 2046, section 5.1.)

      No, this is the exact text I get back from the call. I thought this was weird as well.

        As a workaround/heuristic, maybe you could reconstruct it yourself from the lines such as --MIMEBoundaryurn_uuid_5922B37D7F43649CB512578951566415053 (i.e. strip off the "--"), and simply prepend the required multipart header to the response before passing it to the parser...

        Better yet, of course, fix the response in the first place (if that's an option).
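
The workaround described above could be sketched roughly as follows. This is only a minimal sketch, not a tested fix for this particular API: the helper name prepend_multipart_header is made up for illustration, it assumes the first line starting with "--" is the opening delimiter, and it uses multipart/mixed as in the suggested header (the service may actually intend a different multipart subtype).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MIME::Parser): guess the boundary from the
# first delimiter line and prepend a matching multipart header.
sub prepend_multipart_header {
    my ($raw) = @_;

    # Assumption: the first line beginning with "--" is the opening delimiter,
    # so the boundary is everything after the "--" up to the next whitespace.
    my ($boundary) = $raw =~ /^--(\S+)/m
        or return;    # no delimiter line found, nothing to reconstruct

    # Header block, then the blank line that separates headers from the body.
    my $header = "MIME-Version: 1.0\n"
               . qq{Content-Type: multipart/mixed; boundary="$boundary"\n}
               . "\n";

    return $header . $raw;
}
```

The returned string can then be handed to the parser as a whole, for example with $parser->parse_data($fixed), after which $ent->parts should list the XML part and the gzip part as separate entities if the guess about the boundary was right.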