in thread SOLVED: XML Parsing from URL
The "boundary" lines and the two "Content" lines look to me like HTTP multipart POST data (RFC 2388). On the other hand, multipart POST data should also have a "Content-Disposition" header with a name attribute after each boundary.
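To make that distinction concrete, here is a minimal core-Perl sketch that checks a single part's header block for an RFC 2388 style "Content-Disposition: form-data; name=..." line. The helper name looks_like_form_data and both sample parts are hypothetical; the second sample only mimics a part that carries nothing but "Content" lines.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split one multipart part into its header block and body,
# then report whether the headers contain an RFC 2388 style
# "Content-Disposition: form-data; name=..." line.
sub looks_like_form_data {
    my ($part) = @_;
    # The header block ends at the first empty line (CRLF CRLF, bare LF tolerated).
    my ($head) = split /\r?\n\r?\n/, $part, 2;
    for my $line (split /\r?\n/, $head // '') {
        return 1 if $line =~ /^Content-Disposition:\s*form-data\b.*\bname=/i;
    }
    return 0;
}

my $form_part = "Content-Disposition: form-data; name=\"doc\"\r\n"
              . "Content-Type: application/xml\r\n\r\n<a/>";
my $push_part = "Content-Type: application/xml\r\n"
              . "Content-Length: 4\r\n\r\n<a/>";

printf "form part: %s\n", looks_like_form_data($form_part) ? "form-data" : "no name";
printf "push part: %s\n", looks_like_form_data($push_part) ? "form-data" : "no name";
```

For these samples it prints "form part: form-data" and "push part: no name".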
Is this real data, or has it been shortened? Where does it come from?
In an HTTP context, I would expect some library to parse the HTTP data and provide it in a more accessible form. For example, with the classic CGI module, each XML document would be available under its parameter name via the param() or upload() methods.
Alexander
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re^4: XML Parsing from URL
by jshank (Initiate) on Jul 04, 2015 at 23:38 UTC
Thanks, Alexander. This is real data coming from a Hikvision IP camera. There is a goo.gl short link in the original code if you want to look at the documentation for the interface. LWP was able to handle the multipart data properly, and then I parse out the XML portion.
by afoken (Chancellor) on Jul 05, 2015 at 20:35 UTC
I had some free time to play with LWP and server push. In fact, it is quite easy to make LWP call a callback whenever a complete document from a multipart container has been received. It does not matter whether a part arrives in one chunk or in many chunks, and it does not matter whether one chunk of data contains one or more parts.

The trick is to know that LWP uses HTTP::Response, which inherits from HTTP::Message, and HTTP::Message contains everything needed to handle multipart messages, both for server push and for multipart POST requests.

This is my code. Feel free to use it. Make it a CPAN module if you like it.

So, to handle the documents properly, add use MultipartFilter;, remove the :content_cb handler, delete sub raw_handler, and insert the following code before my $response = $browser->get(...);:
Test environment: Apache 2.4.12 and Perl 5.18.1 on Slackware64 14.1
Alexander
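The multipart support in HTTP::Message comes down to reading the boundary parameter from the Content-Type header and cutting the body at each delimiter line into sub-messages, which its parts() method returns as HTTP::Message objects. This is not the MultipartFilter code referred to above; it is a rough, dependency-free core-Perl sketch of just that splitting step for a complete body. boundary_from_content_type and split_multipart are hypothetical names, not part of HTTP::Message, and the sample body is made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read the boundary parameter from a Content-Type value,
# e.g. 'multipart/mixed; boundary=boundary' gives 'boundary'.
sub boundary_from_content_type {
    my ($ct) = @_;
    return $1 if $ct =~ /\bboundary="([^"]+)"/i;    # quoted form
    return $1 if $ct =~ /\bboundary=([^\s;]+)/i;    # token form
    return undef;
}

# Hypothetical helper: split a complete multipart body into its parts.
# Each part becomes { headers => { lowercased name => value }, body => ... }.
sub split_multipart {
    my ($body, $boundary) = @_;
    die "no boundary given\n" unless defined $boundary;
    # The line break before a delimiter belongs to the delimiter (RFC 2046).
    my @chunks = split /(?:\r?\n)?--\Q$boundary\E/, $body;
    shift @chunks;                          # text before the first delimiter is preamble
    my @parts;
    for my $chunk (@chunks) {
        last if $chunk =~ /^--/;            # "--boundary--" closes the body
        $chunk =~ s/^[ \t]*\r?\n//;         # end of the delimiter line
        my ($head, $content) = split /\r?\n\r?\n/, $chunk, 2;
        my %h;
        for my $line (split /\r?\n/, $head // '') {
            $h{ lc $1 } = $2 if $line =~ /^([^:]+):\s*(.*)$/;
        }
        push @parts, { headers => \%h, body => $content // '' };
    }
    return @parts;
}

my $ct   = 'multipart/mixed; boundary=boundary';
my $body = "--boundary\r\nContent-Type: application/xml\r\n\r\n<a/>\r\n"
         . "--boundary\r\nContent-Type: application/xml\r\n\r\n<b/>\r\n"
         . "--boundary--\r\n";
my @parts = split_multipart($body, boundary_from_content_type($ct));
print scalar(@parts), " parts: ", join(', ', map { $_->{body} } @parts), "\n";
```

It prints "2 parts: <a/>, <b/>". For a live server push stream this covers only the last step; the incoming chunks still have to be buffered until a part is complete.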
by afoken (Chancellor) on Jul 05, 2015 at 12:28 UTC
OK, it's HTTP server push. I haven't seen that in the wild until now. That should be handled by LWP. Unfortunately, LWP is mostly designed to receive a single document per HTTP request, not a (possibly infinite) stream of documents wrapped in a multipart/mixed container.

The :content_cb callback in your original code is a way to hook into LWP::UserAgent, but it looks a bit scary. LWP documents that the callback is called with "a chunk of data". That may be anything from a single byte to a large block containing several documents and headers. Your code treats that "chunk of data" first as exactly one document with HTTP and multipart headers (in $data =~ /^--boundary/), then as a full XML document (in $twig->parse($data);).

This may "just work" because (a) the camera waits long enough between the parts of the multipart container, (b) the XML document is very small, and (c) the :read_size_hint is large enough to make LWP::UserAgent read the entire document, including headers, into one chunk.

A less scary version would collect chunks somewhere (e.g. in a private attribute of the $response object) until at least one complete document with HTTP and multipart headers has been collected. Then it should extract the headers and the raw document from there (e.g. into a new HTTP::Response object), and only then call $twig->parse($document), perhaps from another callback.

Maybe that could be made into a separate module that extends LWP::UserAgent to handle multipart document streams.
Alexander
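The less scary version described above can be sketched with core Perl only. make_part_collector is a hypothetical name, not an LWP API. It returns a closure that takes a chunk as its first argument, as LWP's :content_cb does, keeps the buffer in the closure instead of an attribute of $response, and calls a handler once per complete part. It assumes every part carries a Content-Length header; whether the camera's two "Content" lines include one is a guess.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical factory (not an LWP API): returns a closure that accepts
# arbitrary chunks of a multipart stream and calls $on_document->(\%headers, $body)
# once per complete part, however the data was chunked.
# Assumption: every part has a Content-Length header.
sub make_part_collector {
    my ($boundary, $on_document) = @_;
    my $delim = "--$boundary";
    my $buf   = '';                     # private buffer kept in the closure
    return sub {
        my ($data) = @_;                # LWP's :content_cb also passes ($response, $protocol)
        $buf .= $data;
        while (1) {
            my $start = index($buf, $delim);
            return if $start < 0;                           # no delimiter yet
            my $hdr_end = index($buf, "\r\n\r\n", $start);
            return if $hdr_end < 0;                         # part headers incomplete
            my @lines = split /\r\n/, substr($buf, $start, $hdr_end - $start);
            shift @lines;                                   # drop the delimiter line
            my %h;
            for my $line (@lines) {
                $h{ lc $1 } = $2 if $line =~ /^([^:]+):\s*(.*)$/;
            }
            my $len = $h{'content-length'};
            die "multipart part without Content-Length\n" unless defined $len;
            my $body_start = $hdr_end + 4;
            return if length($buf) < $body_start + $len;    # body incomplete
            my $body = substr($buf, $body_start, $len);
            substr($buf, 0, $body_start + $len) = '';       # consume this part
            $on_document->(\%h, $body);
        }
    };
}

# Demo: two small XML parts, fed one byte at a time.
my @docs;
my $collect = make_part_collector('boundary', sub {
    my ($headers, $body) = @_;
    push @docs, $body;    # a real client might call $twig->parse($body) here
});
my $stream = "--boundary\r\nContent-Type: application/xml\r\nContent-Length: 4\r\n\r\n<a/>\r\n"
           . "--boundary\r\nContent-Type: application/xml\r\nContent-Length: 4\r\n\r\n<b/>\r\n";
$collect->($_) for split //, $stream;
print join(', ', @docs), "\n";
```

It prints "<a/>, <b/>" even though the stream is fed one byte at a time. In an LWP client, the closure could be passed as the :content_cb value, with $twig->parse($body) inside the handler.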