
Re^5: XML Parsing from URL

by afoken (Chancellor)
on Jul 05, 2015 at 12:28 UTC ( [id://1133249]=note )

in reply to Re^4: XML Parsing from URL
in thread SOLVED: XML Parsing from URL

OK, it's HTTP server push. I haven't seen that in the wild until now. That should be handled by LWP. Unfortunately, LWP is mostly designed to receive a single document from an HTTP request, not a (possibly infinite) stream of documents wrapped in a multipart/mixed container.

The :content_cb callback in your original code is a way to hook into LWP::UserAgent, but it looks a bit scary. LWP documents that the callback is called with "a chunk of data". That may be anything from a single byte to a large block containing several documents and headers. Your code treats that "chunk of data" first as exactly one document with HTTP and multipart headers (in $data =~ /^--boundary/), then as a full XML document (in $twig->parse($data);).

This may "just work" because (a) the camera waits long enough between the parts of the multipart container, (b) the XML document is very small, and (c) the :read_size_hint is large enough for LWP::UserAgent to read the entire document, including headers, into one chunk.

A less scary version would collect chunks somewhere (e.g. in a private attribute of the $response object) until at least one complete document with HTTP and multipart headers has been collected. Then it should extract the headers and the raw document from that buffer (e.g. into a new HTTP::Response object), and only then call $twig->parse($document), perhaps as another callback.
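
A rough sketch of that buffering idea, independent of LWP so it can be tried on its own. The names make_part_collector and $on_part are made up for illustration, and the sketch assumes that the boundary string is already known (it would normally come from the Content-Type header of the response) and that each part ends where the next "--boundary" delimiter begins. It keeps the buffer in a closure rather than in the $response object, which is just one of several possible places.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a closure that accepts chunks of any size
# and calls $on_part->(\%headers, $body) once per complete part.
# Assumption: a part is complete when the delimiter of the NEXT part
# has arrived in the buffer.
sub make_part_collector {
    my ($boundary, $on_part) = @_;
    my $delim  = "--$boundary";
    my $dlen   = length($delim);
    my $buffer = '';

    return sub {
        my ($chunk) = @_;
        $buffer .= $chunk;

        while (1) {
            # Delimiter that opens the next unprocessed part.
            my $start = index($buffer, $delim);
            last if $start < 0;

            # Delimiter that follows it; without it the part is incomplete.
            my $end = index($buffer, $delim, $start + $dlen);
            last if $end < 0;

            my $part = substr($buffer, $start + $dlen, $end - $start - $dlen);
            substr($buffer, 0, $end) = '';    # keep the following delimiter

            # Headers and body are separated by the first empty line.
            my ($head, $body) = split /\r?\n\r?\n/, $part, 2;
            next unless defined $body;
            $body =~ s/\r?\n\z//;             # line break before the delimiter

            my %headers;
            for my $line (split /\r?\n/, $head) {
                $headers{lc $1} = $2 if $line =~ /\A([^:]+):\s*(.*)\z/;
            }
            $on_part->(\%headers, $body);
        }
    };
}
```

Inside the :content_cb callback, one would then only call the closure with the received chunk, and let the part handler call $twig->parse($body). Note that this sketch hands over a document only when the following delimiter shows up, so a camera that sends the delimiter late would delay each document; if the parts carry a Content-Length header, that could be used to finish a part earlier.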

Maybe that could be made into a separate module that extends LWP::UserAgent to handle multipart document streams.


Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
