PerlMonks  

Re^5: XML Parsing from URL

by afoken (Chancellor)
on Jul 05, 2015 at 12:28 UTC (#1133249)


in reply to Re^4: XML Parsing from URL
in thread SOLVED: XML Parsing from URL

OK, it's HTTP server push. I hadn't seen that in the wild until now. Ideally, LWP would handle that. Unfortunately, LWP is mostly designed to receive a single document from an HTTP request, not a (possibly infinite) stream of documents wrapped in a multipart/mixed container.
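With server push, the separator between parts is announced in the response's Content-Type header (e.g. `multipart/mixed; boundary=...`), so a client does not have to hard-code it. A minimal sketch in plain Perl, assuming the header value has already been obtained (for instance via `$response->header('Content-Type')`); `boundary_from_content_type` is a hypothetical helper name, not part of LWP:

```perl
use strict;
use warnings;

# Hypothetical helper: pull the boundary parameter out of a multipart
# Content-Type header value such as 'multipart/mixed; boundary=boundary'.
# Returns undef if the value is not multipart or has no boundary.
sub boundary_from_content_type {
    my ($content_type) = @_;
    return unless defined $content_type
        && $content_type =~ m{^\s*multipart/}i;
    # Quoted form: boundary="..."
    return $1 if $content_type =~ /;\s*boundary\s*=\s*"([^"]+)"/i;
    # Token form: boundary=...
    return $1 if $content_type =~ /;\s*boundary\s*=\s*([^\s;"]+)/i;
    return;
}

# Each part of the stream then starts with a delimiter line "--" . $boundary.
my $boundary = boundary_from_content_type('multipart/mixed; boundary=boundary');
print "delimiter line: --$boundary\n";    # prints "delimiter line: --boundary"
```

The regexes only cover the two common spellings of the parameter (quoted and bare token); a full Content-Type parser would also handle escaped quotes and other parameters more strictly.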

The :content_cb callback in your original code is a way to hook into LWP::UserAgent, but the way it is used looks a bit scary. LWP documents that the callback is called with "a chunk of data". That may be anything from a single byte to a large block containing several documents and headers. Your code treats each "chunk of data" first as exactly one document with HTTP and multipart headers (in $data =~ /^--boundary/), then as a complete XML document (in $twig->parse($data);).

This may "just work" because (a) the camera waits long enough between the parts of the multipart container, (b) the XML document is very small, and (c) the :read_size_hint is large enough for LWP::UserAgent to read each entire document, including its headers, into one chunk.

A less scary version would collect chunks somewhere (e.g. in a private attribute of the $response object) until at least one complete document with its HTTP and multipart headers has been collected. Then it should extract the headers and the raw document (e.g. into a new HTTP::Response object), and only then call $twig->parse($document), perhaps from another callback.
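The buffering idea can be sketched in plain Perl. This is a minimal sketch, assuming CRLF line endings and a boundary known in advance; `make_part_collector` is a hypothetical name, not part of LWP. It keeps the buffer in a closure instead of a private attribute of `$response`, and parses the part headers into a plain hash instead of building an HTTP::Response, to stay self-contained:

```perl
use strict;
use warnings;

# Hypothetical helper: collect arbitrary chunks (as handed to an LWP
# :content_cb callback) and call $on_part->(\%headers, $document) only
# once a whole part has arrived.
sub make_part_collector {
    my ($boundary, $on_part) = @_;
    my $delimiter = "--$boundary";
    my $buffer    = '';

    return sub {
        my ($chunk) = @_;
        $buffer .= $chunk;

        # A part is complete only when the NEXT delimiter has arrived; the
        # CRLF in front of a delimiter belongs to the delimiter (RFC 2046).
        while ((my $start = index($buffer, $delimiter)) >= 0) {
            my $body_start = $start + length $delimiter;
            my $end = index($buffer, "\r\n$delimiter", $body_start);
            last if $end < 0;    # incomplete: wait for more chunks

            my $part = substr($buffer, $body_start, $end - $body_start);
            substr($buffer, 0, $end + 2, '');    # drop part, keep next delimiter
            $part =~ s/^\r\n//;                  # end of the delimiter line

            # Part headers and raw document are separated by an empty line.
            my ($head, $document) = split /\r\n\r\n/, $part, 2;
            next unless defined $document;
            my %headers;
            for my $line (split /\r\n/, $head) {
                my ($name, $value) = split /:\s*/, $line, 2;
                $headers{ lc $name } = $value if defined $value;
            }
            $on_part->(\%headers, $document);
        }
    };
}

# Usage: a two-part stream, fed one byte at a time to mimic tiny chunks.
my $stream = "--boundary\r\n"
    . "Content-Type: text/xml\r\n\r\n"
    . "<status>1</status>\r\n"
    . "--boundary\r\n"
    . "Content-Type: text/xml\r\n\r\n"
    . "<status>2</status>\r\n"
    . "--boundary\r\n";

my @documents;
my $collector = make_part_collector('boundary', sub {
    my ($headers, $document) = @_;
    push @documents, $document;    # real code might call $twig->parse($document) here
});
$collector->($_) for split //, $stream;
print "$_\n" for @documents;
```

Because a part is only released once the following delimiter has been seen, the result is the same whether LWP delivers one byte or several documents per chunk. In the original code, the closure would be called from the callback, e.g. `':content_cb' => sub { $collector->($_[0]) }`, since LWP::UserAgent passes the data chunk as the first argument (followed by the response and protocol objects).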

Maybe that could be made into a separate module that extends LWP::UserAgent to handle multipart document streams.

Alexander

--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
