No doubt, anyone who has worked with files or network sockets has had to parse input. Thousands of modules have been created for this purpose, and probably more will be written. This makes it fairly hard to find the module that suits the framework of the application one is building.
The passionate writing "Your main event may be another's side-show" by BrowserUk is, I believe, partly (or maybe mostly) based on that fact. So I've tried to check what functionality existing modules offer, using HTTP handling as an example.
It appears that most of the modules do both socket handling and parsing at the same time. As a result, they can't be used to parse, for example, saved requests. Nor can they be used with SSL sockets, because they open connections themselves. And of course they force a certain style of reading on the user: either blocking or asynchronous.
Does the module really have to include the socket handling? Very unlikely: socket handling in Perl is already very simple. Instead, the process of reading and interpreting data (from a socket or from anywhere else) can be presented as an interaction between three parties: Reader, Parser and Manager. The Reader is responsible only for obtaining data. It shouldn't care what happens to the data; it only needs to know whether reading should continue. That defines the interface the Parser must provide to the Reader: a single function, "parse", which takes a chunk of data and returns a "stop" or "more" indicator to the Reader. The Manager is the module or main code that receives the parsed data from the Parser and decides, based on it, what to do next. It may tell the Parser to stop parsing, and the Parser then forwards this request to the Reader.
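To make the three roles concrete, here is a minimal sketch in Perl. All names (ChunkParser, LineManager, the handle method) are my own invention for illustration; the only contract taken from the text above is that parse() receives a chunk and answers "more" or "stop" to the Reader.

```perl
use strict;
use warnings;

# Hypothetical Parser: splits the stream into newline-terminated
# records and hands each record to the Manager.
package ChunkParser;
sub new {
    my ($class, $manager) = @_;
    return bless { manager => $manager, buf => '' }, $class;
}
# parse() takes a chunk of data and returns 'more' or 'stop',
# which is all the Reader needs to know.
sub parse {
    my ($self, $chunk) = @_;
    $self->{buf} .= $chunk;
    while ($self->{buf} =~ s/^(.*?)\n//) {          # one record per line
        my $verdict = $self->{manager}->handle($1); # Manager decides
        return 'stop' if $verdict eq 'stop';        # forwarded to Reader
    }
    return 'more';
}

# Hypothetical Manager: collects lines and asks to stop after "END".
package LineManager;
sub new { return bless { lines => [] }, shift }
sub handle {
    my ($self, $line) = @_;
    return 'stop' if $line eq 'END';
    push @{ $self->{lines} }, $line;
    return 'more';
}

package main;
my $manager = LineManager->new;
my $parser  = ChunkParser->new($manager);

# The Reader: here just a loop over in-memory chunks, but it could
# equally read from a blocking socket, an async event loop, or a
# saved request on disk -- the Parser doesn't care.
for my $chunk ("first li", "ne\nsecond line\nEND\n") {
    last if $parser->parse($chunk) eq 'stop';
}
print join(',', @{ $manager->{lines} }), "\n";  # first line,second line
```

Note that the record boundary ("first li" + "ne\n") falls in the middle of a chunk; buffering inside the Parser is what frees the Reader from knowing anything about the protocol.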
With this kind of role separation, the developer gains the flexibility of choosing modules for the different roles. Different Managers may handle parsed data differently, either all at once or as it arrives. Different Readers would allow different reading styles. More than that, this approach should work for any parsing (at least I can't imagine a protocol that does not allow parsing in stages). This means that some Managers could chain in further Parser modules of the same style, for example to parse the HTML of the body as it comes.
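The chaining idea can be sketched as well: a Manager that, once the headers are done, forwards raw body chunks to an inner Parser with the same parse() interface, and propagates its verdict back toward the Reader. Everything here (ChainingManager, StubHtmlParser, the event names) is hypothetical; a real inner parser could be a streaming HTML parser.

```perl
use strict;
use warnings;

# Hypothetical Manager that chains a second Parser for the body.
package ChainingManager;
sub new {
    my ($class, $inner_parser) = @_;
    return bless { inner => $inner_parser, in_body => 0 }, $class;
}
sub handle {
    my ($self, $event, $data) = @_;
    if ($event eq 'headers_done') {
        $self->{in_body} = 1;               # switch to body mode
        return 'more';
    }
    if ($self->{in_body}) {
        # Forward raw body chunks to the inner Parser; its
        # "more"/"stop" verdict propagates back toward the Reader.
        return $self->{inner}->parse($data);
    }
    return 'more';                          # still in the header stage
}

# Stub inner Parser: declares itself done once it has seen </html>.
package StubHtmlParser;
sub new { return bless { seen => '' }, shift }
sub parse {
    my ($self, $chunk) = @_;
    $self->{seen} .= $chunk;
    return $self->{seen} =~ m{</html>} ? 'stop' : 'more';
}

package main;
my $mgr = ChainingManager->new(StubHtmlParser->new);

my @verdicts;
push @verdicts, $mgr->handle('header', 'Content-Type: text/html');
push @verdicts, $mgr->handle('headers_done');
push @verdicts, $mgr->handle('body', '<html>hi</html>');
print "@verdicts\n";   # more more stop
```

The outer HTTP Parser never needs to know HTML exists; the Manager alone decides where the body chunks go.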
On CPAN I've found only one module, HTTP::Parser, that offers a similar interface. It has a single "parse" method that returns status indicators to the caller (the number of those indicators is a bit too high, but that's not so important :). The one thing this module doesn't provide is the interaction with the Manager: it simply saves all incoming data into an HTTP::Request, which is fine in simple cases but not acceptable in more complex situations.
So now I ask myself: why are there so few modules implementing the described approach? Does it have some flaw, or does the majority simply prefer "canned" solutions over parts for building?
Replies are listed 'Best First'.

Re: Parsing of serialized data
by sundialsvc4 (Abbot) on Oct 27, 2010 at 15:36 UTC
by andal (Hermit) on Oct 28, 2010 at 11:38 UTC
by sundialsvc4 (Abbot) on Oct 30, 2010 at 12:23 UTC
by andal (Hermit) on Nov 02, 2010 at 09:42 UTC