No doubt, anyone who has worked with files or network sockets has had to parse input. Thousands of modules have been created for this purpose, and more will probably be written. This makes it fairly hard to find the module that suits the framework of the application one is building.

BrowserUk's passionate write-up "Your main event may be another's side-show" is, I believe, based in part (or maybe mostly) on that fact. So I tried to check what functionality the existing modules offer, using HTTP handling as an example.

It appears that most of the modules do both socket handling and parsing at the same time. As a result, they can't be used to parse, for example, saved requests. Nor can they handle SSL sockets, because they open the connections themselves. And of course they force the user into one style of reading, either blocking or asynchronous.

Does the module have to include socket handling? Very unlikely: socket handling in Perl is already quite simple. Instead, the process of reading and interpreting data (from a socket or from anywhere else) can be presented as an interaction between three parties: a Reader, a Parser, and a Manager. The Reader is responsible only for obtaining data. It shouldn't care what happens to the data; it only needs to know whether reading should continue. That defines the interface the Parser must provide to the Reader: a single function, "parse", which takes a chunk of data and returns a "stop" or "more" indicator to the Reader. The Manager is the module or main code that receives the parsed data from the Parser and decides, based on it, what to do next. It may tell the Parser to stop parsing, and the Parser then forwards that request to the Reader.
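
To make the three roles concrete, here is a minimal sketch of the idea. All the names (LineParser, read_loop, on_line) are hypothetical, not from any CPAN module; it parses newline-terminated lines rather than HTTP, just to keep the example small:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parser: knows only how to interpret chunks.  It hands parsed units to
# a Manager callback and tells the Reader whether to continue.
package LineParser;
sub new {
    my ($class, %args) = @_;
    return bless { buf => '', on_line => $args{on_line}, stopped => 0 }, $class;
}
sub stop { $_[0]{stopped} = 1 }             # called by the Manager
sub parse {                                  # called by the Reader
    my ($self, $chunk) = @_;
    $self->{buf} .= $chunk;
    while ($self->{buf} =~ s/^(.*?)\n//) {
        $self->{on_line}->($self, $1);       # deliver one line to the Manager
        return 'stop' if $self->{stopped};
    }
    return 'more';
}

# Reader: obtains data and checks only the continue/stop indicator.
package main;
sub read_loop {
    my ($fh, $parser) = @_;
    while (read($fh, my $chunk, 4)) {
        last if $parser->parse($chunk) eq 'stop';
    }
}

# Manager: decides what to do with parsed data; may stop the Parser,
# which in turn stops the Reader.
my @lines;
my $parser = LineParser->new(on_line => sub {
    my ($p, $line) = @_;
    push @lines, $line;
    $p->stop if $line eq 'END';
});

open my $fh, '<', \"alpha\nbeta\nEND\ngamma\n" or die $!;
read_loop($fh, $parser);
print "@lines\n";    # alpha beta END
```

Note that the Reader here is a trivial file loop, but nothing stops a Manager from pairing the same Parser with a non-blocking socket Reader: the Parser never sees where the chunks come from.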

With this kind of role separation, the developer gains the flexibility of choosing modules for the different roles. Different Managers may handle the parsed data differently, either all at once or as it arrives. Different Readers allow different reading styles. Moreover, this approach should work for any parsing (at least I can't imagine a protocol that does not allow parsing in stages). That means some Managers could chain in further Parsers of the same style, for example to parse the HTML of the body as it arrives.
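
A hypothetical sketch of such chaining, with invented names (HeaderBodyParser, CountingParser): the outer Parser splits a "header, blank line, body" stream, and its Manager wiring pipes the body chunks into an inner Parser as they arrive, without ever buffering the whole body:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Inner parser: a stand-in for e.g. an HTML parser; here it just counts
# the body bytes it is fed, chunk by chunk.
package CountingParser;
sub new   { bless { n => 0 }, shift }
sub parse { my ($self, $chunk) = @_; $self->{n} += length $chunk; return 'more' }

# Outer parser: extracts the header, then forwards every remaining
# chunk straight to the chained inner parser.
package HeaderBodyParser;
sub new {
    my ($class, %args) = @_;
    return bless { buf => '', in_body => 0, %args }, $class;
}
sub parse {
    my ($self, $chunk) = @_;
    $self->{buf} .= $chunk;
    unless ($self->{in_body}) {
        return 'more' unless $self->{buf} =~ s/^(.*?)\n\n//s;
        $self->{on_header}->($1);        # header goes to the Manager
        $self->{in_body} = 1;
    }
    my $body = $self->{buf};
    $self->{buf} = '';
    return length $body ? $self->{inner}->parse($body) : 'more';
}

package main;
my $header;
my $inner = CountingParser->new;
my $outer = HeaderBodyParser->new(
    on_header => sub { $header = shift },
    inner     => $inner,
);
# The Reader's chunk boundaries need not align with the protocol's.
for my $chunk ('Content-Ty', "pe: text\n\nhel", 'lo world') {
    last if $outer->parse($chunk) eq 'stop';
}
print "$header / $inner->{n}\n";    # Content-Type: text / 11
```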

On CPAN I found only one module, HTTP::Parser, that offers a similar interface. It has a single "parse" method that returns status indicators to the caller (the number of those indicators is a bit too high, but that is not so important :) The one thing this module doesn't provide is the interaction with the Manager: it simply saves all incoming data into an HTTP::Request, which is fine in simple cases but not acceptable for more complex situations.

So now I ask myself: why is the number of modules implementing the described approach so small? Does the approach have some flaw, or does the majority simply prefer "canned" solutions rather than parts for building?

Replies are listed 'Best First'.
Re: Parsing of serialized data
by sundialsvc4 (Abbot) on Oct 27, 2010 at 15:36 UTC

    The approach is sound.   It may be that it is perceived to be more difficult to generalize into a serviceable CPAN contribution.   Or, it simply may not have been done yet.

    The stuff that gets contributed is often more-or-less the “finished product” of someone who worked a long time on something and then decided to share it.   That tool, having been built from whatever was readily available at the time, might not have been devised as a contribution, as “a better mousetrap.”   It might have been offered from the perspective of, “you might spend a little less time whacking on code if you whack on this instead of whacking from scratch:   Your Mileage May Vary.™”   Which, of course, is often a Good Thing, itself.

    However, if you are now “signing up” . . . :-D

      Well. The term "generalize" that you have used slightly scares me :) Do you mean that it will be difficult to describe the module so that people understand how to use it?

      Generally speaking, it is not so hard to create such a module for HTTP parsing. I have written one in C, so in Perl I could probably write one in one or two days. Maybe that is why such modules are not offered: those who could have used them spend less time writing them than searching for and understanding the documentation :)

        If you have put in the legwork on a good, generalized “HTTP parsing” solution ... that is a better mousetrap ... and it can be built to the standard of quality of CPAN such that you are actually willing and able to do it, then of course there will be great interest in what you have done.

        If you really can “Name That Tune™ in two days,” then by all means, go for it.   What we all will expect is a thoroughly implemented and self-testing module, with at-least adequate documentation and generalized applicability, such that thousands of other people can “drop in” your solution to their applications and actually save time thereby.

        Naturally, it will behoove you to thoroughly understand what is already out there, and why your work is distinctive and different.   It is quite frequent, and quite embarrassing, to discover that you have poured heart-and-soul into an effort that did not need to be done at all.

        Nevertheless:   If you’ve got a better mousetrap, heh, we are certainly not running short of mice.   There are many thousands of CPAN modules, and always room for one more.   I say these things to encourage you to get busy.   The Hall of CPAN Contributors (oddly, I myself am not one ...) is a hallowed hall, indeed.