Hmm. While I am not surprised that the P::RD solution I wrote is the slowest, I would like to point out a couple of things. Most of these solutions are buggy, or not appropriate for the data set you are using, in some way or another (I can feel the venom in BrowserUk's eyes already ;-).

And by buggy I mean the following: I don't believe that you can consider a parser correct if it accepts something other than the specification it was intended to parse. I.e., I consider a parser to work correctly if and only if it accepts exactly the language in its specification and nothing more.

Anyway, here's my analysis of the different solutions. Additional stuff is welcome.

First off IO's solution only returns the innermost array. Furthermore it throws the exception
```
Unmatched ( before HERE mark in regex m/\(\{( << HERE \(\{()\}\)/ at C:\Temp\castaway.pl line 282.
```
when you feed it erroneous data like '({({})'.
Your own solution, castaway, returns empty values in the last slot of the array (remember, I pointed this out earlier). It also returns [''] when you feed it '({({})'. Similarly, it doesn't handle barewords within arrays like '({foo})' properly, and it returns '})' for '({})})'. Oh, and it also seems to return the wrong value for ucache. Not to mention weirdness for ({(["foo":"bar"])}).
merlyn's solution doesn't handle hash values (it seems to work OK without them, however). The only other case (that I checked) that his code choked on was '({})})', which it parsed as [].
BrowserUk's solution seems to work pretty well. It will return an error message for some strings that aren't correct, but it still doesn't correctly handle '({})})' or '({foo})', returning [')'] and ['o'] respectively.
From the test cases that I used, it seems that my version correctly parses or rejects all the input I threw at it. Furthermore, by changing the script slightly to
```
value : string
      | number
      | array
      | hash
```
it almost doubles its speed. No great feat, I realize, considering that it now does about 25 parses a second. :-)

So first off, I would say that when you test code you should test both positive and negative cases. Second, I can't feel bad about having the slowest solution, as it quite frankly is the only one that correctly parses or rejects the data it receives.

To quote from Code Complete (28.2 Code Tuning):
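To make the point about negative cases concrete, here is a minimal sketch of a strict parser plus a small table of accept/reject cases taken from the strings mentioned above. It is not the P::RD grammar and not any of the posted solutions. The grammar in the comment, the optional trailing comma, the restriction of hash keys to plain scalars, and the names parse_strict, _value and _list are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Assumed grammar (hypothetical, for illustration only):
#   value : string | number | array | hash
#   array : '({' [ value (',' value)* [','] ] '})'
#   hash  : '([' [ pair  (',' pair)*  [','] ] '])'
#   pair  : value ':' value      (key must be a plain scalar)

# Returns the parsed structure, or undef if the WHOLE input is not valid.
sub parse_strict {
    my ($text) = @_;
    pos($text) = 0;
    my $value = eval { _value(\$text) };        # any parse error dies inside
    return undef unless defined $value;
    return undef unless $text =~ /\G\s*\z/gc;   # reject trailing garbage
    return $value;
}

# Parse one value starting at the current pos() of the referenced string.
sub _value {
    my ($ref) = @_;
    return $1 if $$ref =~ /\G\s*"((?:[^"\\]|\\.)*)"/gc;      # string, escapes kept raw
    return $1 if $$ref =~ /\G\s*(-?\d+(?:\.\d+)?)/gc;        # number
    return _list($ref, '})', 0) if $$ref =~ /\G\s*\(\{/gc;   # array
    return _list($ref, '])', 1) if $$ref =~ /\G\s*\(\[/gc;   # hash
    die "unexpected input at offset " . (pos($$ref) // 0) . "\n";
}

# Parse comma-separated elements up to the closing delimiter.
sub _list {
    my ($ref, $close, $is_hash) = @_;
    my $close_re = quotemeta $close;
    my @items;
    until ($$ref =~ /\G\s*$close_re/gc) {
        my $item = _value($ref);
        if ($is_hash) {
            die "hash key must be a plain scalar\n" if ref $item;
            $$ref =~ /\G\s*:/gc or die "expected ':'\n";
            push @items, $item, _value($ref);
        }
        else {
            push @items, $item;
        }
        # After an element there must be a comma or the closer.
        next if $$ref =~ /\G\s*,/gc;
        $$ref =~ /\G\s*$close_re/gc or die "expected ',' or '$close'\n";
        last;
    }
    return $is_hash ? +{ @items } : \@items;
}

# Positive AND negative cases: 1 = must parse, 0 = must be rejected.
my @cases = (
    [ '({})',                 1 ],
    [ '({"a","b",})',         1 ],
    [ '({(["foo":"bar"])})',  1 ],
    [ '({({})',               0 ],   # unclosed outer array
    [ '({})})',               0 ],   # trailing garbage
    [ '({foo})',              0 ],   # bareword inside array
);

for my $case (@cases) {
    my ($input, $should_parse) = @$case;
    my $parsed = defined parse_strict($input) ? 1 : 0;
    printf "%-24s %s\n", $input, $parsed == $should_parse ? 'ok' : 'NOT ok';
}
```

The same table can be pointed at any of the other solutions by swapping parse_strict for a wrapper around that solution; the negative rows are the ones that expose the problems described above.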

A fast program is just as important as a correct one--False! It's hardly ever true that programs need to be fast or small before they need to be correct. Gerald Weinberg tells the story of a programmer who was flown to Detroit to help debug a program that was in trouble. The programmer worked with the team of programmers who had developed the program, and after several days he concluded that the situation was hopeless.

On the flight home, he mulled over the last few days and realized the true problem. By the end of the flight, he had outlined the necessary code. He tested it for several days and was about to return to Detroit when he got a telegram saying the project had been cancelled because the program was impossible to write. He headed back to Detroit anyway and convinced the executives that the project could be completed.

He then had to convince the project's original programmers. They listened to his presentation, and when he'd finished, the creator of the old system asked,

"And how long does your program take?"

"That varies, but about ten seconds per input."

"Aha! But my program takes only one second per input." The veteran leaned back, satisfied that he'd stumped the upstart programmer. The other programmers seemed to agree, but the new programmer wasn't intimidated.

"Yes, but your program doesn't work. If mine doesn't have to work, I can make it run instantly and take up no memory."

(emphasis added by me.)

However, I have no doubt that with a bit of hacking all of the people involved here can fix their stuff and still have it run faster than the P::RD approach. But this is a good example of why proper testing of both positive and negative cases is essential. It's also a good example of why P::RD is a cool tool: you are much more likely to produce a correct (but slow) result with it than without it, and to do so in much less time than with another approach.

A last point about all of this. You are receiving packets over a network connection, so I'm guessing that just the network part takes much longer than even the slowest solution. Thus the network time is going to swamp the parse time by a great deal, which to me suggests there's no point in optimizing this.

I think having a read of the Code Complete section is a worthy use of time.

Cheers,

--- demerphq
my friends call me, usually because I'm late....

In reply to Re: Re: Parsing Text into Arrays.. by demerphq
in thread Parsing Text into Arrays.. by castaway
