in reply to Grammar based parsing methodology for multi MB strings/files (Marpa::R2/Regexp::Grammars)

Hmm, a grammar will usually build a data structure that takes much more memory than the original text file, but a 33 GB memory footprint for a 2 MB file sounds excessive to me. That is a difference of roughly four orders of magnitude.

My second point is that, from a quick look, your file seems pretty regular, and I am not sure that a full-fledged grammar is really required. Don't get me wrong: I am very much in favor of using real parsers as soon as the input gets a bit complicated in terms of token positions and the like, but here I am not convinced that a grammar is not overkill or over-engineering. From a quick glance at your data, I would probably use a couple of regexes to get rid of the comments and then parse the rest manually, using the semicolon as a separator.
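To make that idea concrete, here is a minimal sketch of what I have in mind. I have not checked it against your real data, so the comment styles (`#` to end of line and `/* ... */` blocks), the sample input, and the name `split_statements` are all assumptions of mine, not something taken from your file format.

```perl
use strict;
use warnings;

# Hypothetical helper: strips comments, then splits on semicolons.
# The two comment syntaxes handled here are ASSUMED, not taken from
# the real input file.
sub split_statements {
    my ($text) = @_;

    $text =~ s{/\*.*?\*/}{}gs;    # drop block comments (assumed syntax)
    $text =~ s{#[^\n]*}{}g;       # drop line comments (assumed syntax)

    my @statements;
    for my $piece (split /;/, $text) {
        $piece =~ s/^\s+//;       # trim leading whitespace
        $piece =~ s/\s+$//;       # trim trailing whitespace
        push @statements, $piece if length $piece;
    }
    return @statements;
}

# Made-up sample input, for illustration only.
my $sample = "a = 1; # first\n/* skip\n this */ b = 2;\n";
print "$_\n" for split_statements($sample);
```

Note that this breaks as soon as a semicolon or a comment marker can appear inside a quoted string, which is exactly the kind of complication where a real grammar starts to pay off.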

But that's only a personal opinion. I did not take the time to analyze your grammars in any detail, and I have no idea what you are really trying to do with this data, so I may be completely wrong. This is just a preliminary gut feeling after a very brief look at your problem; it could be that working on it for an hour or two would lead me to a very different conclusion.

Please also note that I know only a little about Regexp::Grammars, and next to nothing about Marpa.

Re^2: Grammar based parsing methodology for multi MB strings/files (Marpa::R2/Regexp::Grammars)
by tj_thompson (Monk) on May 06, 2014 at 23:06 UTC

    I understand your point in questioning whether a grammar is necessary. Up till now I have used a line-by-line method. However, I've dealt with enough bugs by now that I'd rather just invest the time in a grammar that will do the job and that, I believe, will be easier to maintain in the future.

    I also foresee this syntax changing due to production changes in the not-too-distant future. I think a grammar will be easier to update than my current parser would be.

    Thanks for looking it over for me in any case :)