BrowserUk has asked for the wisdom of the Perl Monks concerning the following question:

Does anyone know of a publicly available example of one of the many CPAN parsing modules being used to parse a full-featured, block-structured language?

I'm looking for real world examples rather than toy, how-to demos.

Thanks.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

The start of some sanity?

Re: Block-structured language parsing using a Perl module?
by Illuminatus (Curate) on Aug 14, 2012 at 18:08 UTC
    I can't really help you directly, but I would contact the author of Parse::Eyapp, if you haven't already. I know it's pretty new, but he must have had a reason to expand Yapp, and his changes look pretty extensive.

    fnord

Re: Block-structured language parsing using a Perl module?
by Anonymous Monk on Aug 15, 2012 at 03:05 UTC

      The problem with those is they are

      • either: hand-crafted parsers constructed to parse a specific language (HTML).

        These are no use because I'm looking for a parser constructor module.

      • or: examples of using the parser-constructor module to construct a parser for some more or less complicated language, written by the author of the module that does the construction.

        It is unsurprising that the author of a given module is motivated enough, and reasonably adept at using his own module, to persist in getting something moderately complicated to work.

        But can anyone else?

      If I could find an example of a parser module being used a) in a real-world project; b) of reasonable complexity; c) by someone other than its author, it would give some level of confidence that the module stands up to a) being learned; b) being debugged; c) being maintained in a timely fashion when bugs discovered through real-world usage are reported.

      The 3 modules I've experimented with:

      • had awful APIs -- large, complicated, verbose -- with lousy documentation, often as not couched in so much academic/theoretical terminology as to be almost unintelligible.

        I want to use a parser; not learn about the theory behind them.

      • gave almost useless error diagnostics when defining the grammar, and even worse diagnostics when given non-compliant source to parse.
      • were so ridiculously slow in operation that they are almost useless for real-world usage.
      • produced parse trees so complicated that you need to write another parser to process them.

      I can see I am going to end up writing my own; but given the richness of the modules on CPAN, I hoped that there was one amongst them that might stand up to real-world usage.


        :) I know this probably doesn't qualify either (and you probably saw it), but GraphViz2::Marpa is not by the Marpa author :) though it is also accompanied by a how-to article

        FWIW, the Marpa author does praise his own error diagnostics on his blog :)

        I want to use a parser; not learn about the theory behind them.

        Part of the problem with the lousy documentation is that you have to know enough theory to know both what type of parser you can use on a grammar and whether your grammar is even parsable. Because semi-structured text can vary so much in structure and meaning, the best any general-purpose grammar engine can do is push back on you a little and make you figure out whether your language is a regular language, whether you need lookahead and how much, and how you handle things like recursion, if at all.
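
        To make the recursion point concrete, here is a minimal sketch (toy grammar, made up purely for illustration) of the classic trap: a left-recursive rule that a recursive-descent module such as Parse::RecDescent cannot handle directly, and the repetition-based rewrite that it can:

            use strict;
            use warnings;
            use Parse::RecDescent;

            # Left-recursive form -- a recursive-descent parser loops forever on it:
            #   expr : expr '+' term | term
            # The same language, rewritten with repetition so Parse::RecDescent copes:
            my $grammar = q{
                expr : term ('+' term)(s?)
                term : /\d+/
            };

            my $parser = Parse::RecDescent->new($grammar)
                or die "Bad grammar\n";

            defined $parser->expr('1 + 2 + 3')
                or die "Parse failed\n";

        That rewrite is exactly the kind of theory the documentation tends to assume you already know.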

        Also a lot of the theoretical work comes from the world of linguistics, which is messy on its own.

        I agree about lousy APIs though.

        I can't speak about the performance of Regexp::Grammars, but if I were doing something like this, I'd start there for ease of use. I'd use Marpa for speed and completeness.
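
        For instance, here is a rough sketch of what a small block-structured grammar looks like in Regexp::Grammars (a toy grammar I just made up, purely to show the shape of the API; the resulting parse tree lands in %/):

            use strict;
            use warnings;
            use Regexp::Grammars;    # must be in scope when the qr// below is compiled

            # Toy language: nested { ... } blocks containing 'name;' statements.
            my $parser = qr{
                <block>

                <rule: block>       \{  <[statement]>*  \}
                <rule: statement>   <block> | <identifier> ;
                <token: identifier> [a-zA-Z_]\w*
            }xms;

            my $src = '{ foo; { bar; baz; } }';
            if ($src =~ $parser) {
                use Data::Dumper;
                print Dumper \%/;    # nested hashes/arrays mirroring the grammar
            }

        Whether that holds up on a real language at real-world sizes is, of course, exactly the question being asked.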

Re: Block-structured language parsing using a Perl module?
by tobyink (Canon) on Aug 17, 2012 at 07:54 UTC

      Thank you tobyink. Those are both fine examples of the information I was looking for.

      (It's a shame that the metacpan site sends my browser (Opera) off into la-la land, but that's not your problem :)

      For me, the most telling files are the "compiled" grammars: OwlFn & CSS.

      I realise these are generated files, but damn are they ever resource hungry. It is no wonder that P::RD is so slow. Dog-forbid that either of you authors ever has to go plugging around inside there in order to solve a problem.

      Have you ever had occasion to measure the performance of your parser? (Do OwlFn source files ever get big enough that it is a concern?)


Re: Block-structured language parsing using a Perl module?
by thargas (Deacon) on Aug 16, 2012 at 17:43 UTC

      Thanks for the link. I've pulled the PDF and will give it a read over the next few days.

      Though I do so with a great deal of skepticism. P::RD commits (or used to commit?) every one of my cardinal sins:

      1. horrible API;
      2. lousy documentation;
      3. useless diagnostics;
      4. glacial performance;

      Maybe Regexp::Grammars does better, but on a cursory inspection, I do not hold out much hope :(


        I don't know a lot about it. I did have to deal with it once, in a program which took commands in an SQL-like syntax; we had it pre-compile the grammar and save it instead of compiling it on each load, which did make a difference, IIRC. It was a while ago.
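
        For anyone curious about the mechanics, Parse::RecDescent's documented Precompile step does roughly that -- a sketch only, with the grammar file and class name made up for illustration:

            # One-off build step: bake the grammar into a standalone parser class.
            use strict;
            use warnings;
            use Parse::RecDescent;

            my $grammar = do {
                local $/;
                open my $fh, '<', 'commands.grammar' or die $!;
                <$fh>;
            };

            Parse::RecDescent->Precompile($grammar, 'SQLishParser');  # writes SQLishParser.pm

            # At run time, load the generated class instead of re-compiling the grammar:
            #   use SQLishParser;
            #   my $parser = SQLishParser->new();
            #   my $result = $parser->command($input);   # call whatever the grammar's top rule is named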