In my previous Meditation I pointed out that the grammar feature can be used for binary data, which subsumes unpack.

Well, what about pack? More generally, if I already have a grammar that states the way something should be arranged, why can't I use that to generate output?

A trivial example is a grammar in which no backtracking is possible: <int><char><float> would mean much the same as "icf" as a pack control string. More usefully, given a definition of what goes where and how, plus a list of named things, apply one to the other and produce output. A high-level nonterminal grammar production makes a very good definition of "what goes where and how": each item in the production is both a name for the data and an output rule for how to format it.
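
As a rough illustration of that idea, here is a minimal Python sketch, not Perl 6. It assumes a production can be modelled as an ordered list of (name, output rule) pairs, with Python struct format codes standing in for the output rules. The names RECORD, emit and parse are hypothetical, and Python's "c" code takes a one-byte bytes value, so it is only roughly analogous to the same letter in a Perl pack string.

```python
import struct

# Hypothetical "production": each item names a field and gives its output
# rule (a struct format code here). These names are illustrative only.
RECORD = [("count", "i"), ("tag", "c"), ("ratio", "f")]

def _format(production, byte_order):
    # Build a struct format string such as "<icf" from the production.
    return byte_order + "".join(code for _, code in production)

def emit(production, values, byte_order="<"):
    """Generate output: pack the named values in production order."""
    ordered = [values[name] for name, _ in production]
    return struct.pack(_format(production, byte_order), *ordered)

def parse(production, data, byte_order="<"):
    """Consume input: unpack bytes back into named values."""
    fields = struct.unpack(_format(production, byte_order), data)
    return dict(zip((name for name, _ in production), fields))

packed = emit(RECORD, {"count": 7, "tag": b"x", "ratio": 1.5})
unpacked = parse(RECORD, packed)
```

The same RECORD definition drives both directions, which is the point of the question: one description of the layout, used once to read and once to write.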

I think a Perl6 pattern can at the very least be bent to this end, by generating output as a side-effect and not really consuming input. But maybe some deliberate feature design could make this a clean concept rather than a trick.

How introspective is the Perl6 pattern definition? Instead of matching against one, can I examine it to see what it's made of? It would be elegant if the pattern object itself had a parse tree, in the same form as the ones it builds.

—John

Edit kudra, 2002-09-05 Replaced I tags in title with '

•Re: Using Perl6 patterns/grammar definition for <I>output</I>?
by merlyn (Sage) on Sep 04, 2002 at 23:10 UTC
    After theDamian took a look at my spew program, he mentioned that it'd be nice if Parse::RecDescent could automatically walk a grammar at random instead of just parsing with one, and he seemed to think that there was enough meta-information in the internal grammar tree to do that.

    If that's the way he felt about P::RD, I'm sure perl6 re-grammars are powerful enough as well, since a lot of the perl6 re-grammar stuff is based on P::RD ideas.
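
A minimal Python sketch of what "randomly walking a grammar" could look like. It assumes the grammar is available as a plain data structure, and it does not use Parse::RecDescent's internals or the spew program. GRAMMAR and walk are hypothetical names.

```python
import random

# Hypothetical toy grammar: nonterminal -> list of alternatives, each
# alternative a list of symbols. Any symbol that is not a key is a literal.
GRAMMAR = {
    "sentence": [["greeting", "name"]],
    "greeting": [["hello"], ["goodbye"]],
    "name": [["world"], ["monks"]],
}

def walk(symbol, rng):
    """Expand a symbol by picking one alternative at random, recursively."""
    if symbol not in GRAMMAR:
        return [symbol]          # literal: emit as-is
    words = []
    for part in rng.choice(GRAMMAR[symbol]):
        words.extend(walk(part, rng))
    return words

text = " ".join(walk("sentence", random.Random(0)))
```

The walker needs nothing beyond the list of alternatives for each rule, which is the kind of meta-information the grammar tree would have to expose.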

    -- Randal L. Schwartz, Perl hacker

Re: Using Perl6 patterns/grammar definition for <I>output</I>?
by dpuu (Chaplain) on Sep 05, 2002 at 00:07 UTC
    I think/hope that Perl6 will have adequate introspection. But you'd have to ensure that a match-object contains all the information needed to fully constrain the generated output. Consider
    rule foo:w { $a := (\d+) = $b := (\w+) [ , | and ] $c := (\w+) };
    dump( rule=>"foo"; a=>7, b=>"hello", c=>"world" );
    After substituting the values into the captures we then are left with:
    \s+ 7 \s+ hello [ \s* , \s* | \s+ and \s+ ] world
    So the dumper now needs to make two sets of decisions: what whitespace to substitute for each \s, and whether to use a comma or the word "and" as the final separator. The whitespace issue could be handled by a standard pretty-print policy. But in the absence of any further hints, the dump function can probably do no better than choose an arbitrary (e.g. always the first) member of an uncaptured alternation.

    So yes, I definitely think it could be done, but we may want a mechanism to set the policy for the unconstrained parts of a rule. A simple set of rules, such as "generate as little text as possible" and "choose the first option on alternations", would probably be enough for simple expressions. But look-ahead assertions and embedded code would be much harder to define simple rules for. Perhaps our dump function would require anything non-trivial to be captured.
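
A minimal Python sketch of those two policies, assuming the rule has already been turned into a small tree that mirrors the expanded form shown above. The node kinds ("seq", "ws", "cap", "lit", "alt") and the names RULE_FOO and generate are hypothetical, not part of any Perl 6 design.

```python
# Tree mirroring: \s+ 7 \s+ hello [ \s* , \s* | \s+ and \s+ ] world
# ("ws", n) means "at least n whitespace characters".
RULE_FOO = ("seq", [
    ("ws", 1), ("cap", "a"), ("ws", 1), ("cap", "b"),
    ("alt", [
        ("seq", [("ws", 0), ("lit", ","), ("ws", 0)]),
        ("seq", [("ws", 1), ("lit", "and"), ("ws", 1)]),
    ]),
    ("cap", "c"),
])

def generate(node, captures, choose=lambda alternatives: alternatives[0]):
    """Emit text for a rule tree; `choose` is the alternation policy."""
    kind, body = node
    if kind == "lit":
        return body
    if kind == "cap":
        return str(captures[body])      # captured values are fully constrained
    if kind == "ws":
        return " " * body               # policy: as little whitespace as allowed
    if kind == "seq":
        return "".join(generate(child, captures, choose) for child in body)
    if kind == "alt":
        return generate(choose(body), captures, choose)
    raise ValueError("unknown node kind: %r" % kind)

values = {"a": 7, "b": "hello", "c": "world"}
shortest = generate(RULE_FOO, values)
wordy = generate(RULE_FOO, values, choose=lambda alternatives: alternatives[-1])
```

Passing the alternation policy in as a parameter keeps the unconstrained decisions out of the rule itself. Look-ahead assertions and embedded code have no node kind here, which reflects how much harder they would be to handle.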

    An XML grammar would probably be a nice thing to use to play with the idea: more complex than your pack/unpack (BTW: I like that idea); but much simpler than, say, a full Perl6 grammar. --Dave

      I was thinking that a derived grammar would be unambiguous about the output format, but you would reuse many useful grammar rules.

      Instead of $a := (\d+), write $a := <number_token>, where number_token is itself defined as you show, or as something more complex such as (0|0[xX])\d+. Then a derived grammar would override number_token to be something that formats, as opposed to something that accepts flexible input. Some standard mechanism would say "here is the string to be emitted" within the {} block of the rule.

      Terminals, by definition, would need to contain code to format the output. Nonterminals just state the arrangement of terminals and, in the absence of alternation and repetition (push those down to lower-level rules), would be used as-is.
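
A minimal Python sketch of the derived-grammar idea, assuming class inheritance stands in for grammar derivation. The nonterminal is a fixed arrangement of names, and each terminal is a method that formats its value. All names here (BaseGrammar, pair, separator and so on) are hypothetical.

```python
class BaseGrammar:
    """Hypothetical base grammar: the nonterminal only states arrangement."""
    pair = ("number_token", "separator", "word_token")

    def emit(self, rule, values):
        # Walk the production and let each terminal format its own value.
        return "".join(
            getattr(self, name)(values.get(name)) for name in getattr(self, rule)
        )

class DecimalOutput(BaseGrammar):
    """Derived grammar: terminals carry the output-formatting code."""
    def number_token(self, value):
        return str(value)

    def separator(self, _value):
        return " = "

    def word_token(self, value):
        return value

class HexOutput(DecimalOutput):
    """Overrides a single terminal; the arrangement is reused unchanged."""
    def number_token(self, value):
        return hex(value)

data = {"number_token": 255, "word_token": "hello"}
decimal_text = DecimalOutput().emit("pair", data)
hex_text = HexOutput().emit("pair", data)
```

Only number_token changes between the two output grammars; the nonterminal pair is inherited as-is.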

      —John

        Using named rules to key formatting sounds like a good idea, but I'm not sure that I like the idea of using a "derived grammar" to represent the formatting of these named rules: it seems better to introduce an explicit formatting object. This formatting object would provide information to a visitor that traverses a parse tree (i.e. a match object plus its grammar), but it would not have the hierarchical structure of the grammar.

        Using a separate formatting object has several advantages. Firstly, it avoids the problem of maintaining parallel hierarchies (you want formatting to be robust against modifications to the grammar). Even if you are only overriding the terminal rules of the grammar, you still have the issue that a change to the base grammar would be obscured in the derived grammar.

        A more important issue is that it will be common to import terminal symbols from other grammars (e.g. CORE::*). How many grammars do you want to override, just to format some output?

        I see a formatting object as a simple map of rule_name => format spec, with some mechanism for composition. This doesn't need to follow the hierarchy of a grammar:

        my $format = new Format (
            ws       => { col >= 70 ? "\n" : " " },
            currency => { sprintf("%.2f", $currency) },
            date     => { sprintf("%04d-%02d-%02d", @$date{qw/year month day/}) },
        );
        $text =~ /<grammar.rule>/;
        $format.output($0; grammar=>"grammar.rule", fh=>$*STDOUT)
        The output method would traverse the grammar; each time it meets a rule that has a format, it applies that format to the corresponding entry in the match object ($0). This does not require a derived grammar: just an extra specification object. Sometimes composition is better than inheritance.
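
A minimal Python sketch of such a formatting object, assuming the match object can be modelled as nested (rule_name, value) pairs. The Format class below is a hypothetical stand-in, and the column-dependent whitespace rule is simplified to a single space.

```python
class Format:
    """Hypothetical flat map of rule name -> formatting function."""

    def __init__(self, **specs):
        self.specs = specs

    def output(self, match):
        # match is (rule_name, value); value is a list of child matches
        # for a nonterminal, or the captured data for a leaf.
        name, value = match
        if name in self.specs:
            return self.specs[name](value)   # a format exists: apply it
        if isinstance(value, list):
            return "".join(self.output(child) for child in value)
        return str(value)                    # unformatted leaf: emit as-is

fmt = Format(
    ws=lambda _captured: " ",
    currency=lambda amount: "%.2f" % amount,
    date=lambda d: "%04d-%02d-%02d" % (d["year"], d["month"], d["day"]),
)

match = ("invoice", [
    ("date", {"year": 2002, "month": 9, "day": 5}),
    ("ws", "   "),
    ("currency", 12.5),
])
line = fmt.output(match)
```

The map is keyed by rule name only, so it stays valid if the grammar's nesting changes, and rules imported from other grammars can be formatted without subclassing anything.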

        --Dave