http://qs1969.pair.com?node_id=510563


in reply to Embedding a mini-language for XML construction into Perl

To further reduce the syntax burden, we can eliminate many calls to the text constructor by letting element constructors accept an optional text argument after the block (it arrives in _elem as a third argument). In the common case, we no longer need to call text. (Of course, should we want to call text for clarity, we still can.)

For example, the following fragment:

html {
    head { title { text "Title" } };
    body { p { class_ "warning"; text "paragraph" } }
};

can be simplified to this:

html {
    head { title {} "Title" };
    body { p { class_ "warning" } "paragraph" }
};

To effect the new syntax rules, we need only change the _elem and define_vocabulary functions from our original implementation. The changes are simple and marked with a hash-bang (#!):

sub _elem {
    my ($elem_name, $content_fn, $text) = @_;   #! added $text arg
    # an element is represented by the triple [name, attrs, children]
    my $elem = [$elem_name, undef, undef];
    do {
        local $__frag = $elem;
        $content_fn->();
        text($text) if defined $text;           #! new line
    };
    push @{$__frag->[2]}, $elem;
}

sub define_vocabulary {
    my ($elems, $attrs) = @_;
    eval "sub $_(&@) { _elem('$_',\@_) }" for @$elems;   #! proto
    eval "sub ${_}_(\$) { _attr('$_',\@_) }" for @$attrs;
}
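To try the change end to end, here is a minimal self-contained sketch. The doc, _attr and text functions are not shown in this thread, so the versions below are assumptions about what the original implementation looks like, and render is a hypothetical helper added only to inspect the result. The vocabulary has to be defined in a BEGIN block so that the prototypes are known before the markup is parsed.

```perl
use strict;
use warnings;

our $__frag;   # fragment currently under construction

# Assumed shape of the original helpers (not shown in this thread).
sub doc(&) {
    my ($content_fn) = @_;
    local $__frag = [undef, undef, undef];
    $content_fn->();
    return $__frag->[2][0];
}
sub _attr { my ($name, $val) = @_; push @{ $__frag->[1] }, [$name, $val]; }
sub text($) { push @{ $__frag->[2] }, @_; }

# _elem and define_vocabulary as modified in the post.
sub _elem {
    my ($elem_name, $content_fn, $text) = @_;
    my $elem = [$elem_name, undef, undef];
    do {
        local $__frag = $elem;
        $content_fn->();
        text($text) if defined $text;
    };
    push @{ $__frag->[2] }, $elem;
}
sub define_vocabulary {
    my ($elems, $attrs) = @_;
    eval "sub $_(&@) { _elem('$_',\@_) }" for @$elems;
    eval "sub ${_}_(\$) { _attr('$_',\@_) }" for @$attrs;
}

# Hypothetical helper: serialise a tree for inspection (no escaping).
sub render {
    my ($node) = @_;
    return $node unless ref $node;                  # text node
    my ($name, $attrs, $children) = @$node;
    my $attr_str = join '', map { qq( $_->[0]="$_->[1]") } @{ $attrs || [] };
    my $body     = join '', map { render($_) } @{ $children || [] };
    return "<$name$attr_str>$body</$name>";
}

# Prototypes must exist at compile time, hence BEGIN.
BEGIN { define_vocabulary([qw(html head title body p)], [qw(class)]); }

my $tree = doc {
    html {
        head { title {} "Title" };
        body { p { class_ "warning" } "paragraph" };
    };
};
print render($tree), "\n";
```

Under these assumptions the text argument ends up as the last child of its element, exactly as if text had been called at the end of the block.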

Can you spot any other syntax-reduction opportunities?

Cheers,
Tom

Re: Simplifying the syntax further
by Aristotle (Chancellor) on Nov 22, 2005 at 11:32 UTC

    That looks uglier and is less obvious. I’d prefer if it were possible to have the block’s value taken as its text content. This seemed tricky at first, because you don’t want to force users to put an explicit return;, undef;, ''; or whatever at the end of a block to avoid having the last expression of every block added as text content. After some reflection, however, it’s not tricky at all.

    There are only two cases: either you have a complex element with multiple children, be they sub-elements or full-on mixed content; or you have an element with nothing but text in it. These are clearly distinguishable: if the element has nothing but text in it, it won’t have any children yet when the block returns; if the element has complex content, it will already have explicitly constructed children when the block returns.

    The changes for this turn out even more trivial. Here’s the original _elem modified to meet this spec, with hashbangs:

    sub _elem {
        my ( $elem_name, $content_fn ) = @_;
        # an element is represented by the triple [name, attrs, children]
        my $elem = [ $elem_name, undef, undef ];
        my $ret = do { local $__frag = $elem; $content_fn->(); };  #! keep ret val
        # "|| []" guards against dereferencing undef when there are no children yet
        push @{ $elem->[2] }, $ret
            if defined $ret and not @{ $elem->[2] || [] };         #! new line
        push @{ $__frag->[2] }, $elem;
    }

    The only case where you get strange behaviour is when the block for an empty element contains code, something like br { ++$breaks } – this would now have to be written as br { ++$breaks; undef }. But you can now say

    html {
        head { title { "Lorem ipsum" } };
        body {
            # ...
        };
    }

    If you use text or if you construct any other element explicitly, the block’s return value will not interfere.
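The block-value variant can be exercised with a small self-contained sketch. As before, doc, _attr and text are assumptions about the original implementation (they are not shown in this thread), and render is a hypothetical helper used only to look at the result. The second document shows the edge case described above, where a block containing only code leaks its value into the element.

```perl
use strict;
use warnings;

our $__frag;   # fragment currently under construction

# Assumed shape of the original helpers (not shown in this thread).
sub doc(&) {
    my ($content_fn) = @_;
    local $__frag = [undef, undef, undef];
    $content_fn->();
    return $__frag->[2][0];
}
sub _attr { my ($name, $val) = @_; push @{ $__frag->[1] }, [$name, $val]; }
sub text($) { push @{ $__frag->[2] }, @_; }

# _elem using the block's value as text when no children were built.
sub _elem {
    my ($elem_name, $content_fn) = @_;
    my $elem = [$elem_name, undef, undef];
    my $ret = do { local $__frag = $elem; $content_fn->() };
    push @{ $elem->[2] }, $ret
        if defined $ret and not @{ $elem->[2] || [] };
    push @{ $__frag->[2] }, $elem;
}
sub define_vocabulary {
    my ($elems, $attrs) = @_;
    eval "sub $_(&) { _elem('$_',\@_) }" for @$elems;
    eval "sub ${_}_(\$) { _attr('$_',\@_) }" for @$attrs;
}

# Hypothetical helper: serialise a tree for inspection (no escaping).
sub render {
    my ($node) = @_;
    return $node unless ref $node;                  # text node
    my ($name, $attrs, $children) = @$node;
    my $attr_str = join '', map { qq( $_->[0]="$_->[1]") } @{ $attrs || [] };
    my $body     = join '', map { render($_) } @{ $children || [] };
    return "<$name$attr_str>$body</$name>";
}

BEGIN { define_vocabulary([qw(html head title body p br)], [qw(class)]); }

my $breaks = 0;
my $tree = doc {
    html {
        head { title { "Lorem ipsum" } };
        body {
            p  { class_ "warning"; "paragraph" };
            br { ++$breaks; undef };     # explicit undef keeps br empty
        };
    };
};
print render($tree), "\n";

# Edge case: the value of ++$breaks becomes the text content of br.
my $odd = doc { br { ++$breaks } };
print render($odd), "\n";
```

Note that head and body also return a defined value from their blocks (whatever the last constructor returned), but since they already have children by then, that value is ignored.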

    Makeshifts last the longest.

      Good contribution! Your simplified text syntax is prettier and more intuitive.

      The corner case where an empty element contains code is a small blemish and an easy price to pay for the benefits of the simplified text syntax. Maybe we can even reduce the blemish by introducing another helper that declares a block to represent an empty content model:

      sub empty(&) { shift->(); undef }
      Then the corner-case becomes:
      doc { br { empty { ++$breaks } } }
      It's still not perfect, but maybe we can think of yet another improvement.

      Cheers,
      Tom

        Well, there's always:
        doc { br { ++$breaks; () } }
        which pretty much says "empty" to me. But maybe it's better to document it up front. I wonder if something resembling:
        doc { br {} (++$breaks) }
        could be made to work with an additional parameter that is ignored.
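The spellings discussed so far (a trailing (), the empty helper, and a plain code block) differ only in what the block yields in scalar context, which is how the block-value version of _elem calls it. A small sketch, independent of the XML machinery, makes that visible; block_value is a hypothetical stand-in for the call inside _elem.

```perl
use strict;
use warnings;

# The helper proposed above: run the block, then yield undef.
sub empty(&) { shift->(); undef }

# Hypothetical stand-in for how _elem captures a block's value.
sub block_value(&) { my $ret = $_[0]->(); return $ret }

my $breaks = 0;
my $r1 = block_value { ++$breaks; () };         # empty list in scalar context
my $r2 = block_value { empty { ++$breaks } };   # helper forces undef
my $r3 = block_value { ++$breaks };             # value leaks through

printf "%s %s %s\n", map { defined $_ ? $_ : 'undef' } $r1, $r2, $r3;
```

All three blocks run their side effect; only the last one hands a defined value back, which is what would be added as text content.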

        That doesn’t look like a useful addition. The user still has to understand the edge case and know when to apply empty manually, but the cost of learning new syntax for it is not amortised, since it offers nothing over saying undef; explicitly.

        For the case of br and hr, the better solution would be to declare them as empty elements. (Then you could also warn/die if the user accidentally does call text or an element constructor.)
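One way this could look is sketched below. It is only an illustration under assumptions: the third argument to define_vocabulary and the $is_empty flag on _elem are hypothetical extensions, and doc, _attr, text and render are stand-ins as in the rest of this thread. An element declared empty never takes its block's value as text, and it dies if the block builds any children.

```perl
use strict;
use warnings;

our $__frag;   # fragment currently under construction

# Assumed shape of the original helpers (not shown in this thread).
sub doc(&) {
    my ($content_fn) = @_;
    local $__frag = [undef, undef, undef];
    $content_fn->();
    return $__frag->[2][0];
}
sub _attr { my ($name, $val) = @_; push @{ $__frag->[1] }, [$name, $val]; }
sub text($) { push @{ $__frag->[2] }, @_; }

# Hypothetical extension: $is_empty marks elements declared as empty.
sub _elem {
    my ($elem_name, $content_fn, $is_empty) = @_;
    my $elem = [$elem_name, undef, undef];
    my $ret = do { local $__frag = $elem; $content_fn->() };
    my $has_children = @{ $elem->[2] || [] };
    if ($is_empty) {
        die "<$elem_name> is declared empty but was given content\n"
            if $has_children;
    }
    elsif (defined $ret and not $has_children) {
        push @{ $elem->[2] }, $ret;             # block value becomes text
    }
    push @{ $__frag->[2] }, $elem;
}

# Hypothetical third argument: names of elements that must stay empty.
sub define_vocabulary {
    my ($elems, $attrs, $empty_elems) = @_;
    eval "sub $_(&) { _elem('$_',\@_) }"     for @$elems;
    eval "sub $_(&) { _elem('$_',\@_,1) }"   for @{ $empty_elems || [] };
    eval "sub ${_}_(\$) { _attr('$_',\@_) }" for @$attrs;
}

# Hypothetical helper: serialise a tree for inspection (no escaping).
sub render {
    my ($node) = @_;
    return $node unless ref $node;                  # text node
    my ($name, $attrs, $children) = @$node;
    my $attr_str = join '', map { qq( $_->[0]="$_->[1]") } @{ $attrs || [] };
    my $body     = join '', map { render($_) } @{ $children || [] };
    return "<$name$attr_str>$body</$name>";
}

BEGIN { define_vocabulary([qw(body p)], [qw(class)], [qw(br hr)]); }

my $breaks = 0;
my $tree = doc { body { p { "one" }; br { ++$breaks }; p { "two" } } };
print render($tree), "\n";      # br stays empty without a trailing undef

# Giving content to a declared-empty element is rejected.
my $ok = eval { doc { br { text "oops" } }; 1 };
print "rejected: $@" unless $ok;
```

With this, br { ++$breaks } needs no trailing undef, while the occasional empty td would still need one, as noted below.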

        Of course that still won’t help when the user wants the occasional empty td. I see no way to handle that without manual intervention, though, so there might as well be no magic for it at all. Magic must be transparent to earn the name; if it’s not, it just adds extra cost.

        Makeshifts last the longest.