MathML 2 ascii?

by mikeyo (Novice)
on Jan 24, 2005 at 23:03 UTC (#424731)

mikeyo has asked for the wisdom of the Perl Monks concerning the following question:

I'm using XML::Twig to parse a large XML file of technical specifications. Part of the specification is written in MathML:

    <unit>
      <math>
        <apply>
          <divide/>
          <ci>joule</ci>
          <apply>
            <power/>
            <ci>meter</ci>
            <cn>2</cn>
          </apply>
        </apply>
      </math>
    </unit>

All I want from this spec is the text: J/m2. Does anyone know an easy way to do this?

Replies are listed 'Best First'.
Re: MathML 2 ascii?
by Aristotle (Chancellor) on Jan 25, 2005 at 03:46 UTC

    Use the right tool for the job. This is nowhere near a full implementation of an ASCII renderer for MathML, but it's a start that will work for your one example document.

    <?xml version="1.0" encoding="utf-8"?>
    <xsl:stylesheet version="1.0" xmlns:xsl=" +ansform">
      <xsl:output method="text" encoding="us-ascii"/>
      <xsl:strip-space elements="*" />

      <xsl:template match="@*|node()">
        <xsl:apply-templates select="@*|node()" />
      </xsl:template>

      <xsl:template match="ci|cn">
        <xsl:value-of select="." />
      </xsl:template>

      <xsl:template match="divide"> / </xsl:template>

      <xsl:template match="power"> ^ </xsl:template>

      <xsl:template match="apply">
        <xsl:variable name="operator">
          <xsl:apply-templates select="*[1]" />
        </xsl:variable>
        <xsl:text>(</xsl:text>
        <xsl:apply-templates select="*[2]" />
        <xsl:for-each select="*[position() > 2]">
          <xsl:copy-of select="$operator" />
          <xsl:apply-templates select="." />
        </xsl:for-each>
        <xsl:text>)</xsl:text>
      </xsl:template>
    </xsl:stylesheet>

    This translates your sample input to

    (joule / (meter ^ 2))

    Yes, XSLT is horribly verbose, but if you can see past the long-winded identifiers and the redundancy caused by closing tags, you'll find the basic modus operandi of the language to be a compact, elegant way of expressing tree manipulations. (I've often toyed with the idea of writing a strictly lexical translator that would allow transformations to be expressed in a less redundant, more human friendly syntax.)

    The following could be used as a driver script if you need to do this from Perl.

    #!/usr/bin/perl
    use strict;
    use warnings;

    use XML::LibXSLT;
    use XML::LibXML;

    my $parser = XML::LibXML->new();

    my $mathml    = $parser->parse_string( do { local $/; <> } );
    my $xform_src = $parser->parse_fh( \*DATA );

    my $xform   = XML::LibXSLT->new()->parse_stylesheet( $xform_src );
    my $results = $xform->transform( $mathml );

    print $xform->output_string( $results );

    __END__
    paste stylesheet here...

    Makeshifts last the longest.

      This translates your sample input to (joule / (meter ^ 2)). I've often toyed with the idea of writing a strictly lexical translator that would allow transformations to be expressed in a less redundant, more human friendly syntax.

      Yes. Please do. I am of the firm opinion that any format that is entirely ASCII or Unicode should be human-readable, with just the right balance between terseness and verbosity to make it human-editable.

      Not every XML application is quite so bloated, but this one is a good example of what makes people reject XML. This is exactly the same situation where over-clever over-crammed Perl code turns people off of the whole Perl language.

      [ e d @ h a l l e y . c c ]

        I've often toyed with the idea of writing a strictly lexical translator that would allow transformations to be expressed in a less redundant, more human friendly syntax.
        Yes. Please do.

        Parsimonious XML Shorthand Language

        "PXSL ("pixel") is a convenient shorthand for writing markup-heavy XML documents."

Re: MathML 2 ascii?
by tall_man (Parson) on Jan 25, 2005 at 00:53 UTC
    Here is a rough approach, done by recursing through the MathML expression tree, converting from prefix to infix notation.
    use strict;
    use XML::Twig;

    my %binop = qw(divide / power ^);
    my %down1 = qw(unit 1 math 1 apply 1);
    my %units = qw(joule J meter m);

    # Recurse through the ML tree, finding the binary operators.
    sub recurse {
        my ($e) = shift;
        my $tag = $e->gi;

        if (exists $binop{$tag}) {
            # The two operands are the siblings that follow the operator.
            # (Avoid "my $x = ... if ...", whose behaviour is undefined.)
            my $p1 = $e->next_sibling;
            my $p2 = defined($p1) ? $p1->next_sibling : undef;
            recurse($p1) if defined($p1);
            print $binop{$tag};
            recurse($p2) if defined($p2);
        } elsif (exists $down1{$tag}) {
            my $child = $e->first_child;
            recurse($child) if defined($child);
        } elsif ($tag eq "cn") {
            print $e->text;
        } elsif ($tag eq "ci") {
            my $txt = $e->text;
            if ($units{$txt}) {
                print $units{$txt};
            } else {
                print $txt;
            }
        } else {
            print $tag;
        }
    }

    # Uncomment if you really don't want to see an infix operator for the power.
    # $binop{power} = "";

    my $t = XML::Twig->new();
    my $txt = q{
    <unit>
      <math>
        <apply>
          <divide/>
          <ci>joule</ci>
          <apply>
            <power/>
            <ci>meter</ci>
            <cn>2</cn>
          </apply>
        </apply>
      </math>
    </unit>
    };

    $t->parse($txt);
    my $root = $t->root;
    recurse($root);
    print "\n";
Re: MathML 2 ascii?
by Frantz (Monk) on Jan 25, 2005 at 17:06 UTC
    Another solution:

    use XML::XPath

    The hard part is that you must write the XPath query that
    extracts your data ... and XPath is very hermetic.

    But the Perl code may be very short :)

      I don't see how it could be. This isn't a case where you want to pull data from a document selectively, which XPath excels at. It's a case where you want to transcribe the entire document into a different format, where recursive processing of the full structure is inevitable.

      Makeshifts last the longest.

        The original post said that the MathML units code was just a small part of a much larger technical specification document, so something like XPath or XML::Twig will be required to extract it. I don't suppose you would want to write an XSLT translation specification for the entire document just to get the units.
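        The two positions above can be combined: select only the unit expressions with XPath, then walk each one recursively. The following is a minimal sketch, assuming XML::LibXML is installed, that the MathML elements carry no namespace (as in the sample), and that every apply has an operator as its first child. The names %symbol, %infix, trimmed and to_ascii are hypothetical helpers, not part of any module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Assumed lookup tables; extend them for the units and operators in your data.
my %symbol = ( joule => 'J', meter => 'm' );
my %infix  = ( divide => '/', power => '^' );

# Return the text content of a node with surrounding whitespace removed.
sub trimmed {
    my ($node) = @_;
    my $text = $node->textContent;
    $text =~ s/^\s+|\s+$//g;
    return $text;
}

# Recursively turn one MathML content element into an infix ASCII string.
sub to_ascii {
    my ($node) = @_;
    my $name = $node->nodeName;

    if ( $name eq 'ci' ) {
        my $id = trimmed($node);
        return $symbol{$id} // $id;      # fall back to the raw identifier
    }
    return trimmed($node) if $name eq 'cn';

    if ( $name eq 'apply' ) {
        # First element child is the operator, the rest are operands.
        my ( $op, @args ) = $node->findnodes('*');
        die "empty <apply>\n" unless $op;
        my $sym = $infix{ $op->nodeName }
            // die "unsupported operator <" . $op->nodeName . ">\n";
        return join $sym, map { to_ascii($_) } @args;
    }
    die "unsupported element <$name>\n";
}

my $doc = XML::LibXML->load_xml( string => <<'XML' );
<doc>
  <p>10 <unit><math><apply><divide/><ci>joule</ci>
    <apply><power/><ci>meter</ci><cn>2</cn></apply>
  </apply></math></unit></p>
</doc>
XML

# XPath does the selective extraction; to_ascii() does the transcription.
for my $expr ( $doc->findnodes('//unit/math/*') ) {
    print to_ascii($expr), "\n";
}
```

        The sketch emits no parentheses, so it is only safe for expressions like the sample, where operator precedence already matches the nesting.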
Re: MathML 2 ascii?
by mirod (Canon) on Jan 27, 2005 at 08:49 UTC

    Below is how I would do it.

    It should be extensible to handle other units. It assumes apply is only used with 2 arguments, but this shouldn't be a problem: if it is used with more, I would assume that it is for multiplication, which doesn't need any symbol.

    If I were you I would first create a version that just outputs the unit -> ASCII conversion, and run it on all documents, or on a set of documents. Something like this: remove twig_print_outside_roots and, instead of printing the content of unit, store it in a hash ($_->sprint => $_->text), then dump the content of that hash once you're done. This will give you all units in the document, so you can check the transformation. If you are not happy with the results and working at the XML level is too much of a pain, you can also add a final transformation there, with a simple hash (initial_result => what_you_want).

    BTW are you really limited to pure ascii? Displaying J/m² could be a nice touch.
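    As an illustration of the non-ASCII option, here is a minimal sketch that post-processes an ASCII result such as J/m^2. It assumes exponents are written as "^" followed by an optional minus sign and digits; %superscript, to_superscript and pretty_unit are hypothetical names introduced for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Unicode superscript forms of the digits and the minus sign.
my %superscript = (
    0 => "\x{2070}", 1 => "\x{00B9}", 2 => "\x{00B2}", 3 => "\x{00B3}",
    4 => "\x{2074}", 5 => "\x{2075}", 6 => "\x{2076}", 7 => "\x{2077}",
    8 => "\x{2078}", 9 => "\x{2079}", '-' => "\x{207B}",
);

# Convert an exponent such as "2" or "-1" character by character.
sub to_superscript {
    my ($exponent) = @_;
    return join '', map { $superscript{$_} // $_ } split //, $exponent;
}

# Replace every "^<integer>" in a unit string with superscript characters.
sub pretty_unit {
    my ($ascii) = @_;
    $ascii =~ s/\^(-?\d+)/to_superscript($1)/ge;
    return $ascii;
}

binmode STDOUT, ':encoding(UTF-8)';
print pretty_unit('J/m^2'), "\n";
```

    Keeping the ASCII form as the intermediate result means the same conversion code can serve both plain-ASCII and Unicode output.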

    #!/usr/bin/perl -w
    use strict;
    use XML::Twig;

    my %unit2symbol= ( joule => 'J', meter => 'm');

    XML::Twig->new( twig_roots    => { unit => sub { print $_->text } },
                    twig_handlers => { divide => sub { $_->set_text( '/') },
                                       ci     => sub { $_->set_text( $unit2symbol{$_->text}); },
                                       apply  => sub { $_->child( 0)->move( after => $_->child( 1)); },
                                     },
                    twig_print_outside_roots => 1,
                  )
              ->parse( \*DATA);

    __DATA__
    <doc>
      <p>10 <unit>
          <math>
            <apply>
              <divide/>
              <ci>joule</ci>
              <apply>
                <power/>
                <ci>meter</ci>
                <cn>2</cn>
              </apply>
            </apply>
          </math>
        </unit></p>
    </doc>
