OpenMath and MathML:
Semantic Mark Up for Mathematics
by O. Caprotti and D. Carlisle
Abstract
Unambiguous representation of mathematics is crucial for communications among humans or among computer systems. OpenMath is a standard aimed at supporting a semantically rich interchange of mathematics among varied computational software tools such as computer algebra systems, theorem provers, and tools for visualizing or editing mathematical text. MathML is a W3C Recommendation for the encoding of mathematics `on the web' which also includes mechanisms for encoding mathematical semantics. We introduce each of these two languages and describe their relationships.
Introduction
Mathematical communication must be clear and unambiguous. When mathematicians read a sentence such as
Consider the polynomial: ax^4 + bx^3 + cx^2 + dx + e
they undoubtedly do not assume e = 2.71828 in such a context. Conversely, in the expression
Find x such that e^(ix) = -1
they probably do understand from the context that e is the base of the natural logarithms (and that x is pi!).
Computational systems lack the ability to use context to understand the semantics of a mathematical denotation. If we wish mathematics to be reliably communicated between such systems, we must mark up the document to provide extra semantic information.
Mathematics is often communicated at the notational level (think, for example, of the popularity of LaTeX). The semantics is not expressed by a formula but by text, in natural language. The reason for this is efficiency. Formal mathematics might require pages to convey the same theory that natural language can convey more concisely. While natural language works among humans, more formal languages are required if programs must understand the mathematics.
This article discusses two markup languages for mathematics designed to address these problems.
- The first, OpenMath, was originally conceived by, and is still `owned' by, an Industry Consortium now known as the `OpenMath Society' [6]. The specification is currently being developed by a European `Esprit' Research Project [8]. This European project maintains close contacts with both the Society and North American OpenMath Users [7].
- The second, MathML [4], is a Recommendation of the World Wide Web Consortium [5].
OpenMath
OpenMath is a language for representing and communicating mathematics [1] that tries to combine natural and formal language. Originally, it was conceived as a language for all computer algebra systems [2] and the semantics was mostly conveyed by natural language (in English actually). However, in its latest version [3], it is equipped for conveying expressions from all areas of mathematics, for instance logic. In this way, OpenMath can be used to express formal mathematical objects so that formal theorems and proofs, understandable to proof checkers, can be communicated as well as the usual mathematical expressions handled by CA systems.
OpenMath consists of several aspects. Those presented in this section are: the architecture of how OpenMath views integration of computational systems, the OpenMath Standard, and the OpenMath Phrasebooks and tools. The OpenMath Standard is concerned with the objects, their encodings, and the Content Dictionaries.
OpenMath Architecture
The OpenMath communication model is based on a three-layer representation of a mathematical object: the private layer for the internal representation, the abstract layer for the representation as an OpenMath object, and the communication layer for translating the OpenMath object to a stream of bytes. An application-dependent program manipulates the mathematical objects using its internal representation; it can convert them to OpenMath objects and communicate them by using the byte stream representation of OpenMath objects.
The private layer does not concern OpenMath.
In the abstract layer, OpenMath basic objects are integers, symbols, floating-point numbers, character strings, bytearrays, and variables. Compound objects are built using application, binding, error, and attribution. The meaning of the abstract OpenMath objects depends on Content Dictionaries (CDs). These are XML documents that contain the definitions of symbols occurring in the objects. CDs are public and are used to represent the actual common knowledge among OpenMath-compliant applications. An idea central to the OpenMath philosophy is that CDs fix the `meaning' of objects independently of the application.
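To make the abstract layer more concrete, the following is a minimal sketch of how a few of these object kinds might be modelled in Python. The class names (OMInteger, OMVariable, OMSymbol, OMApplication) are hypothetical and chosen for illustration only; they are not part of the OpenMath Standard, and binding, error, and attribution objects are omitted.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class OMInteger:
    """Basic object: an integer."""
    value: int


@dataclass(frozen=True)
class OMVariable:
    """Basic object: a variable, identified only by its name."""
    name: str


@dataclass(frozen=True)
class OMSymbol:
    """Basic object: a symbol whose meaning is fixed by a Content Dictionary."""
    cd: str    # name of the Content Dictionary defining the symbol
    name: str  # name of the symbol inside that CD


@dataclass(frozen=True)
class OMApplication:
    """Compound object: application of a head object to arguments."""
    head: "OMObject"
    args: Tuple["OMObject", ...]


# Hypothetical umbrella type for the objects modelled in this sketch.
OMObject = Union[OMInteger, OMVariable, OMSymbol, OMApplication]

# The abstract object application(sin, x).
sin_x = OMApplication(OMSymbol("transc", "sin"), (OMVariable("x"),))
```

Note that the symbol carries the name of its CD with it, which is what lets two applications agree on its meaning without sharing any internal representation.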
The integration of OpenMath in an application is achieved by a Phrasebook, namely an interface program that converts an OpenMath object to/from the internal representation. The translation is governed by the CDs and the specifics of the application.
This abstract notion of an OpenMath object does have a formal grammar given in [3], and two OpenMath applications may directly communicate OpenMath objects via an internal representation if they share the same representation. However, for this introduction we shall concentrate on the `third layer' of the OpenMath architecture, in which the OpenMath object is linearised into a standard form that may be saved to a file or transmitted to another application. Two standard encodings are defined: one is a compact binary encoding not discussed here, and the other uses the syntax of XML.
The OpenMath XML Encoding
The XML encoding of an OpenMath object is suitable for sending OpenMath objects via e-mail, news, cut-and-paste, etc. For instance, the encoding of application(sin, x), that abstractly represents the expression sin x, is:
<OMOBJ>
<OMA>
<OMS cd="transc" name="sin"/>
<OMV name="x"/>
</OMA>
</OMOBJ>
This encoding says that the symbol sin (tagged by OMS) is defined by the Content Dictionary transc. The elements OMA and OMV identify, respectively, application and variables.
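As an illustration, the XML encoding shown above can be produced programmatically. The sketch below uses Python's standard xml.etree.ElementTree module; the helper name encode_application is an assumption made for this example and does not come from any OpenMath library.

```python
import xml.etree.ElementTree as ET


def encode_application(cd, symbol, variable):
    """Build the XML encoding of application(symbol, variable).

    Hypothetical helper: it only covers a symbol from the Content
    Dictionary `cd` applied to a single variable.
    """
    obj = ET.Element("OMOBJ")
    app = ET.SubElement(obj, "OMA")
    ET.SubElement(app, "OMS", {"cd": cd, "name": symbol})
    ET.SubElement(app, "OMV", {"name": variable})
    return obj


sin_x = encode_application("transc", "sin", "x")
# Serialise to text, ready to be saved to a file or sent to another program.
print(ET.tostring(sin_x, encoding="unicode"))
```

The serialised text differs from the listing above only in whitespace, which carries no meaning between the elements.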
OpenMath Content Dictionaries
A Content Dictionary holds the meanings of (various) mathematical `words' referred to as symbols. In the previous release of the OpenMath standard, there was a single `core' CD named Basic. In the newest release, a set of official CDs, each covering a specific area has been produced and is available at a public repository site http://www.nag.co.uk/projects/omstd/cds-html.
A particular set of these Content Dictionaries, the `MathML CD Group', covers the same areas of mathematics as the Content elements of the W3C MathML Recommendation [4]. Work is under way to ensure that robust translations are possible between Content MathML and OpenMath objects using these CDs.
A Content Dictionary consists of a header followed by a number of CD Definitions. The CD header contains information pertinent to the whole CD. This includes the name, a description, a date at which the CD will next be reviewed, the status of the CD (official, experimental, private, obsolete), and an optional list of CDs on which it depends. A CD Definition contains information restricted to a particular symbol definition. This includes a name, and a description in natural language. Optional information related to a symbol may contain a signature, examples of the use of this symbol, and properties satisfied by this symbol. Formal properties can be expressed again as an XML encoded OpenMath object, whereas `commented' properties are expressed in natural language. The XML DTD for CDs is given in the OpenMath Standard [3].
The CD transc of transcendental functions gives the following definition for `sin' that is used in the example above.
<CDDefinition>
<Name> sin </Name>
<Description>
The sin function as described in
Abramowitz and Stegun, section 4.3.
</Description>
</CDDefinition>
As you can see, the description of the semantics in this case is quite precise (a reference to a section of a standard textbook) but not in a form that allows automatic processing of this definition. The intention is that this allows the implementer of a Phrasebook that implements OpenMath to map the OpenMath object <OMS cd="transc" name="sin"/> in the correct manner.
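Although the description itself is natural language, the surrounding structure is ordinary XML and can be read mechanically. The following sketch, which assumes only the two child elements shown in the definition above, extracts the symbol name and its description; the function name read_definition is hypothetical.

```python
import xml.etree.ElementTree as ET

# The CD Definition quoted above, reproduced as a string.
CD_DEFINITION = """
<CDDefinition>
<Name> sin </Name>
<Description>
The sin function as described in
Abramowitz and Stegun, section 4.3.
</Description>
</CDDefinition>
"""


def read_definition(xml_text):
    """Return (name, description) from a single CDDefinition element."""
    root = ET.fromstring(xml_text.strip())
    name = root.findtext("Name").strip()
    # Collapse line breaks and repeated spaces in the description.
    description = " ".join(root.findtext("Description").split())
    return name, description


name, description = read_definition(CD_DEFINITION)
```

A tool of this kind can show the description to a Phrasebook implementer, but it cannot interpret it; that step still needs a human reader.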
OpenMath Phrasebooks
The programs that act as interface between a software application and OpenMath are called Phrasebooks. Their task is to translate the OpenMath object, as understood using the Content Dictionaries, to the corresponding internal representation used by the specific software application.
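A toy Phrasebook can be sketched in a few lines if the `application' is taken to be Python's own floating-point arithmetic. The table SYMBOL_TABLE and the function evaluate below are assumptions made for this illustration; a real Phrasebook would cover many more symbols and object kinds.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical Phrasebook table: (CD name, symbol name) -> internal function.
SYMBOL_TABLE = {
    ("transc", "sin"): math.sin,
}


def evaluate(node, env):
    """Translate an XML-encoded OpenMath object into a Python value.

    `env` maps variable names to numbers. Only OMOBJ, OMA, OMS and OMV
    are handled in this sketch.
    """
    if node.tag == "OMOBJ":
        return evaluate(node[0], env)
    if node.tag == "OMV":
        return env[node.get("name")]
    if node.tag == "OMS":
        return SYMBOL_TABLE[(node.get("cd"), node.get("name"))]
    if node.tag == "OMA":
        head, *args = [evaluate(child, env) for child in node]
        return head(*args)
    raise ValueError("unsupported OpenMath element: " + node.tag)


SIN_X = '<OMOBJ><OMA><OMS cd="transc" name="sin"/><OMV name="x"/></OMA></OMOBJ>'
value = evaluate(ET.fromstring(SIN_X), {"x": math.pi / 2})
```

The lookup is keyed on both the CD and the symbol name, so a symbol called sin from some other CD would not silently be treated as the transc one.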
MathML
Current HTML has no real support for Mathematics; often one must resort to GIF images. Using images results in poor quality printing and loss of all semantic information. This second point means there is no chance to search such web pages, or to `cut and paste' mathematical expressions from a web page into (say) a computer algebra package.
The World Wide Web Consortium (W3C) recently approved a new `standard' for an XML encoding of mathematics, MathML. Since W3C does not issue standards, this is known as the MathML Recommendation.
MathML is an XML application primarily intended for the use of `Mathematics on the Web'. Unlike the abstract model of OpenMath described above, MathML is in fact defined by its XML structure. However the requirements on a MathML document are stronger than just the requirements of being valid XML. The MathML Recommendation specifies extra constraints, such as the number and type of arguments certain functions take. These constraints will not be checked by a generic XML application, but will be checked by applications with specific support for MathML.
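The difference between generic XML checking and MathML-specific checking can be sketched as follows. The arity table is illustrative only and is not copied from the Recommendation; the function name check_apply is likewise an assumption.

```python
import xml.etree.ElementTree as ET

# Illustrative table: Content MathML operator element -> number of arguments.
ARITY = {"sin": 1, "cos": 1, "divide": 2}


def check_apply(apply_node):
    """Return True if an <apply> element has the expected argument count.

    The first child is taken to be the operator; operators missing from
    the table are accepted without checking.
    """
    operator, *arguments = list(apply_node)
    expected = ARITY.get(operator.tag)
    if expected is None:
        return True
    return len(arguments) == expected


# Both fragments are well-formed XML, so a generic XML tool accepts both.
good = ET.fromstring("<apply><sin/><ci>x</ci></apply>")
bad = ET.fromstring("<apply><sin/><ci>x</ci><ci>y</ci></apply>")
```

Here check_apply(good) holds while check_apply(bad) does not, even though an XML parser reads both without complaint.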
Several methods of displaying MathML within a Web browser already exist. The principal ones currently are Techexplorer [10] and WebEQ [11].
MathML comes in essentially two distinct halves: presentation and content. Presentation MathML takes a `TeX-like' approach to Mathematics. It has access to a large array of mathematical symbols and visual constructs such as superscripts and alignments, but without too much emphasis on semantics.
In Presentation MathML, the `sin x' example would be:
<math>
<mrow>
<mi>sin</mi>
<mo>&ApplyFunction;</mo>
<mi>x</mi>
</mrow>
</math>
Content MathML is much closer in spirit to OpenMath. As mentioned above, work is underway to ensure that there is a direct mapping between Content MathML and OpenMath using a prescribed set of core Content Dictionaries.
In Content MathML one would express sin x as:
<math>
<apply>
<sin/>
<ci>x</ci>
</apply>
</math>
Note that Content MathML has an essentially fixed mathematical range: the approximately 90 symbols specified (as empty elements) in the MathML DTD are designed to cover a range of mathematics roughly equating to the mathematics taught up to the end of high school or the first year at university. By contrast, OpenMath is essentially extensible: new CDs may be produced providing symbols for new areas of mathematics.
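The correspondence between the two `sin x' encodings given earlier can be sketched as a small translator. The mapping table MATHML_TO_OM is an assumption containing only the one entry needed for this example; the official mapping is the subject of the ongoing work mentioned above and is not reproduced here.

```python
import xml.etree.ElementTree as ET

# Assumed mapping: Content MathML element name -> (CD name, symbol name).
MATHML_TO_OM = {"sin": ("transc", "sin")}


def to_openmath(node):
    """Translate a small subset of Content MathML into OpenMath XML."""
    if node.tag == "math":
        obj = ET.Element("OMOBJ")
        obj.append(to_openmath(node[0]))
        return obj
    if node.tag == "apply":
        app = ET.Element("OMA")
        for child in node:
            app.append(to_openmath(child))
        return app
    if node.tag == "ci":
        return ET.Element("OMV", {"name": node.text.strip()})
    if node.tag in MATHML_TO_OM:
        cd, name = MATHML_TO_OM[node.tag]
        return ET.Element("OMS", {"cd": cd, "name": name})
    raise ValueError("no translation known for element: " + node.tag)


CONTENT = "<math><apply><sin/><ci>x</ci></apply></math>"
openmath = to_openmath(ET.fromstring(CONTENT))
```

Extending the translator to a further Content MathML element means adding one table entry, whereas extending OpenMath itself means writing a new CD.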
OpenMath or MathML?
It may appear strange that two different encodings of Mathematics are being promoted simultaneously. However OpenMath and MathML are not in competition. There is a large overlap between the people responsible for the two standards.
OpenMath is aimed at encoding the semantics, and via its Content Dictionary mechanism may be applied to arbitrary areas of mathematics without any need for any central agreement to change that language. Of course, an OpenMath object that uses such a private CD will only be understandable and usable by an application that has a Phrasebook implementing the required semantics. What is always true is that the object may be written and stored using this CD even by applications that do not understand its semantics.
OpenMath on its own has no notion of a presentation form for the mathematics. In order to render the mathematics using a natural notation, one needs to convert OpenMath to some other form using dedicated `typesetting' Phrasebooks. Prototypes are already available that convert OpenMath to TeX or MathML. It is also feasible to use a Phrasebook for some other system such as Maple or Mathematica in order to utilize its method of rendering the equivalent mathematical object.
Conversely, Content MathML does have a default presentation form (although this may be overridden) and has a fixed set of mathematical operators. This fixed range and default presentation make it more suitable for being embedded in Web browsers than OpenMath. The mathematical range of MathML can be extended by `attaching' semantic meaning to expressions in Presentation MathML. This external semantics may be in any form, but one obvious contender is OpenMath as the language to define the semantics.
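One way such an attachment might look is sketched below, pairing the Presentation MathML for sin x with its OpenMath encoding inside MathML's semantics element. The helper name attach_semantics and the value "OpenMath" of the encoding attribute are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

PRESENTATION = "<mrow><mi>sin</mi><mo>&#x2061;</mo><mi>x</mi></mrow>"
OPENMATH = '<OMOBJ><OMA><OMS cd="transc" name="sin"/><OMV name="x"/></OMA></OMOBJ>'


def attach_semantics(presentation, openmath):
    """Wrap a presentation element and its OpenMath meaning together.

    The first child of <semantics> is what gets displayed; the
    <annotation-xml> child carries the external semantics.
    """
    semantics = ET.Element("semantics")
    semantics.append(presentation)
    # The encoding label is an assumption made for this sketch.
    annotation = ET.SubElement(semantics, "annotation-xml", {"encoding": "OpenMath"})
    annotation.append(openmath)
    return semantics


semantics = attach_semantics(ET.fromstring(PRESENTATION), ET.fromstring(OPENMATH))
```

A renderer that knows nothing about OpenMath can display the first child and ignore the annotation, while a computer algebra system can do the opposite.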
OpenMath Tools and Scenarios
Several `demonstration' scenarios are planned to show the usefulness of the OpenMath/MathML approach to encoding mathematics. The Esprit OpenMath Consortium is currently working to produce public example demonstrations in the following areas.
- A Multiple Integrator
Some mathematical problems are hard: computers may use more resources than allocated, and often the answer is not produced as fast as expected. One such problem is the computation of a closed form integral of a function. Being able to submit the same request simultaneously to various systems for computational mathematics (such as AXIOM, Maple and Reduce) increases the chances of obtaining a solution. A window manager can easily monitor which of these programs returns the answer first. Correctness of the solution is also easier to ascertain when several answers are checked against one another. In this scenario, OpenMath facilitates the work because no rewriting of the representation of the function is needed before submission. The rewriting has become a task of the application Phrasebook.
- Technical Conversations
Technical electronic conversations between two intelligent systems (human beings) can also benefit from OpenMath. OpenMath objects can be cut and pasted directly, without worrying about establishing the context needed to make sense of them. Writing, editing, and displaying mathematical expressions in a user-friendly interface are some of the activities that OpenMath tools for the working mathematician intend to support.
- Mathematical Databases
Looking for occurrences of a word like `prime' in a database is a well worked out issue; good algorithms are available for fast delivery of all entries containing such a key word. However, lookup techniques for a mathematical function like log(i+x)-log(i-x) are far less advanced. The reason is that one would also like to capture the entries of the database in which arctan(y) is stored. It all goes back to the fact that mathematics is not text, hence textual techniques are not so useful. For this example, at the very least one needs to express that i stands for the imaginary unit, the square root of minus one, and that the argument x can be replaced by any other unused variable name. OpenMath easily deals with these issues.
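The variable-renaming point in the Mathematical Databases scenario can be illustrated with a short sketch. Because variables are explicitly tagged as OMV in the XML encoding, they can be renamed to canonical names in order of first occurrence, giving a lookup key that ignores the user's choice of variable name. The function canonical_key is hypothetical, and the sketch deliberately ignores binding objects and mathematical identities such as the one relating logarithms to arctan.

```python
import xml.etree.ElementTree as ET


def canonical_key(xml_text):
    """Return a lookup key in which variables are renamed v0, v1, ...

    Variables are numbered in order of first occurrence, so objects that
    differ only in the names of their variables yield the same key.
    """
    root = ET.fromstring(xml_text)
    renamed = {}
    for var in root.iter("OMV"):
        original = var.get("name")
        renamed.setdefault(original, "v" + str(len(renamed)))
        var.set("name", renamed[original])
    return ET.tostring(root, encoding="unicode")


SIN_X = '<OMOBJ><OMA><OMS cd="transc" name="sin"/><OMV name="x"/></OMA></OMOBJ>'
SIN_Y = '<OMOBJ><OMA><OMS cd="transc" name="sin"/><OMV name="y"/></OMA></OMOBJ>'

same_entry = canonical_key(SIN_X) == canonical_key(SIN_Y)
```

A plain text search could not safely perform this renaming, since in running text it cannot tell the variable x from the letter x inside a word.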
Conclusion
We have presented OpenMath and MathML, two emerging markup languages for the interactive use of mathematics on the World Wide Web. We have shown that they are in many ways complementary: OpenMath may use MathML to carry presentation information, and MathML may use OpenMath as an extension mechanism. In particular, Content MathML may be seen as a `shorthand' for a core set of mathematics that allows both presentation and semantics to be expressed in a concise manner.
References
- [1] John A. Abbott, André van Leeuwen, and A. Strotmann. OpenMath: Communicating Mathematical Information between Co-operating Agents in a Knowledge Network. 1998. http://www.brunel.ac.uk/~hssrjis/issue/index8.html
- [2] S. Dalmas, M. Gaëtano, and S. Watt. An OpenMath 1.0 Implementation. Proceedings of ISSAC 97, ACM Press, 1997.
- [3] The Esprit OpenMath Consortium. The OpenMath Standard. 1999. http://www.nag.co.uk/projects/OpenMath.html
- [4] W3C Math Working Group, Stephen Buswell et al. Mathematical Markup Language (MathML) 1.0 Specification. 1998. http://www.w3.org/TR/REC-MathML/
- [5] World Wide Web Consortium Home Page. http://www.w3c.org/
- [6] OpenMath Society Home Page. http://www.openmath.org/
- [7] North American OpenMath Initiative. http://www.naomi.math.ca
- [8] Esprit OpenMath Consortium. http://www.nag.co.uk/projects/OpenMath/
- [9] W3C Math Working Group Home Page. http://www.w3.org/Math/
- [10] IBM Techexplorer. http://www.software.ibm.com/techexplorer/
- [11] WebEQ. http://www.webeq.com
Biography
O. Caprotti, RIACA, Eindhoven University of Technology, olga@win.tue.nl
D. Carlisle, NAG Ltd, davidc@nag.co.uk
Both authors are employed on the Esprit OpenMath Project [8]. The second author is also a member of the W3C Math Working Group [9]. In this document the authors are writing in a personal capacity and not on behalf of either the Esprit OpenMath Consortium or the W3C Math Working Group.