Channel: Andrew Lock | .NET Escapades

Hacking together an AsciiMath parser for .NET

In my previous post, I gave a brief introduction to MathML, the HTML elements for displaying mathematical equations and notation. At the end of that post I showed AsciiMath, an easy-to-write markup language which can be converted to MathML. You can think of AsciiMath a bit like markdown-for-MathML.

In this post I show how I put together an AsciiMath parser for .NET using the Jint JavaScript Interpreter for .NET, JavaScriptEngineSwitcher (which I've written about previously), and a re-implementation of the original ASCIIMathML.js implementation. I talk through the various approaches I considered, as well as my final approach.

AsciiMath for MathML

In my previous post I gave an introduction to the MathML specification, how it has evolved over the years (it's been around in various forms for over 25 years), and how MathML Core is the latest implementation used by modern browsers.

MathML lets you write code like this:

<math display="block">
  <mrow>
    <msubsup>
      <mo>∫</mo>
      <mrow>
        <mo>-</mo>
        <mn>1</mn>
      </mrow>
      <mn>1</mn>
    </msubsup>
  </mrow>
  <msqrt>
    <mrow>
      <mn>1</mn>
      <mo>-</mo>
      <msup>
        <mi>x</mi>
        <mn>2</mn>
      </msup>
    </mrow>
  </msqrt>
  <mrow>
    <mi>d</mi>
    <mi>x</mi>
  </mrow>
  <mo>=</mo>
  <mfrac>
    <mi>π</mi>
    <mn>2</mn>
  </mfrac>
</math>

and it renders like this:

$$\int_{-1}^{1} \sqrt{1-x^2}\,dx = \frac{\pi}{2}$$

Now, the obvious downside to the MathML specification is that it's very verbose. Writing out MathML by hand is clearly not a fun experience. Consequently people generally use a more compact representation, such as the LaTeX/TeX format, and then convert that to MathML separately. AsciiMath is a similar markup language, which aims to be even simpler than LaTeX/TeX, while remaining flexible. For example, the equation above could be written in AsciiMath as:

int_-1^1 sqrt(1-x^2)dx = pi/2

This is similar to how you might expect to write an equation in a programming language, or into a calculator program, so I'm quite a fan of it as a format.
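
To make the link between the two formats a bit more concrete, here's a deliberately naive sketch of the simplest part of the job: wrapping individual tokens in the MathML elements shown above (`<mn>` for numbers, `<mi>` for identifiers, `<mo>` for operators). The names `tokenToMathML` and `naiveConvert` are hypothetical, and the sketch ignores all the real structure (`^`, `_`, `/`, `sqrt` and so on), so it's an illustration of the idea rather than how any of the actual parsers work.

```javascript
// Hypothetical helper, not part of any AsciiMath library: wraps a single
// token in the MathML element that matches its kind.
function tokenToMathML(token) {
  if (/^\d+(\.\d+)?$/.test(token)) {
    return `<mn>${token}</mn>`; // numbers
  }
  if (/^[A-Za-z]$/.test(token)) {
    return `<mi>${token}</mi>`; // single-letter identifiers
  }
  return `<mo>${token}</mo>`; // everything else is treated as an operator
}

// Splits the input into numbers, single letters, and any other non-space
// character, then converts each token independently.
function naiveConvert(input) {
  const tokens = input.match(/\d+(?:\.\d+)?|[A-Za-z]|\S/g) || [];
  return tokens.map(tokenToMathML).join('');
}

console.log(naiveConvert('1 - x'));
// <mn>1</mn><mo>-</mo><mi>x</mi>
```

A real parser also has to recognise multi-character symbols such as `int`, `sqrt` and `pi`, and build the nested `<msup>`, `<msqrt>` and `<mfrac>` elements, which is where most of the complexity lives.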

If you're familiar with the LaTeX/TeX format for maths, then there's not much reason to switch to AsciiMath. But I suspect for the vast majority of people working in markdown, AsciiMath would be the more natural fit.

My blog is written using markdown, and then I have a .NET app that takes the files and converts them to static HTML. I wanted to use AsciiMath in my markdown files, but the big problem is that there doesn't appear to be an AsciiMath parser for .NET. Nevertheless, you can still parse AsciiMath in .NET if you're willing to make some compromises.

Evaluating the existing AsciiMath implementations

Looking around at AsciiMath parsers in general, I found basically four different implementations:

  • The "original" ASCIIMathML.js implementation.
    • This appears to be one of the first implementations, and is "canonical" in many ways.
    • However the JavaScript it's written in is very much ~2007 era, i.e. horrible 😅.
    • Generally intended to be "dropped onto a page" and rewrites the DOM.
    • Not actively developed.
    • Has fewer features (e.g. recognises fewer symbols) than some later implementations.
  • MathJax.
    • The canonical implementation for displaying math in the browser.
    • Uses its own display engine, can convert LaTeX, AsciiMath, or MathML into its proprietary format.
    • Modular and under active development.
    • Can be run in node or in the browser, but relatively complex.
  • The AsciiDoctor implementation of AsciiMath.
    • Written in Ruby.
    • Seems pretty complete.
  • ascii-math, a node.js version of the ASCIIMathML.js implementation.
    • Specifically intended to be used server-side.
    • Based on the ASCIIMathML.js version, just replacing the DOM-manipulation methods with server-friendly shims.
    • Has the same limitations as the original in terms of feature support.
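
To give a feel for what a "server-friendly shim" for DOM manipulation can look like, here's a minimal sketch. It's an assumption about the general technique, not the actual ascii-math source: `FakeElement` is a hypothetical name, and it only supports the handful of operations needed to build a tree and serialise it to a string.

```javascript
// Hypothetical stand-in for a DOM element; the names are illustrative and
// not taken from the ascii-math source.
class FakeElement {
  constructor(tagName) {
    this.tagName = tagName;
    this.attributes = {};
    this.children = [];
  }

  setAttribute(name, value) {
    this.attributes[name] = String(value);
  }

  appendChild(child) {
    this.children.push(child);
    return child;
  }

  // Serialises the element and its children to a markup string.
  // No escaping is performed, to keep the sketch short.
  toString() {
    const attrs = Object.entries(this.attributes)
      .map(([name, value]) => ` ${name}="${value}"`)
      .join('');
    const inner = this.children.map(String).join('');
    return `<${this.tagName}${attrs}>${inner}</${this.tagName}>`;
  }
}

const math = new FakeElement('math');
math.setAttribute('display', 'block');
const mn = new FakeElement('mn');
mn.appendChild('1');
math.appendChild(mn);
console.log(math.toString());
// <math display="block"><mn>1</mn></math>
```

Because nothing here touches `document` or `window`, code written against an object like this can run in node or in an embedded JavaScript engine.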

The four implementations can basically be split into two different options:

  • JavaScript implementations
  • A Ruby implementation

I've written previously about how you can run JavaScript code inside a .NET app, so one avenue would be to do the same with one of the JavaScript implementations. However, my initial attempts at this had limited success, so I also considered the Ruby implementation.

Investigating running Ruby code in a .NET 8 app

Initially, running Ruby code inside a .NET app might seem strange. But back in the early days of .NET Framework there was a push to provide facilities for doing just this, and some of those efforts are still going today:

  • IronPython is an open-source implementation of the Python programming language which is tightly integrated with .NET. IronPython can use .NET and Python libraries, and other .NET languages can use Python code just as easily.
  • PeachPie is an open-source PHP language compiler and runtime for .NET.

Similarly, the IronRuby project was created around the same time as IronPython, with the goal of letting you interact with Ruby code from your .NET apps. Unfortunately, that project seems well and truly abandoned, with the last release in 2011.

Nevertheless, I gave the IronRuby NuGet package a try. I added the package, and wrote the following test code (based on a 10 year old StackOverflow question):

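using IronRuby;                    // Ruby.CreateEngine()
using Microsoft.Scripting.Hosting; // ScriptEngine, ScriptScope

public string TestIronRuby(string input)
{
    // Ruby source defining a single top-level method
    string rubyCode = """
                    def my_function(name)
                       "Hello!"
                    end
                    """;
    ScriptEngine engine = Ruby.CreateEngine();
    ScriptScope scope = engine.CreateScope();
    engine.Execute(rubyCode, scope);

    // Look the method up in the scope and invoke it dynamically
    dynamic sayHelloFunction = scope.GetVariable("my_function");
    return sayHelloFunction(input).ToString();
}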

This compiled OK, but at runtime it quickly blew up. The .NET 8 code would throw, complaining about missing assemblies/types (System.Configuration.ConfigurationManager) which are .NET Framework-specific, and generally aren't recommended or available in .NET Core/.NET 5+.

The problem was that IronRuby was written firmly in the .NET Framework era, before .NET Core was a glimmer in the .NET team's eye. Even if I could work around this issue, I was sure I'd run into multiple other problems (not least the question of Gemfile dependency management), so I decided this avenue wasn't worth the hassle.

Bundling a server-side JavaScript AsciiMath renderer

With the Ruby approach abandoned, that left just the three JavaScript options: ASCIIMathML.js, MathJax, and ascii-math.

A quick browse of ASCIIMathML.js led me to discount it off the bat. The code is definitely…dated. It's also very specifically tied to the browser, as it directly creates DOM elements and manipulates them. In the constrained "node-like" environment where I would be running the JavaScript engine, it would require a lot of modifications to get it working, and the design of the code would generally make that unpleasant.

On the face of it, MathJax seems like the obvious candidate. It's one of the most actively developed and supported options, it has support for running in Node environments, and even has a bunch of examples available. Unfortunately, I still had difficulty getting this to work 🙁

I tried following the "AsciiMath string to MathML string" example, which shows code similar to the following (I've removed the comments for brevity and exported a convert function):

const {AsciiMath} = require('mathjax-full/js/input/asciimath.js');
const {HTMLDocument} = require('mathjax-full/js/handlers/html/HTMLDocument.js');
const {liteAdaptor} = require('mathjax-full/js/adaptors/liteAdaptor.js');
const {STATE} = require('mathjax-full/js/core/MathItem.js');
const {SerializedMmlVisitor} = require('mathjax-full/js/core/MmlTree/SerializedMmlVisitor.js');

const asciimath = new AsciiMath();
const html = new HTMLDocument('', liteAdaptor(), {InputJax: asciimath});
const visitor = new SerializedMmlVisitor();
const toMathML = (node => visitor.visitTree(node, html));

const convert = (ascii, inline) => toMathML(
  html.convert(ascii || '',
  {display: !inline, end: STATE.CONVERT}));

export default convert;

In .NET-land I didn't want to have to copy all these resources into the project and figure out how to get the requires working etc, so instead I tried to bundle it all into a single file. Rather than wrestle with node or WebPack etc locally, I turned to bundle.js which I've used previously. It can grab all the assets from a CDN, bundle them all together, treeshake, and minify, all in the browser! It's worked well for me in the past in this sort of task, so I gave it a try.

I tried copying the above code into the "input" pane, and tweaked some settings: targeting es6 instead of esnext; changing the output format to iife for easier includes; and setting the platform to neutral instead of browser.

Using bundle.js to bundle mathjax

The resulting bundled code was 327 kB uncompressed 😮 Unfortunately, when I tested the resulting bundle using something like

const ascii = 'int_-1^1 sqrt(1-x^2)dx = pi/2';
const inline = true;
let result = BundledCode.default(ascii, inline);
console.log(result);

I would just get Uncaught ReferenceError: global is not defined errors. Maybe the error will be obvious to someone else, but I didn't really fancy digging through trying to find out what was going on. Rather than try to figure out the issue, I gave the alternative ascii-math a try instead.
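
For reference, this kind of error typically means that some code in the bundle expects Node's `global` object, which isn't defined in browsers. One commonly used workaround is to alias `global` to `globalThis` before the bundle runs. The sketch below shows the idea only. It's untested against the MathJax bundle, so whether it would be enough there is an open question, and `ensureGlobalAlias` is a hypothetical helper name.

```javascript
// Hypothetical helper: makes sure `scope.global` exists by pointing it back
// at the scope object itself, mimicking Node's `global`.
function ensureGlobalAlias(scope) {
  if (typeof scope.global === 'undefined') {
    scope.global = scope;
  }
  return scope;
}

// Run this before loading the bundled code. In Node this is a no-op,
// because `global` is already defined there.
ensureGlobalAlias(globalThis);
```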

Ascii-math is specifically designed as a self-contained node module, so it was easy to add to bundle.js.

Bundling ascii-math in bundle.js

The bundle code defaults to:

export * from "ascii-math@2.0.0";
export { default } from "ascii-math@2.0.0";

and calling it (when it's compiled as an iife) is as simple as

const ascii = 'int_-1^1 sqrt(1-x^2)dx = pi/2';
let result = BundledCode.default(ascii).toString();
console.log(result);
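
The `.default` in that call comes from how an iife bundle exposes its exports. Roughly speaking, the bundle assigns an object holding the module's exports to a global variable, here `BundledCode`. The following is a simplified sketch with a stub in place of the real library, not the actual generated output, and `moduleExports` is just an illustrative name:

```javascript
// Illustrative stand-in for the generated bundle; the real file contains the
// bundled library code instead of this stub.
var BundledCode = (() => {
  const moduleExports = {};

  // Stub for the library's default export: returns an object with a
  // toString() method, mirroring how the result is used above.
  moduleExports.default = (ascii) => ({
    toString: () => `<math title="${ascii}"></math>`,
  });

  return moduleExports;
})();

console.log(BundledCode.default('pi/2').toString());
// <math title="pi/2"></math>
```

Because everything hangs off a single global, a host that can't resolve `require` or `import` statements only needs to execute the one file to get access to the function.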

Now we have a self-contained JavaScript file that can render the AsciiMath to MathML. So the next step is to make it callable from .NET.

Jamming an AsciiMath parser into .NET

To make the JavaScript code callable from .NET, I used the JavaScript interpreter for .NET (Jint) along with JavaScriptEngineSwitcher, as I described in a previous post.

I was previously using the Jurassic JavaScript engine, but I switched to Jint as part of my experiments. Jint has seen much more active development, and the performance issues I found when evaluating Jint originally appear to have all been resolved since the release of 3.0.0. Jint also has better support for ES6 (and beyond) than Jurassic.

The first step was to add the required Jint and JavaScriptEngineSwitcher package to the app:

dotnet add package JavaScriptEngineSwitcher.Jint --version 3.24.1

Next, I added the bundled ascii-math code to a file, ascii-math.js, and included it in the project as an embedded resource.

<ItemGroup>
  <None Remove="Parsing/AsciiMath/ascii-math.js" />
  <EmbeddedResource Include="Parsing/AsciiMath/ascii-math.js" />
</ItemGroup>

Loading the file into an instance of the Jint JavaScriptEngineSwitcher engine is simply:

var engine = new JintJsEngine();
engine.ExecuteResource("ascii-math.js", typeof(AsciiMathConverter));

To make it easier to call, and to allow specifying inline or block for the resulting <math> element, I also added a JavaScript utility function, parseMath, and then created a helper C# function Convert() for invoking this function. All put together, the converter looks like this:

using JavaScriptEngineSwitcher.Core;
using JavaScriptEngineSwitcher.Jint;

namespace BlogEngine.Parsing.AsciiMath;

public static class AsciiMathConverter
{
    private static readonly IJsEngine JsEngine = CreateEngine();

    private static IJsEngine CreateEngine()
    {
        var engine = new JintJsEngine();
        engine.ExecuteResource("ascii-math.js", typeof(AsciiMathConverter));
        engine.Execute("""
                       function parseMath(str, display) {
                           const node = BundledCode.default(str);
                           node.setAttribute("display", display);
                           return node.toString();
                       }
                       """);
        return engine;
    }

    public static string Convert(string asciiMath, bool displayInline)
    {
        JsEngine.SetVariableValue("input", asciiMath);
        JsEngine.SetVariableValue("display", displayInline ? "inline" : "block");

        JsEngine.Execute("converted = parseMath(input, display);");
        return JsEngine.Evaluate<string>("converted");
    }
}

To use the converter we can do:

string result = AsciiMathConverter.Convert("int_-1^1 sqrt(1-x^2)dx = pi/2", displayInline: false);
// <math title="int_-1^1 sqrt(1-x^2)dx = pi/2" display="block"><mrow><msubsup><mo>∫</mo><mrow><mo>-</mo><mn>1</mn></mrow><mn>1</mn></msubsup></mrow><msqrt><mrow><mn>1</mn><mo>-</mo><msup><mi>x</mi><mn>2</mn></msup></mrow></msqrt><mrow><mi>d</mi><mi>x</mi></mrow><mo>=</mo><mfrac><mi>π</mi><mn>2</mn></mfrac></math>

And there we have it, an AsciiMath parser for .NET (kind of)! This parser certainly has limitations: the AsciiMath implementation in ascii-math is less mature than MathJax or the AsciiDoctor implementation, and having to run a JavaScript interpreter is clearly a big limitation.

But on the other hand, it does the job for me. Given I'm not trying to write lots of different equations, the somewhat limited implementation hasn't been a problem. Similarly, as the converter is only used when I'm building the static site, the overhead of spinning up and executing with a JavaScript interpreter is not a big deal. Obviously your mileage may vary in this respect!

Nevertheless, having to resort to using JavaScript here did bug me, so in the next post I'll provide a glimpse of the native .NET AsciiMath parser I've thrown together!

Summary

In this post I looked at some of the AsciiMath implementations available in various programming languages, and discussed my attempts to get them running in .NET. Finally, I showed my current chosen approach using the Jint JavaScript Interpreter for .NET, JavaScriptEngineSwitcher, and a server-based re-implementation of the original ASCIIMathML.js implementation. In the final code, the ascii-math.js file is loaded into the Jint engine, and is executed by passing the AsciiMath string and returning the equivalent MathML HTML elements as a string.

