Channel: Andrew Lock | .NET Escapades

Using the YamlDotNet source generator for Native AOT

In this post I show how you can use the YamlDotNet source generator in your .NET 7+ application. This is particularly important if you're planning to publish your application with Native AOT.

Reading YAML with YamlDotNet

Love it or hate it, YAML is everywhere these days. It's emerged as the markup language of choice for "cloud native" configuration, whether that's Kubernetes manifests, docker compose files, or GitHub Actions workflows. On the one hand, it's a generally-easy-to-read format, and a strict superset of JSON. On the other hand, the significant whitespace can be a nightmare if you don't have decent tooling 😅

Either way, people clearly choose to use it, even if they don't have to. Years ago I wrote a small ASP.NET Core library, NetEscapades.Configuration.Yaml that reads YAML files in as part of ASP.NET Core's standard configuration system. Judging from the NuGet Trends data, YAML is not going away any time soon!

The NuGet downloads for the NetEscapades.Configuration.Yaml library

Under the hood, NetEscapades.Configuration.Yaml uses YamlDotNet to read the YAML files. YamlDotNet is the YAML parser for .NET—with ~245 Million downloads it's effectively the "Newtonsoft.Json for YAML".

There are several ways to work with YAML files with YamlDotNet. If you need to work with the YAML document directly, you can manipulate a high-level representation of the YAML (e.g. YamlScalarNode, YamlMappingNode etc).
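As a rough sketch of that lower-level approach, the following loads a document into YamlDotNet's representation model and reads a value by key. The sample document is made up for illustration, and the code assumes the root of the document is a mapping:

```csharp
using System;
using System.IO;
using YamlDotNet.RepresentationModel;

// Hypothetical sample document, used only for illustration
var yaml =
    """
    name: George Washington
    addresses:
      home:
        street: 400 Mockingbird Lane
    """;

// Load the YAML into the representation model
var stream = new YamlStream();
stream.Load(new StringReader(yaml));

// Assumption: the root node of this document is a mapping (key/value pairs)
var root = (YamlMappingNode)stream.Documents[0].RootNode;

// Look up a scalar value by its key
var name = (YamlScalarNode)root.Children[new YamlScalarNode("name")];
Console.WriteLine(name.Value);

// Walk the top-level keys and report what kind of node each one holds
foreach (var entry in root.Children)
{
    Console.WriteLine($"{entry.Key} is a {entry.Value.NodeType} node");
}
```

This is handy when you don't know the shape of the document up front, but it's more verbose than working with strongly-typed objects.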

A more common approach is to serialize your YAML to and from strongly-typed objects, just like you'd do with Newtonsoft.Json or System.Text.Json. For example, the following code shows how to deserialize a YAML document into a strongly-typed Person object using YamlDotNet:

using YamlDotNet.Serialization;
using YamlDotNet.Serialization.NamingConventions;

// create a deserializer, using the builder pattern 
IDeserializer deserializer = new DeserializerBuilder()
    .WithNamingConvention(UnderscoredNamingConvention.Instance) // customise conventions
    .Build();

// Can deserialize from a string or TextReader (for example)
var yaml = 
    """
    name: George Washington
    age: 89
    height_in_inches: 5.75
    addresses:
      home:
        street: 400 Mockingbird Lane
        state: Hawidaho
    """;

Person p = deserializer.Deserialize<Person>(yaml);
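The snippet above doesn't show the Person type itself. A minimal sketch of what it could look like is shown below; the exact shape (including the Address type and the property types) is my assumption, inferred from the keys in the YAML document:

```csharp
using System.Collections.Generic;

// Assumed shape of the Person type. With UnderscoredNamingConvention, the
// property HeightInInches is matched to the YAML key height_in_inches
public class Person
{
    public string Name { get; set; } = "";
    public int Age { get; set; }
    public float HeightInInches { get; set; }

    // Keyed by the address label in the document, e.g. "home"
    public Dictionary<string, Address> Addresses { get; set; } = new();
}

// Assumed shape of the nested address entries
public class Address
{
    public string Street { get; set; } = "";
    public string State { get; set; } = "";
}
```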

You can customise the conventions used to read the YAML, override the serialization for certain types or properties, and generally customize things as you need. It's worth checking out the GitHub Wiki for details of all the possible customisation you can do.

The advantages and complexities of Native AOT

Native AOT is a new (since .NET 7) deployment mechanism for .NET applications. Normally, when you publish and deploy a .NET application, your application is compiled into Intermediate Language (IL). When the .NET runtime runs your application it uses a Just-In-Time (JIT) compiler to convert your IL into machine-language instructions which can actually be executed by the processor.

In contrast, Native AOT performs the IL to machine-language conversion during a dotnet publish. It produces a single binary, targeting a single platform (for example x64 Windows or arm64 Linux), which contains the complete .NET runtime, all the base-class libraries, and your application. To keep the size of this file down, NativeAOT automatically "trims" any unused types and members from your application and from the underlying platform.
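For reference, opting in to Native AOT is typically just a project-file setting. The fragment below is a minimal sketch of a console app's .csproj; the target framework is an arbitrary choice for illustration:

```xml
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <!-- Example target framework; use whichever version your app targets -->
    <TargetFramework>net8.0</TargetFramework>
    <!-- Opt in to Native AOT compilation during dotnet publish -->
    <PublishAot>true</PublishAot>
  </PropertyGroup>
</Project>
```

You would then publish for a specific platform with something like dotnet publish -c Release -r linux-x64, where the runtime identifier matches the platform you're targeting.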

When choosing between IL+JIT or Native AOT there are a number of trade-offs to consider. I won't go into those exhaustively here, and instead will just highlight a few.

Advantages to using the IL+JIT approach include:

  • The JIT can optimise the machine-language generated for the specific capabilities of the processor currently executing, which may mean it can produce faster code than Native AOT would.
  • You're free to use meta-programming approaches such as reflection (e.g. Assembly.LoadFile) and run-time code generation (e.g. System.Reflection.Emit).
  • You can use any .NET library available; they're all designed to be used in this mode.

Whereas Native AOT brings other advantages:

  • Native AOT typically allows significantly faster startup times, as there's no need to load all the types, start the JIT compiler, and generate machine code from IL; the app starts executing almost immediately.
  • The size of a Native AOT app is typically much smaller than the overall footprint of a JIT app (runtime + base class libraries + application), as any unused features are trimmed and removed.
  • Runtime memory usage is typically smaller, as the runtime has to do less work (it doesn't need to run the JIT compiler, load types, have debugger support).

Michal Strehovsky gave a great Deep dive on Native AOT talk at .NET Conf 2024; if you're interested in Native AOT I strongly suggest taking a look at it!

The big downside to using Native AOT publishing is that the compiler needs to be able to statically understand which types in your application are actually going to be used. That is particularly difficult for functionality that leans heavily on reflection. And guess what, serialization and deserialization typically do just that.

Source generation to the rescue!

Reflection is often problematic for Native AOT, as it can quickly become difficult for the compiler to know which types are actually being used. For example, the following code would (potentially) work fine when you're running with a JIT:

Console.WriteLine("Enter a type to load");
string typeToLoad = Console.ReadLine();
Type? type = Type.GetType(typeToLoad); // dynamically load the type
Console.WriteLine($"Loaded type {type}");

There's clearly no way for the compiler to know ahead of time what type will be requested, so there's pretty much no way Native AOT is going to work with this sort of pattern.

This example is obviously very contrived, but it's actually similar to how some plugin systems work in practice!
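For contrast, here's a sketch of a version the compiler can reason about. Instead of looking up an arbitrary type name at runtime, it maps a fixed set of names to types referenced with typeof(). The ResolveKnownType helper and the names it accepts are hypothetical, purely for illustration:

```csharp
using System;

// Hypothetical allow-list: every type is referenced with typeof(), so the
// compiler can see statically which types the application uses
static Type? ResolveKnownType(string? name) => name switch
{
    "int" => typeof(int),
    "string" => typeof(string),
    "date" => typeof(DateTime),
    _ => null, // unknown names are rejected instead of being looked up dynamically
};

Console.WriteLine("Enter a type to load");
Type? type = ResolveKnownType(Console.ReadLine());
Console.WriteLine(type is null ? "Unknown type" : $"Loaded type {type}");
```

The trade-off is obvious: you lose the ability to load arbitrary types, which is exactly the restriction that makes static analysis possible.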

In other scenarios we use reflection primarily because it was historically the only tool available to us. This is often the case for serialization and deserialization. Serializers commonly use reflection to inspect the properties of objects so they can be created from a given document, whether that's XML, JSON, or YAML.

However, with the introduction of source generators, we now have another tool. Instead of inspecting types at runtime to generate the mapping code from documents-to-types, source generators allow us to move that work to compile time. This can give performance improvements (as there's less work to do at runtime), but more importantly (for this case) it also means our code is statically analyzable, and can potentially support Native AOT.

I've written a lot about source generators: I have a series on creating an incremental source generator here and have described some of the source generators I've created here and here.

It's important to be aware that using source generation for serialization typically requires making code changes. For example, using source generation with System.Text.Json (the built-in JSON serializer) requires:

  • Creating a JsonSerializerContext.
  • Applying [JsonSerializable] to the context for each type to generate.
  • Explicitly calling serialization method overloads that use the JsonSerializerContext.

using System.Text.Json;
using System.Text.Json.Serialization;

string json = 
    """
    {
      "Date": "2019-08-01T00:00:00",
      "TemperatureCelsius": 25,
      "Summary": "Hot"
    }
    """;

var weatherForecast = JsonSerializer.Deserialize<WeatherForecast>(
    json,
    SourceGenerationContext.Default.WeatherForecast); // Explicitly use the context

// The data type
public class WeatherForecast
{
    public DateTime Date { get; set; }
    public int TemperatureCelsius { get; set; }
    public string? Summary { get; set; }
}

// The context - the body of this type is source generated
[JsonSerializable(typeof(WeatherForecast))]
internal partial class SourceGenerationContext : JsonSerializerContext
{
}

Source generation in System.Text.Json has got progressively better over recent releases, so that many of the gaps and issues that were once there have been resolved. However, I wasn't aware until recently that YamlDotNet has a similar source generator for serializing YAML.

Using source generation with YamlDotNet

I was recently working on a project that I wanted to publish using NativeAOT. It involved reading some Markdown files using Markdig, reading the YAML frontmatter, and parsing the YAML.

Initially, I thought I might have a problem. There's no real mention of Native AOT or source generation that I could see in the YamlDotNet Wiki. However, this issue requesting support for Native AOT was marked complete and there was a sample called YamlDotNet.Core7AoTCompileTest, and sure enough, it was possible!

To use the YamlDotNet source generation in your project, you need to do 4 things:

  • Add a reference to Vecc.YamlDotNet.Analyzers.StaticGenerator.
  • Create a class derived from YamlDotNet.Serialization.StaticContext.
  • Annotate this class with the types you wish to use with source generation.
  • Use the StaticDeserializerBuilder instead of DeserializerBuilder to build your IDeserializer.

I'll walk through each of those steps in the following sections.

1. Add a reference to Vecc.YamlDotNet.Analyzers.StaticGenerator

The first step, adding a reference to Vecc.YamlDotNet.Analyzers.StaticGenerator, is a somewhat odd one. I get the impression that this is just a temporary measure by the current maintainer, but it's a required step right now. The NuGet package is published by the current maintainer of YamlDotNet, and the version numbers match the YamlDotNet releases.

Add both this package and YamlDotNet to your project:

dotnet add package Vecc.YamlDotNet.Analyzers.StaticGenerator
dotnet add package YamlDotNet

2. Create a YamlDotNet.Serialization.StaticContext type

Next, create a class that derives from YamlDotNet.Serialization.StaticContext, and add the [YamlStaticContext] attribute to it. For example:

using YamlDotNet.Serialization;

[YamlStaticContext]
public partial class YamlStaticContext : YamlDotNet.Serialization.StaticContext
{
}

This type is equivalent to the JsonSerializerContext used by System.Text.Json, and serves as the "target" of the source generator.

3. Add [YamlSerializable] attributes for each type to serialize

For each type that you want to serialize or deserialize, decorate your StaticContext class with a [YamlSerializable] attribute. For example:

using YamlDotNet.Serialization;

[YamlStaticContext]
[YamlSerializable(typeof(WeatherForecast))] // Generate for WeatherForecast type
public partial class YamlStaticContext : YamlDotNet.Serialization.StaticContext
{
}

This makes the type available for source generation.

Note that you must add [YamlSerializable] for all the non-built-in types that you wish to serialize, whether they're "top-level" types, or just referenced by other properties.
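To make that concrete, here's a sketch using a hypothetical Person type that references a nested Address type. Both types, the DemoApp namespace, and the PersonStaticContext name are assumptions for illustration; the point is that Address gets its own attribute even though it's only reached through Person:

```csharp
using System.Collections.Generic;
using YamlDotNet.Serialization;

namespace DemoApp;

// Hypothetical model types, for illustration only
public class Person
{
    public string Name { get; set; } = "";
    public Dictionary<string, Address> Addresses { get; set; } = new();
}

public class Address
{
    public string Street { get; set; } = "";
    public string State { get; set; } = "";
}

[YamlStaticContext]
[YamlSerializable(typeof(Person))]  // the "top-level" type
[YamlSerializable(typeof(Address))] // only referenced via Person.Addresses, but still registered
public partial class PersonStaticContext : StaticContext
{
}
```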

If you check the generated code, you can see exactly what YamlDotNet is doing. I'm not going to reproduce it all here, but there are a few interesting points to note:

  • By default, when you register a type T using [YamlSerializable], YamlDotNet will also recognise T[], IEnumerable<T>, List<T>, and Dictionary<string, T>.
  • The source generator implements an IObjectAccessor for the type.

On that latter point, the accessor looks something like this:

class DemoApp_WeatherForecast_379090c0bf12475d92847d8798d5c88f : YamlDotNet.Serialization.IObjectAccessor
{
    public void Set(string propertyName, object target, object value)
    {
        var v = (DemoApp.WeatherForecast)target;
        switch (propertyName)
        {
            case "Date": v.Date = (System.DateTime)value; return;
            case "TemperatureCelsius": v.TemperatureCelsius = (System.Int32)value; return;
            case "Summary": v.Summary = (System.String)value; return;
            default: throw new ArgumentOutOfRangeException("propertyName", $"{propertyName} does not exist or is not settable");
        }
    }
    public object Read(string propertyName, object target)
    {
        var v = (DemoApp.WeatherForecast)target;
        switch (propertyName)
        {
            case "Date": return v.Date;
            case "TemperatureCelsius": return v.TemperatureCelsius;
            case "Summary": return v.Summary;
        }
        return null;
    }
}

Of course, you don't need to worry about any of that. But if you do need to peek behind the curtains, you can often more easily see what's going on when the code is source generated like this!
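If you want to inspect the generated code outside of your IDE, one general-purpose option (not specific to YamlDotNet) is to ask the compiler to write source generator output to disk. A minimal sketch of the project-file setting:

```xml
<PropertyGroup>
  <!-- Write source generator output to disk so it can be inspected -->
  <EmitCompilerGeneratedFiles>true</EmitCompilerGeneratedFiles>
</PropertyGroup>
```

With this set, the generated files should show up under the project's obj folder after a build.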

We're nearly finished; there's just one more change we need for our app to support the generated definitions.

4. Use the StaticDeserializerBuilder

The final step is to find where you're currently creating an IDeserializer using a DeserializerBuilder. Replace the DeserializerBuilder with StaticDeserializerBuilder and pass in an instance of your StaticContext. For example:

// Replace this ...
// IDeserializer deserializer = new DeserializerBuilder()

// With this:
//                              👇 Static builder         👇 Your StaticContext type
IDeserializer deserializer = new StaticDeserializerBuilder(new YamlStaticContext())
    .WithNamingConvention(UnderscoredNamingConvention.Instance)
    .Build();

WeatherForecast p = deserializer.Deserialize<WeatherForecast>(yaml);
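Putting the steps together, a complete program might look something like the following sketch. The YAML input is made up for illustration (note the keys use underscores to match UnderscoredNamingConvention), and I've assumed the types live in a DemoApp namespace:

```csharp
using System;
using DemoApp;
using YamlDotNet.Serialization;
using YamlDotNet.Serialization.NamingConventions;

// Hypothetical input; the keys follow UnderscoredNamingConvention
var yaml =
    """
    date: 2019-08-01T00:00:00
    temperature_celsius: 25
    summary: Hot
    """;

// Build the deserializer from the source-generated static context
IDeserializer deserializer = new StaticDeserializerBuilder(new YamlStaticContext())
    .WithNamingConvention(UnderscoredNamingConvention.Instance)
    .Build();

WeatherForecast forecast = deserializer.Deserialize<WeatherForecast>(yaml);
Console.WriteLine($"{forecast.Summary}: {forecast.TemperatureCelsius}");

namespace DemoApp
{
    // The data type
    public class WeatherForecast
    {
        public DateTime Date { get; set; }
        public int TemperatureCelsius { get; set; }
        public string? Summary { get; set; }
    }

    // The context; the body of this type is source generated
    [YamlStaticContext]
    [YamlSerializable(typeof(WeatherForecast))]
    public partial class YamlStaticContext : StaticContext
    {
    }
}
```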

You can still customize your conventions as before, but now when you call Deserialize<T> YamlDotNet uses your StaticContext and generated IObjectAccessors to deserialize the YAML into your strongly typed object. Serializing works in much the same way:

// Use the StaticSerializerBuilder and pass in your custom StaticContext
ISerializer serializer = new StaticSerializerBuilder(new YamlStaticContext())
    .EnsureRoundtrip()
    .Build();

var forecast = new WeatherForecast
{
    Date = DateTime.UtcNow,
    Summary = "Sunny",
    TemperatureCelsius = 23,
};

// Serialize the object as normal
string yaml = serializer.Serialize(forecast);

And that's it. You now have Native AOT-compatible YAML serialization and deserialization, all thanks to the good work of the YamlDotNet maintainers!

Summary

In this post I showed how you can use the YamlDotNet library to deserialize YAML into a strongly typed object. I then discussed some of the pros and cons of Native AOT, and how source generation can help work around the lack of reflection in Native AOT. Finally, I showed how to enable a source generator for YamlDotNet so that you can make your YAML serialization and deserialization Native AOT-friendly.

