
The minimal API AOT compilation template: Exploring the .NET 8 preview - Part 2


One of the big focuses of .NET 8 is Ahead of Time (AOT) compilation. In this post, I look at the new "AOT-ready" template shipping in the .NET 8 SDK preview releases, point out some of the interesting features, and demonstrate one of the main benefits of AOT - faster startup times.

As these posts are all using the preview builds, some of the features may change (or be removed) before .NET 8 finally ships in November 2023!

What is AOT compilation?

Ahead of Time (AOT) compilation is one of the main features being worked on by the ASP.NET team at Microsoft for .NET 8. To understand what it is, and why it's such a focus, we'll take a short detour to talk about how things typically work in .NET, and how things differ with AOT.

If you prefer videos, there was a great community standup recently in which Damian Edwards and David Fowler talked about exactly this topic.

Traditionally, whichever language you use with .NET (C#, F#, VB.NET, etc.), you use a compiler to generate Intermediate Language (IL) bytecode. With modern .NET, you perform this step by running dotnet build on your project to produce an executable that contains the IL and a whole load of metadata.

If you're interested in what the IL looks like, you can experiment on the excellent https://sharplab.io. Here you can choose to output the IL for a snippet of C#, which can be useful if you're doing low-level or performance work.

As a simple example, for a function like the following:

public static void Main(string name)
    => Console.WriteLine("Hello " + name + "!");

the compiler generates IL that looks something like this:

IL_0000: ldstr "Hello "
IL_0005: ldarg.0
IL_0006: ldstr "!"
IL_000b: call string [System.Runtime]System.String::Concat(string, string, string)
IL_0010: call void [System.Console]System.Console::WriteLine(string)
IL_0015: ret

The details aren't important here; the main point is that the IL is still relatively high level. You can't take these instructions and run them directly on a CPU. For that, you need another stage of compilation to turn the IL into assembly code. In .NET, this is typically done at runtime by the .NET runtime's just-in-time (JIT) compiler. The resulting instructions might look something like this:

L0000: push ebp
L0001: mov ebp, esp
L0003: push edi
L0004: push esi
L0005: push ebx
L0006: mov esi, ecx
L0008: test esi, esi
L000a: je short L0059
L000c: mov edi, [esi+4]
L000f: test edi, edi
L0011: je short L0059
L0013: lea ecx, [edi+7]
L0016: call System.String.FastAllocateString(Int32)
L001b: mov ebx, eax
L001d: push dword ptr [0x8b086e8]
L0023: mov ecx, ebx
L0025: xor edx, edx
L0027: call dword ptr [0x64b15a0]
L002d: push esi
L002e: mov ecx, ebx
L0030: mov edx, 6
L0035: call dword ptr [0x64b15a0]
L003b: lea edx, [edi+6]
L003e: push dword ptr [0x8ad5530]
L0044: mov ecx, ebx
L0046: call dword ptr [0x64b15a0]
L004c: mov ecx, ebx
L004e: call dword ptr [0x10b271e0]
L0054: pop ebx
L0055: pop esi
L0056: pop edi
L0057: pop ebp
L0058: ret
L0059: mov ecx, 7
L005e: call System.String.FastAllocateString(Int32)
L0063: mov ebx, eax
L0065: cmp dword ptr [ebx+4], 6
L0069: jl short L0096
L006b: lea ecx, [ebx+8]
L006e: mov edx, [0x8b086e8]
L0074: add edx, 8
L0077: push 0xc
L0079: call dword ptr [0x6a99fc0]
L007f: push dword ptr [0x8ad5530]
L0085: mov ecx, ebx
L0087: mov edx, 6
L008c: call dword ptr [0x64b15a0]
L0092: mov ecx, ebx
L0094: jmp short L004e
L0096: mov ecx, 0x9c595b4
L009b: call 0x05f0300c
L00a0: mov esi, eax
L00a2: mov ecx, esi
L00a4: call dword ptr [0x9c61af8]
L00aa: mov ecx, esi
L00ac: call 0x62fcef50
L00b1: int3

These are the instructions that the CPU actually runs. With AOT compilation you skip the intermediate IL step entirely and directly generate the assembly code that runs on the end CPU.

The obvious question is why wouldn't you want to use AOT? What are the pros and cons of AOT vs. JIT?

The pros and cons of AOT compilation

AOT compilation generates assembly code instructions, so there's no JIT compilation at runtime when the program executes. This gives one major advantage—it significantly reduces startup time.

There are other potential benefits—AOT can reduce the overall disk footprint (the total size of all files) as well as the memory consumption—but I'm going to focus on the reduction in startup time here.

Normally, before the .NET runtime executes a method, it runs the JIT compiler against the method's IL to generate the assembly code to execute. When the app starts up, there are typically a lot of classes and methods that need to be JIT compiled. This all adds up, and generally means that .NET apps can take a while to start up.

In traditional server applications that are started once and stay running for hours or days, taking a long time to start doesn't really matter. But startup time is important for applications that use AWS Lambda or Azure Functions. These apps spin up in response to a request, handle it, and then exit. For these apps, where you may be billed by the millisecond, startup time is crucial.

This specific scenario is where AOT compilation for .NET would really shine. That's why it's a focus for .NET 8. But AOT isn't universally "better" than using JIT compilation. There are a lot of downsides, which means AOT often isn't the right choice; the JIT approach may be better for your use case.

Making a .NET program suitable for AOT is not trivial, and as such, only minimal API apps and gRPC apps are expected to be compatible with AOT in .NET 8.

One of the main problems with AOT is that the machine code is significantly larger than the IL (as you can see from the previous section: 6 IL instructions expand to 57 assembly instructions). So if you naively performed AOT on an ASP.NET Core app, including all of the framework, the resulting file size would be absolutely huge.

The only way to create something manageable is to perform "app trimming" (also called tree-shaking in some other languages). This involves removing all the parts of both your application and the framework that aren't actually used in your app. This can massively reduce the size of the resulting binaries, and makes AOT practical.

You can trim self-contained applications without AOT, but the reverse is not true; for all practical purposes, AOT requires trimming.
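
As a rough illustration (the exact flags depend on your target platform), you can trim a self-contained publish without any AOT involvement by setting the PublishTrimmed property:

dotnet publish -c Release -r win-x64 --self-contained -p:PublishTrimmed=true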

Trimming is a prerequisite for AOT, but it's also where the difficulties start. For trimming to work correctly, the compiler must be able to work out exactly which classes, fields, and methods are actually used in your application. Everything else must be removed.

The trouble is that .NET has reflection and dynamic dispatch, which means that in general, it's fundamentally not possible to statically analyse all .NET applications completely. For a trivial example, imagine a console application that takes the name of a class as input, and creates an instance of that type. You can't know ahead of time what type the user will request, so you can't ensure that the type is actually retained in the final program.
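
Here's a minimal sketch of that example; the whole point is that the type name only exists at runtime, so a trimmer has no way to know which types to keep:

using System;

// Read a type name from the user and try to create an instance of it.
// A trimmer can't predict which types will be requested here, so it
// can't safely remove "unused" types from this program.
Console.Write("Type name to create: ");
string? typeName = Console.ReadLine();

Type? type = typeName is null ? null : Type.GetType(typeName);
object? instance = type is null ? null : Activator.CreateInstance(type);

Console.WriteLine(instance is null
    ? "Couldn't create an instance of that type"
    : $"Created an instance of {instance.GetType().FullName}");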

Another example is "plugin" applications that dynamically load dlls. There's no way to know ahead of time which types these assemblies would require.

Plugin-style applications actually have an even bigger problem. With AOT there's no JIT, so there's no way to even compile the IL contained in any dlls you would try to load!
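
A minimal sketch of that kind of plugin loader might look like the following; the paths and type names here are entirely hypothetical, just to illustrate why static analysis (and AOT) can't cope with it:

using System;
using System.Reflection;

// Load an assembly whose path is only known at runtime...
string pluginPath = args.Length > 0 ? args[0] : "plugins/MyPlugin.dll";
Assembly plugin = Assembly.LoadFrom(pluginPath);

// ...then find and invoke a type inside it by name.
// Under AOT there's no JIT available to compile the IL in this dll.
Type? entryPoint = plugin.GetType("MyPlugin.PluginEntryPoint");
if (entryPoint is not null)
{
    object? instance = Activator.CreateInstance(entryPoint);
    entryPoint.GetMethod("Run")?.Invoke(instance, null);
}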

Trimming is the fundamental reason why making .NET apps AOT-compatible is hard. The big effort in .NET 8 in supporting AOT is mostly about making the framework components statically analysable by the compiler. Given the design of ASP.NET Core, that's not an easy prospect at all!

Another limitation is that AOT generates assembly code for a specific platform such as Windows x64 or Linux arm64. The generated code can only run on the specified platform, unlike IL, which is generally cross-platform by design.
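
In practice, you choose the target with a runtime identifier (RID) when publishing. With PublishAot enabled, each of the following produces a native binary that only runs on its target, and because native AOT generally can't cross-compile between operating systems, each needs to be built on (or in a container matching) the OS it targets:

dotnet publish -c Release -r win-x64
dotnet publish -c Release -r linux-arm64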

One point that's sometimes overlooked is that the JIT compiler knows more about the machine the code is running on (because it's running on it!) than the AOT compiler. That means that the JIT compiler can potentially generate more optimised code than the AOT compiler. For example, the JIT compiler knows which hardware intrinsics are available, whereas the AOT compiler can't make the same assumptions. That means that in some circumstances, the steady-state performance of a JIT app may outperform an AOT app.
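
To make that concrete, here's a small sketch using the hardware intrinsics APIs. The JIT can evaluate Avx2.IsSupported for the exact CPU it's running on and eliminate the branch that doesn't apply, whereas an AOT compiler has to decide at build time based on the instruction-set baseline it's given:

using System;
using System.Runtime.Intrinsics.X86;

// The JIT treats IsSupported as a constant for the current machine and
// removes the dead branch entirely; an AOT compiler has to bake in a
// decision based on the target baseline instead.
if (Avx2.IsSupported)
{
    Console.WriteLine("Using the AVX2 vectorised path");
}
else
{
    Console.WriteLine("Falling back to the scalar path");
}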

That's all the theory, now it's time to take a look at the new template and see the impact of AOT!

Exploring the template

The .NET previews include a new template, dotnet new api, which creates a new minimal API application. The template lies somewhere between the .NET 7 webapi and empty templates. The template is a basic to-do list application with API endpoints for fetching auto-generated to-do models.

For our purposes, the really interesting point is the inclusion of an --aot option. You can generate the template using

dotnet new api --aot

The --aot option adds two things to the generated code:

  • It configures the System.Text.Json source generator (introduced in .NET 6) and adds a JsonSerializerContext implementation.
  • It sets the PublishAot MSBuild property to true in the .csproj file.

So that's enough intro; let's take a look at the code! The Program.cs file is shown below, with a few interesting points highlighted:

using System.Text.Json.Serialization;

// 👇 Note Slim builder - new in .NET 8
var builder = WebApplication.CreateSlimBuilder(args); 

// 👇 Only added with --aot, configures the JSON source generator
builder.Services.ConfigureHttpJsonOptions(options =>
{
    options.SerializerOptions.TypeInfoResolverChain.Insert(0, AppJsonSerializerContext.Default);
});

var app = builder.Build();

// 👇 Generates an array of `Todo` objects
var sampleTodos = TodoGenerator.GenerateTodos().ToArray();

// 👇 The actual API configuration
var todosApi = app.MapGroup("/todos");
todosApi.MapGet("/", () => sampleTodos);
todosApi.MapGet("/{id}", (int id) =>
    sampleTodos.FirstOrDefault(a => a.Id == id) is { } todo
        ? Results.Ok(todo)
        : Results.NotFound());

app.Run();

// 👇 The serialization context required for source generation
[JsonSerializable(typeof(Todo[]))]
internal partial class AppJsonSerializerContext : JsonSerializerContext
{

}

There are a few interesting points here:

  • It uses the new CreateSlimBuilder() method (more on that in a later post!)
  • It configures the app to use the JSON source generator (required for AOT)
  • It uses a somewhat excessive (in my opinion) generator to create an array of Todo objects (not shown in this post)

I know that getting templates right is always a difficult balance, but the 30 lines of generator code (more than 30% of all the code in the template) seem a bit over the top when a static array would be just as useful, in my opinion 😉 Especially as that's code that's going to be immediately deleted in non-demo scenarios.
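
For example, something like this hypothetical replacement would do the job (the Todo record here is an assumption, purely to keep the sketch self-contained; the template's actual type may look different):

// Hypothetical replacement for the generator: hard-coded sample data
Todo[] sampleTodos =
[
    new(1, "Walk the dog"),
    new(2, "Do the dishes", DateOnly.FromDateTime(DateTime.Now)),
    new(3, "Do the laundry", DateOnly.FromDateTime(DateTime.Now.AddDays(1))),
];

// An assumed shape for Todo, just so the sketch compiles on its own
public record Todo(int Id, string Title, DateOnly? DueBy = null, bool IsComplete = false);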

There are the usual appsettings.json files and so on, as you would expect. The .csproj file is relatively standard, but it configures several properties:

<Project Sdk="Microsoft.NET.Sdk.Web">

  <PropertyGroup>
    <TargetFramework>net8.0</TargetFramework>
    <Nullable>enable</Nullable>
    <ImplicitUsings>enable</ImplicitUsings>
    <!-- 👇 Disables server GC to reduce memory consumption -->
    <ServerGarbageCollection>false</ServerGarbageCollection>
    <!-- 👇 Using invariant globalization reduces app sizes -->
    <InvariantGlobalization>true</InvariantGlobalization>
    <!-- 👇 Always publishes the app using native AOT -->
    <PublishAot>true</PublishAot>
  </PropertyGroup>
</Project>

So that's the template, time to take it for a spin!

Testing the template

To publish the app using AOT, run

dotnet publish

Thanks to the PublishAot setting in the project file, the app is automatically published using the AOT compile chain.

Note that for AOT you need to install the "Desktop development with C++" workload from Visual Studio 2022 (or the prerequisites described here). Without it you may see errors like error : Platform linker not found or fatal error LNK1181: cannot open input file 'advapi32.lib'. For the latter error, I made sure to install the Windows 10 SDK (10.0.19041.0) component, as that matches my current machine.

Once you've published the app you can run it directly, and it works just like any other ASP.NET Core minimal API app! But for this post, I'm most interested in comparing the startup time with and without AOT.
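
For the non-AOT comparison, one option (and it's just one way of doing it) is to override the PublishAot property at publish time rather than editing the .csproj:

dotnet publish -c Release -p:PublishAot=false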

Measuring startup time

To measure the startup time, I made a simple tweak. Instead of calling app.Run(), which runs the app and blocks until shutdown, I changed it to the following:

// Start the app without blocking
var task = app.RunAsync();

// Immediately trigger a graceful shutdown
await app.StopAsync();

This starts the app running and then immediately stops it. To measure the total time it takes to start and stop the app, I used a tool written by one of my colleagues: TimeItSharp, a port of the timeit task. He wrote it for a similar purpose, measuring the execution time of an application as part of our work on the Datadog .NET tracer.

To install the global .NET tool, run:

dotnet tool install --global TimeItSharp

TimeItSharp relies on a config file that specifies how many warm-up runs to perform and how many measured runs to execute. I added a simple JSON file called timeit.json to the app that looks like this:

{
  "warmUpCount": 10,
  "count": 100,
  "scenarios": [{"name": "Default"}],
  "processName": "aottest.exe",
  "workingDirectory": "$(CWD)/",
  "processTimeout": 15
}

I then published the app:

> dotnet publish
MSBuild version 17.7.0-preview-23281-03+4ce2ff1f8 for .NET
  Determining projects to restore...
  All projects are up-to-date for restore.
  aottest -> C:\aottest\bin\Release\net8.0\win-x64\aottest.dll
  aottest -> C:\aottest\bin\Release\net8.0\win-x64\publish\

and ran the test!

cd C:\aottest\bin\Release\net8.0\win-x64\publish\
dotnet timeit timeit.json

So let's look at the results!

Comparing the results

TimeItSharp can generate a whole load of metrics, but for simplicity, I'll show a reduced version of the results. First of all, let's see the results without AOT:

C:\aottest\bin\Release\net8.0\publish> dotnet timeit timeit.json
TimeIt (v. 0.0.8.0) by Tony Redondo

Warmup count: 10
Count: 100
Number of Scenarios: 1
Exporters: ConsoleExporter, JsonExporter, Datadog

Scenario: Default
  Warming up ..........
    Duration: 7.4238972s
  Run ....................................................................................................
    Duration: 37.0493903s

The average time to execute the non-AOT app (building the WebApplication, starting the app, and shutting down) was around 364ms:

Name     Mean     StdDev   StdErr    Min      Max      P95      P90      Outliers
Default  363.9ms  6.268ms  0.6299ms  350.5ms  382.9ms  375.9ms  373.6ms  1

Bear in mind that these numbers are all relative: I was running this on a relatively old Windows laptop, so you will obviously get different numbers on a different machine!

If we run the same thing with AOT enabled, the whole process takes only 45ms!

Name     Mean     StdDev   StdErr    Min      Max      P95      P90      Outliers
Default  44.75ms  3.755ms  0.3773ms  39.98ms  58.00ms  52.58ms  50.27ms  1

That's roughly an 8x speed-up in the time to execute the same app! And that's the power of AOT 😃

As I've described throughout this post, AOT won't solve all your problems, and it may cause significant new ones, but it most definitely will improve your .NET app startup times!

Summary

In this post, I described the difference between AOT and JIT compilation in .NET apps, and looked at some of the pros and cons of AOT compilation. I then looked at the new AOT-compatible api template included in the .NET 8 previews. Comparing the startup times for the non-AOT and AOT versions of the app, the benefits of AOT became clear: without AOT, the time to start up and shut down averaged around 364ms; with AOT, this dropped to just 45ms! This is where AOT really shines: reducing the startup time for apps.

