Exploring the generated code: List and fallback cases: Behind the scenes of collection expressions - Part 2

This series takes an in-depth look at collection expressions, which were introduced in C# 12. In the first post I provided an introduction to collection expressions, so if you're not familiar with how they work, I strongly suggest reading that one first.

In this post, we look at what the compiler generates when you use collection expressions with some of the built-in types. This post looks at many of the simple cases, where the generated code is easy to understand. In the next post we look at many more collections, where things get more interesting (and complicated!).

Note that by design the code produced by the compiler may change in future versions of .NET and C#. The generated code shown here represents a point-in-time view of the situation. If new C# features, types, or mechanisms are introduced, the compiler can switch to using them behind the scenes, and your compiled code gets faster without you having to change anything!

When you use collection expressions in your code, the compiler has quite a lot of freedom to create the collection in any way it likes, so it tries to be as efficient as possible. Depending on the details of the type, that's not always possible; this post looks predominantly at those cases.

Collection initializers: HashSet<T>, ConcurrentBag<T>, and SortedSet<T>

It might seem like an odd place to start, but I've started with HashSet<T> because the compiler-generated code is very simple and easy to understand. If you write code like this:

using System.Collections.Generic;

HashSet<int> hashset = [ 1, 2, 3, 4 ];

then the compiler generates code that looks a bit like this:

HashSet<int> hashSet = new HashSet<int>();
hashSet.Add(1);
hashSet.Add(2);
hashSet.Add(3);
hashSet.Add(4);

which is exactly the same code as if you'd used an old collection initializer instead of a collection expression:

HashSet<int> hashSet = new() { 1, 2, 3, 4 };

Similarly, for HashSet<T> the empty collection expression [] simply calls new HashSet<T>().

So why is the code so basic and unoptimized here? The simple answer is that the compiler doesn't include any special-case handling for HashSet<T>, so it uses the same fallback path as custom collections: the collection initializer syntax.
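Because the fallback path is just a sequence of Add() calls, a collection expression inherits whatever Add() does. For HashSet<T>, that means duplicate elements are silently dropped, exactly as they would be with a collection initializer. A small sketch (the variable names are only illustrative):

```csharp
using System;
using System.Collections.Generic;

// Each element is passed to HashSet<T>.Add(); its bool result is ignored,
// so the duplicate 2 is silently dropped rather than causing an error
HashSet<int> fromExpression = [1, 2, 2, 3];
HashSet<int> fromInitializer = new() { 1, 2, 2, 3 };

// The empty collection expression is just new HashSet<int>()
HashSet<int> empty = [];

Console.WriteLine(fromExpression.Count);                      // 3
Console.WriteLine(fromExpression.SetEquals(fromInitializer)); // True
Console.WriteLine(empty.Count);                               // 0
```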

The same is true for other collections that can be used with collection initializers, for example ConcurrentBag<T> and SortedSet<T>.

I hesitate to mention ConcurrentBag<T> at all, given Kevin's advice of "Never use ConcurrentBag<T> without benchmarking", but what can I say, I'm a sucker for completeness.

If you use a collection expression to create the collection:

using System.Collections.Concurrent;

ConcurrentBag<int> bag = [1, 2, 3, 4, 5];

then, as expected, the generated code is the same as the collection initializer version, calling Add() for each entry.

ConcurrentBag<int> concurrentBag = new ConcurrentBag<int>();
concurrentBag.Add(1);
concurrentBag.Add(2);
concurrentBag.Add(3);
concurrentBag.Add(4);
concurrentBag.Add(5);

And yes, as I'm sure you've guessed, the SortedSet<T> implementation is the same, with the generated code matching the collection initializer version:

SortedSet<int> sortedSet = new SortedSet<int>();
sortedSet.Add(1);
sortedSet.Add(2);
sortedSet.Add(3);
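The same reasoning applies to ordering: the order of the elements in the expression isn't necessarily the order of the resulting collection, because SortedSet<T>.Add() decides where each element goes (and ignores duplicates). A small sketch:

```csharp
using System;
using System.Collections.Generic;

// Elements are added one at a time; SortedSet<T> places each one in sort order
// and ignores the second 3
SortedSet<int> sortedSet = [5, 3, 1, 3];

Console.WriteLine(string.Join(", ", sortedSet)); // 1, 3, 5
```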

I promise we'll look at something more interesting than collection initializers soon, but before we do, it's worth showing that this doesn't just apply to built-in collections: you can use collection expressions with your own types too.

Using collection expressions with custom types

You may not be aware, but you can use the collection initializer syntax with any type that implements IEnumerable and exposes an accessible Add() method (or an equivalent extension method).

For example, you can create a collection like this:


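using System.Collections;
using System.Collections.Generic;

class MyCollection : IEnumerable<int>
{
    // The values are stored in a private List<int>
    private readonly List<int> _items = new();

    // Enumeration support: collection initializers require IEnumerable
    IEnumerator<int> IEnumerable<int>.GetEnumerator() => _items.GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => _items.GetEnumerator();

    // The compiler calls Add() once for each element
    public void Add(int i) { _items.Add(i); }
}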

and then you can use it in a collection initializer like this:

MyCollection mycollection = new() { 1, 2, 3, 4, 5 };

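I implemented IEnumerable<T> in this case, but implementing only the non-generic IEnumerable would work too. What matters is that the type implements IEnumerable: simply exposing a GetEnumerator() method without implementing the interface isn't enough for a collection initializer.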

The generated code for a collection initializer simply calls Add() for each element:

MyCollection myCollection = new MyCollection();
myCollection.Add(1);
myCollection.Add(2);
myCollection.Add(3);
myCollection.Add(4);
myCollection.Add(5);

Collection expressions have essentially the same requirements for the type as collection initializers do: if you can use a type in a collection initializer, you can likely use it in a collection expression:

MyCollection mycollection = [ 1, 2, 3, 4, 5 ];

And as you can probably guess, the generated code is exactly the same:

MyCollection myCollection = new MyCollection();
myCollection.Add(1);
myCollection.Add(2);
myCollection.Add(3);
myCollection.Add(4);
myCollection.Add(5);

One thing to note is that collection expressions require a constructor that can be called with no arguments and that is accessible where the expression is used, because the compiler calls the constructor implicitly in the generated code. That contrasts with collection initializers, where you call the constructor yourself, so you can use any accessible constructor and pass it arguments.
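One practical consequence is that a collection initializer lets you pass constructor arguments, such as a custom comparer, whereas a collection expression always goes through the parameterless constructor. This sketch uses HashSet<string> with StringComparer.OrdinalIgnoreCase to show the difference (the variable names are only illustrative):

```csharp
using System;
using System.Collections.Generic;

// Collection initializer: you pick the constructor, so you can pass a comparer
HashSet<string> caseInsensitive = new(StringComparer.OrdinalIgnoreCase) { "apple", "APPLE" };

// Collection expression: the parameterless constructor is used,
// so the set gets the default (case-sensitive) comparer
HashSet<string> caseSensitive = ["apple", "APPLE"];

Console.WriteLine(caseInsensitive.Count); // 1
Console.WriteLine(caseSensitive.Count);   // 2
```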

OK, we're finally done with the fallback path of collection initializers, it's time to look at something more interesting!

Optimizing List<T>

HashSet<T> wasn't very interesting, but with List<T> things start to get more complicated. If we first consider the old collection initializer syntax:

using System.Collections.Generic;

List<int> list = new () {1, 2, 3, 4, 5};

then the generated code would look very similar to the HashSet<T> code:

List<int> list = new List<int>();
list.Add(1);
list.Add(2);
list.Add(3);
list.Add(4);
list.Add(5);

This code is fine, but it's doing more work than it needs to. Every call to Add() has to check whether the underlying int[] (which stores the actual values in the List<int>) needs to be resized. Even with just 5 elements, the code above resizes the array twice: a new List<int> starts with an empty array, the first Add() allocates a 4-element array (the default capacity), and the fifth Add() has to allocate an 8-element array and copy the existing values across.
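You can observe those resizes through the List<T>.Capacity property. The following sketch records the capacity after each Add() and contrasts it with a list given an explicit capacity up front. It assumes a modern .NET runtime; the exact growth policy is an implementation detail of List<T>, not a documented contract.

```csharp
using System;
using System.Collections.Generic;

// Same pattern as the collection initializer: start empty, then Add() each value
List<int> list = new List<int>();
List<int> capacities = new List<int> { list.Capacity }; // capacity before any Add()

for (int i = 1; i <= 5; i++)
{
    list.Add(i);
    capacities.Add(list.Capacity); // record the capacity after each Add()
}

Console.WriteLine(string.Join(", ", capacities)); // 0, 4, 4, 4, 4, 8

// Knowing the final size up front avoids the intermediate arrays
List<int> presized = new List<int>(capacity: 5);
for (int i = 1; i <= 5; i++)
{
    presized.Add(i);
}

Console.WriteLine(presized.Capacity); // 5
```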

So instead of collection initializers, we should use collection expressions:

using System.Collections.Generic;

List<int> list = [1, 2, 3, 4, 5];

With collection expressions, the compiler has more freedom: it knows there are going to be 5 elements, so it can create the underlying array with the correct size directly. That, coupled with some unsafe methods in CollectionsMarshal, makes the initialization much more efficient:

using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

List<int> list = new List<int>();
// Set the list's Count to the final number of entries (growing the backing array)
CollectionsMarshal.SetCount(list, 5);
// Get access to the underlying array as a Span<int>, so you can mutate the values
Span<int> span = CollectionsMarshal.AsSpan(list);
int num = 0;
span[num] = 1; // Set each of the values
num++;
span[num] = 2;
num++;
span[num] = 3;
num++;
span[num] = 4;
num++;
span[num] = 5;
num++;

The generated code uses the CollectionsMarshal.SetCount() method, introduced in .NET 8, to set the list's Count to the final number of elements, growing the underlying array as needed. It then uses the CollectionsMarshal.AsSpan() method, introduced in .NET 5, to write the elements directly into that array.
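To make the two CollectionsMarshal calls more concrete, here is a hand-written sketch of the same pattern. CreateList is a hypothetical helper (not a BCL API) and assumes .NET 8 or later, because SetCount() doesn't exist in earlier versions. It sizes the list once and then copies the values straight into the backing array.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

// Hypothetical helper mirroring the shape of the compiler-generated code (.NET 8+)
static List<T> CreateList<T>(ReadOnlySpan<T> values)
{
    var result = new List<T>();

    // Set Count to the final size; the backing array is grown if needed
    CollectionsMarshal.SetCount(result, values.Length);

    // Get a Span<T> over the backing array and write the values into it directly
    Span<T> span = CollectionsMarshal.AsSpan(result);
    values.CopyTo(span);

    return result;
}

List<int> list = CreateList<int>(new[] { 1, 2, 3, 4, 5 });
Console.WriteLine(string.Join(", ", list)); // 1, 2, 3, 4, 5
```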

Note that these methods have some "unsafe" behaviour, so you need to be careful when using them in general. The example shown above is perfectly safe when used by the compiler as part of collection expressions, of course.
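The main hazard with AsSpan() is that the span refers to whichever array backs the list at the moment you call it. If the list later has to grow, it copies its contents into a new array and the span silently keeps pointing at the old one. This sketch (assuming .NET 5 or later for AsSpan()) shows a write through such a stale span being lost:

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

// Explicit capacity of 4, so the backing array is exactly full
List<int> list = new List<int>(capacity: 4) { 1, 2, 3, 4 };

// The span refers to the current 4-element backing array
Span<int> span = CollectionsMarshal.AsSpan(list);

// The list is full, so Add() allocates a larger array and copies the values over
list.Add(5);

// This write goes to the old array, which the list no longer uses
span[0] = 100;

Console.WriteLine(list[0]); // 1
Console.WriteLine(span[0]); // 100
```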

The generated code does much less work than calling Add() 5 times, but the end result is the same. This is one of the big selling points of collection expressions: the compiler can take advantage of updates to the language or runtime to make your code faster, without needing to change anything!

List<T> when targeting earlier versions

Collection expressions were introduced in C# 12 with .NET 8, but you can also target earlier versions of .NET while still using C# 12. I'm a bit hazy on what the official support for earlier target frameworks (TFMs) looks like, given some C# features won't work in earlier versions of .NET (like default interface methods), but in general it's safe to assume you can use newer C# features like collection expressions unless the compiler tells you otherwise!

However, I mentioned that the optimised code for List<T> uses an API that was introduced in .NET 8: CollectionsMarshal.SetCount(). If you're targeting an earlier version of .NET, that API isn't available, and so the compiler has to do something else. In this case it falls back to the simple collection initializer code:

List<int> list = new List<int>();
list.Add(1);
list.Add(2);
list.Add(3);
list.Add(4);
list.Add(5);

So for this specific example, earlier TFMs don't benefit from performance improvements by using collection expressions, though that won't be true for all collection expression usages or for all collection types.

Interfaces backed by List<T>: IList<T> and ICollection<T>

The examples I've focused on so far have been concrete types, but collection expressions also work with some interface types. In particular, IList<T> and ICollection<T> use List<T> as the backing type.

So for this code:

using System;
using System.Collections.Generic;

IList<int> ilist = [1, 2, 3];
ICollection<int> collection = [2, 4, 6];

Console.WriteLine(ilist is List<int>); // True
Console.WriteLine(collection is List<int>); // True

You can see that the compiler generates exactly the same List<T> initialization code you would get if each variable were declared as a List<T>. For example, for ilist:

List<int> list = new List<int>();
CollectionsMarshal.SetCount(list, 3);
Span<int> span = CollectionsMarshal.AsSpan(list);
int num = 0;
span[num] = 1;
num++;
span[num] = 2;
num++;
span[num] = 3;
num++;
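Because the backing type is a List<T>, the results are ordinary mutable lists. A quick way to check this, assuming the C# 12 compiler behaviour described above, is to inspect the runtime type and mutate the collections through the interfaces:

```csharp
using System;
using System.Collections.Generic;

IList<int> ilist = [1, 2, 3];
ICollection<int> collection = [2, 4, 6];

// Both interfaces are backed by List<T>, so they support mutation
ilist.Add(4);
collection.Remove(2);

Console.WriteLine(ilist.GetType().Name);          // List`1
Console.WriteLine(ilist.IsReadOnly);              // False
Console.WriteLine(string.Join(", ", ilist));      // 1, 2, 3, 4
Console.WriteLine(string.Join(", ", collection)); // 4, 6
```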

In the next post we'll look at more collection types like T[] and ReadOnlySpan<T> to see how they're heavily optimized when used with collection expressions.

Summary

In this post, I showed how collection expressions can always fall back to collection initializers for types that support them. I also showed how to create your own types that support collection initializers and collection expressions by implementing IEnumerable and adding an Add() method. Finally, I showed how List<T> is optimized for collection expressions by using the .NET 8 API CollectionsMarshal.SetCount(), and how this falls back to collection initializer code (calling Add() for each element) if you're targeting earlier framework versions. In the next post we'll look at more collection types to see how they work with collection expressions.

