Channel: Andrew Lock | .NET Escapades

Exploring the generated code: the spread element: Behind the scenes of collection expressions - Part 4


This series takes an in-depth look at collection expressions, which were introduced with C# 12. In the first post I provided an introduction to collection expressions, and in the second and third posts we looked at the code generated when you use collection expressions to create List<T>, T[], and Span<T> collections, among others.

In each of the previous posts we only looked at simple cases, where you're creating a collection directly from a fixed set of elements, for example:

List<string> list = [ "1", "2", "3", "4", "5" ];
int[] array = [ 1, 2, 3, 4, 5 ];
Span<int> span = [ 1, 2, 3, 4, 5 ];

In this post we look at what the compiler generates when you use collection expressions with the spread element and how it changes based on the source and destination collections.

Note that by design the code produced by the compiler may change in future versions of .NET and C#. The generated code shown here represents a point-in-time view of the situation. If new C# features, types, or mechanisms are introduced, the compiler can switch to using them behind the scenes, and your compiled code gets faster without you having to change anything! In fact, changes in the compiler meant that by the time I finished this series, the first examples were already out of date!

We'll start with a quick recap on the spread element, and then we'll look at how the compiler generates code for collection expressions that use spreading. As always, you don't need to know what this code looks like, and it might change in the future. This whole series is mostly an attempt to satisfy my own curiosity!

Creating collections using the spread element

The spread element .. was introduced with collection expressions and lets you combine collections together into new collections.

For example, the following code creates a List<int> that contains all the same elements as the int[]:

int[] array = [ 1, 2, 3, 4, 5];
List<int> list = [ ..array ]; // list contains 1, 2, 3, 4, 5

You can use the spread element to combine multiple collections into one, and mix and match with fixed values:

List<int> start = new () { 1, 2, 3 }; // The source list can be any IEnumerable collection
IEnumerable<int> end = [ 5, 6, 7];    // regardless of how it was created

// You can combine single elements and spread elements however you
// like in the collection expression
int[] all = [ 0, ..start, 4, ..end, 8 ]; // 0, 1, 2, 3, 4, 5, 6, 7, 8

That's pretty much all there is to the spread element in terms of syntax. It's pretty simple, but combining collections is such a common requirement, and the previous syntax was so cumbersome that this is an incredibly valuable feature.

And as an added bonus, it's declarative about what the final collection should look like, as opposed to imperatively describing which methods to call to build the collection. That means the compiler is free to do whatever it thinks is best to optimise the code, as we'll explore in this post.
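
To make the declarative/imperative contrast concrete, here's a minimal sketch that builds the same array both ways. The CombineImperatively() helper is hypothetical, and it's just one of several ways you might have written this before C# 12:

```csharp
using System;
using System.Collections.Generic;

List<int> start = [1, 2, 3];
IEnumerable<int> end = [5, 6, 7];

// Declarative: describe the final contents and let the compiler decide how to build it
int[] declarative = [0, ..start, 4, ..end, 8];

// Imperative: spell out every step yourself (hypothetical helper, for comparison only)
int[] imperative = CombineImperatively(start, end);

Console.WriteLine(string.Join(", ", declarative)); // 0, 1, 2, 3, 4, 5, 6, 7, 8
Console.WriteLine(string.Join(", ", imperative));  // 0, 1, 2, 3, 4, 5, 6, 7, 8

static int[] CombineImperatively(List<int> start, IEnumerable<int> end)
{
    var result = new List<int>();
    result.Add(0);
    result.AddRange(start);
    result.Add(4);
    result.AddRange(end);
    result.Add(8);
    return result.ToArray(); // one extra copy to get from List<int> to int[]
}
```

Both versions produce the same elements, but the imperative version locks in a specific strategy (a growable list followed by a copy), whereas the collection expression leaves that choice to the compiler.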

Collection expressions and the spread element work with almost any collection type, so in this post I've broken down the sections based on the destination type, and then looked at a variety of different source collections. We'll start with List<T>, as one of the most common collection types in .NET.

Creating List<T> using spread elements

List<T> is one of the most common collection types, and is a good general purpose choice. You can add an arbitrary number of elements to a List<T>, but if you know up front how many elements you need, you can improve performance, as we saw in a previous post.

Creating List<T> from T[]

We'll start with the example where the source collection is a T[] and we're creating a List<T> by spreading all the elements into it.

I've used int for simplicity in all the collections in this post. Unlike when constructing the original collection expressions, the element type has less of an effect when you're spreading collections into one another.

We can write a simple program to demonstrate the spread element in action:

using System.Collections.Generic;

MyFunc([ 1, 2, 3, 4, 5]);

List<int> MyFunc(int[] source) => [..source]; // T[] source

I've split the creation of the "source collection" and the "destination collection" into separate methods to make the sharplab.io generated code easier to follow: only the code inside the MyFunc() method is related to the spread element.

To make the generated code easier to follow in this post, I'm only showing the literal "spreading" code where the return List<T> is constructed in the MyFunc() method. The following annotated code is based on the generated code in sharplab.io. It shows that when the source array is a T[], the list can be constructed quite efficiently:

// Create a new List<T>, which will be the final returned value.
List<int> list = new List<int>();

// Force the destination Count to match the source length
CollectionsMarshal.SetCount(list, source.Length);

// Retrieve the T[] backing field in List<T> as a Span<T>
Span<int> span = CollectionsMarshal.AsSpan(list);
// 👆 Everything up to here was all "normal" for creating a list with a known, fixed length.

int num = 0;

// Wrap the _source_ array as a ReadOnlySpan<T>
ReadOnlySpan<int> readOnlySpan = new ReadOnlySpan<int>(source);

// Copy the source Span to the destination Span
readOnlySpan.CopyTo(span.Slice(num, readOnlySpan.Length)); // The slice isn't actually necessary here 
num += readOnlySpan.Length; // not necessary, an artifact left over, we'll come to it later

As you can see, the List<T> is efficiently constructed, using SetCount to set the size of the underlying array, and retrieving the backing field as a Span<T>. The source T[] is then wrapped in a ReadOnlySpan<T>, and copied directly into the destination list. And that's it!

You get roughly the same code for all the following source types:

  • T[]—The code shown above.
  • Span<T>—Identical, except the source is already a Span<T> so no need to wrap it.
  • ReadOnlySpan<T>—As for Span<T>, the source can be directly copied to the destination
  • List<T>—If the source is a List<T>, the backing array is directly accessed using CollectionsMarshal.AsSpan() (if available) and then copied to the destination span.
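
As a rough illustration of the List<T> source case, the following sketch performs the same steps by hand and compares the result with the spread. CopyLikeGeneratedCode() is a hypothetical helper, the real compiler output may order the steps differently, and CollectionsMarshal.SetCount() requires .NET 8 or later:

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

List<int> source = [1, 2, 3, 4, 5];

List<int> viaSpread = [..source];
List<int> viaSketch = CopyLikeGeneratedCode(source);

Console.WriteLine(string.Join(", ", viaSpread)); // 1, 2, 3, 4, 5
Console.WriteLine(string.Join(", ", viaSketch)); // 1, 2, 3, 4, 5

// Hypothetical helper approximating the generated code for a List<T> source
static List<int> CopyLikeGeneratedCode(List<int> source)
{
    // View the source list's backing array without copying it
    ReadOnlySpan<int> sourceSpan = CollectionsMarshal.AsSpan(source);

    var list = new List<int>();

    // Resize the destination up front, then bulk copy instead of calling Add() per element
    CollectionsMarshal.SetCount(list, sourceSpan.Length);
    sourceSpan.CopyTo(CollectionsMarshal.AsSpan(list));
    return list;
}
```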

These examples all behave roughly the same, because we can get a Span<T> of the source data. But what if we can't, and we don't even know how many elements are in the collection?

Creating List<T> from IEnumerable<T>

Spreading Span<T>, List<T>, or T[] is pretty much a best case for collection expressions. In contrast, the worst case is IEnumerable<T>, where you have no idea how many elements are in the source collection:

using System.Collections.Generic;

MyFunc([ 1, 2, 3, 4, 5]);

List<int> MyFunc(IEnumerable<int> source) => [..source]; // IEnumerable<T> source

How does the generated collection expression spread code handle this situation?

List<int> list = new List<int>();
list.AddRange(source);

In this case, the best the compiler can do is fall back on AddRange(). There's no optimisation here, because we can't really do any: we don't know how many elements the IEnumerable<T> contains, so we can't optimise the initial list capacity to avoid resizes.

Incidentally, if you're targeting runtimes that predate CollectionsMarshal.SetCount(), the generated code also uses AddRange() for T[] and similar sources, but the list capacity is pre-set to avoid resizes, for example:

List<int> list = new List<int>(source.Length); // setting the capacity if it's known
list.AddRange(source);
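
To see why pre-setting the capacity matters, here's a small sketch that runs both AddRange() variants and prints the resulting capacity. The Lazily() helper is hypothetical; it just hides the array behind an iterator so that its length isn't available. The exact growth policy of List<T> is an implementation detail, so I haven't shown the output:

```csharp
using System;
using System.Collections.Generic;

int[] source = [1, 2, 3, 4, 5];

// Unknown length: start empty and let the list grow its backing array as needed
var grown = new List<int>();
grown.AddRange(Lazily(source));

// Known length: reserve the capacity first so AddRange() never has to resize
var presized = new List<int>(source.Length);
presized.AddRange(source);

Console.WriteLine($"grown:    Count={grown.Count}, Capacity={grown.Capacity}");
Console.WriteLine($"presized: Count={presized.Count}, Capacity={presized.Capacity}");

// Hypothetical helper: exposes the values as an iterator with no known length
static IEnumerable<int> Lazily(int[] values)
{
    foreach (var value in values)
    {
        yield return value;
    }
}
```

The pre-sized list ends up with a capacity that exactly matches its count, whereas the grown list may have allocated (and thrown away) smaller backing arrays along the way.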

Creating List<T> from ICollection<T> and friends

The final scenario we'll look at is when the source is ICollection<T> or a similar interface:

using System.Collections.Generic;

MyFunc([ 1, 2, 3, 4, 5]);

List<int> MyFunc(ICollection<int> source) => [..source]; // ICollection<T> source

The generated code in this case looks relatively complicated, but it's the same code that the compiler generates for a foreach loop, so it's equivalent to something like this:

// Create the destination list
List<int> list = new List<int>();

// Initialize the backing array to the size of the collection
CollectionsMarshal.SetCount(list, source.Count);

// Get the underlying array
Span<int> span = CollectionsMarshal.AsSpan(list);

// Set each element in the source in the destination span
int num = 0;
foreach(var current in source)
{
    span[num] = current;
    num++;
}

You might wonder why the generated code doesn't just use AddRange() again (and for earlier runtimes, that's exactly what happens), but the generated code that writes directly to the Span<T> is (presumably) more efficient, as it bypasses the overhead of Add() and AddRange().

Multiple spread and fixed elements

For simplicity I'm mostly showing single collection sources in this post, but you can also mix spread collections with fixed elements, for example:

using System;
using System.Collections.Generic;

MyFunc([ 1, 2, 3, 4, 5]);

List<int> MyFunc(Span<int> source) => [0, ..source, 6, 7];

In this example, the additional elements don't really change anything, as the compiler can account for the additional elements and calculate the final required capacity:

int num = 0;
Span<int> span = source;

int num2 = 3 + span.Length; // Calculate the total length

List<int> list = new List<int>(num2); // Create the final list
CollectionsMarshal.SetCount(list, num2); // set the final size

Span<int> span2 = CollectionsMarshal.AsSpan(list); // Grab the list backing-array

int index = 0;
span2[index] = num; // Set the first fixed element
index++;
span.CopyTo(span2.Slice(index, span.Length)); // Copy the source to the destination
index += span.Length;
span2[index] = 6; // Copy the remaining fixed elements
index++;
span2[index] = 7;
index++;

The code changes for other collection types in a similar way, so I won't repeat things here. Where things get interesting is when there are multiple spread elements. For example, the following code spreads two ICollection<T> instances:

using System;
using System.Collections.Generic;

MyFunc([ 1, 2, 3, 4, 5], [ 1, 2, 3, 4, 5]);

List<int> MyFunc(ICollection<int> source, ICollection<int> source2)
    => [0, ..source, 6, 7, ..source2];

Given the final required size of the list is known, I expected the generated code would take that into account, but it uses much simpler code:

List<int> list = new List<int>();
list.Add(0);
list.AddRange(source);
list.Add(6);
list.Add(7);
list.AddRange(source2);

I suspect this is a case of not bothering to optimise the less-common cases, so maybe it's something that will be improved later on?
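
If the extra resizes matter to you in a hot path, one option is to build the list by hand. The following sketch is a hypothetical manual alternative, not something the compiler emits: it uses the two Count values to reserve the capacity once, and then adds the elements in order. I've used LinkedList<T> as the source purely because it implements ICollection<T> without being backed by an array:

```csharp
using System;
using System.Collections.Generic;

ICollection<int> first = new LinkedList<int>(new[] { 1, 2, 3, 4, 5 });
ICollection<int> second = new LinkedList<int>(new[] { 8, 9 });

List<int> viaSpread = [0, ..first, 6, 7, ..second];
List<int> viaManual = CombinePreSized(first, second);

Console.WriteLine(string.Join(", ", viaSpread)); // 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
Console.WriteLine(string.Join(", ", viaManual)); // 0, 1, 2, 3, 4, 5, 6, 7, 8, 9

// Hypothetical helper: same Add()/AddRange() calls, but with the capacity reserved up front
static List<int> CombinePreSized(ICollection<int> source, ICollection<int> source2)
{
    // 3 fixed elements plus the size of each spread collection
    var list = new List<int>(3 + source.Count + source2.Count);
    list.Add(0);
    list.AddRange(source);
    list.Add(6);
    list.Add(7);
    list.AddRange(source2);
    return list;
}
```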

I think we've looked enough at creating List<T>, so now we'll look at the other end of the spectrum, creating an IEnumerable<T> from other collections.

Creating IEnumerable<T> using spread elements

Creating an IEnumerable<T> from multiple existing collections is, again, possible without using collection expressions, but it's either not very efficient or very clunky.

For example, imagine you're trying to combine two collections with some fixed elements, something like the previous example:

using System;
using System.Collections.Generic;

IEnumerable<int> MyFunc(IEnumerable<int> source, IEnumerable<int> source2)
    => [0, ..source, 6, 7, ..source2];

Without collection expressions you could use the yield keyword to produce the same result:

IEnumerable<int> B(IEnumerable<int> source, IEnumerable<int> source2)
{
    yield return 0;
    foreach(var val in source)
    {
        yield return val;
    }
    
    yield return 6;
    yield return 7;
    foreach(var val in source2)
    {
        yield return val;
    }
}

But that requires that you do the concatenation in a separate function and is very verbose. Alternatively you could fall back on using LINQ:

IEnumerable<int> B(IEnumerable<int> source, IEnumerable<int> source2)
    => Enumerable.Repeat(0, 1)
        .Concat(source)
        .Concat(Enumerable.Repeat(6, 1))
        .Concat(Enumerable.Repeat(7, 1))
        .Concat(source2);

This is almost nice, but adding the fixed values is ugly and relatively inefficient. Collection expressions are just so much nicer here! But what does the compiler actually generate?

Creating IEnumerable<T> From a List<T>, Span<T>, and T[]

We'll go back to single collections for simplicity here, and we'll start with a List<T> source:

using System.Collections.Generic;

MyFunc([ 1, 2, 3, 4, 5]);

IEnumerable<int> MyFunc(List<int> source) => [..source];

The generated code in this case is incredibly simple: it calls List<T>.ToArray() to get a copy of the backing array, and then wraps the result in the generated <>z__ReadOnlyArray type (which I discussed in my previous post):

new <>z__ReadOnlyArray<int>(source.ToArray());

Interestingly, replacing the List<T> with Span<T> or ReadOnlySpan<T> generates exactly the same code, as they also provide a ToArray() method. And T[] is almost the same:

new <>z__ReadOnlyArray<int>(new ReadOnlySpan<int>(source).ToArray());

The only difference here is that the generated code wraps the T[] in a ReadOnlySpan<T> first before calling ToArray(). Presumably this is either cheaper than a direct Array.Copy() or it's just a nicer API to use; I haven't looked into which!
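
One practical consequence of the ToArray() call is that the result is a snapshot of the source, rather than a live view over it. The following small sketch demonstrates that by mutating the source after the spread:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

List<int> source = [1, 2, 3, 4, 5];
IEnumerable<int> snapshot = [..source];

source.Add(6); // mutate the source after the spread has been evaluated

Console.WriteLine(source.Count);     // 6
Console.WriteLine(snapshot.Count()); // 5
```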

Creating IEnumerable<T> from another IEnumerable<T>

Moving away from known-length types like List<T> and T[] we can move to the other end of the spectrum, creating IEnumerable<T> from another IEnumerable<T>:

using System.Collections.Generic;

MyFunc([ 1, 2, 3, 4, 5]);

IEnumerable<int> MyFunc(IEnumerable<int> source) => [..source];

In this case the generated code uses a List<T> as the "backing" type for the returned object, adding the source to the list using List<T>.AddRange(), and then wrapping that in another generated type <>z__ReadOnlyList:

List<int> list = new List<int>();
list.AddRange(source);
new <>z__ReadOnlyList<int>(list);

Just like <>z__ReadOnlyArray, the <>z__ReadOnlyList type is a compiler-generated type. It looks something like the following, with all the interfaces implemented explicitly:

internal sealed class <>z__ReadOnlyList<T> : IEnumerable, ICollection, IList, IEnumerable<T>, IReadOnlyCollection<T>, IReadOnlyList<T>, ICollection<T>, IList<T>
{
    private readonly List<T> _items; // The backing list containing the items

    // The interfaces are explicitly implemented and delegate to the backing list
    int ICollection.Count => _items.Count; 
    void IList.Clear() => throw new NotSupportedException(); // Mutation methods throw
    
    // ... etc
}

All of the members that are implemented delegate to the underlying List<T> _items, and all of the members that would mutate the list throw a NotSupportedException().
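
If you're curious, you can poke at this behaviour from the outside. The following sketch checks whether the returned object implements ICollection<T>, and if so, tries to mutate it. The concrete type the compiler picks is an implementation detail that could change, so the code checks before casting, and I haven't shown the output. The Numbers() iterator is a hypothetical source with no known length:

```csharp
using System;
using System.Collections.Generic;

IEnumerable<int> result = [..Numbers()];

Console.WriteLine(string.Join(", ", result)); // 1, 2, 3

// Whether the result implements ICollection<T> depends on the compiler-generated type,
// so check before casting rather than assuming
if (result is ICollection<int> collection)
{
    Console.WriteLine($"IsReadOnly: {collection.IsReadOnly}");
    try
    {
        collection.Add(4);
        Console.WriteLine("Add() succeeded");
    }
    catch (NotSupportedException)
    {
        Console.WriteLine("Add() threw NotSupportedException");
    }
}

// Hypothetical iterator, so the source length is unknown to the compiler
static IEnumerable<int> Numbers()
{
    yield return 1;
    yield return 2;
    yield return 3;
}
```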

Creating IEnumerable<T> from another ICollection<T> and similar

The final case we'll look at is creating an IEnumerable<T> from ICollection<T> and other similar interfaces:

using System.Collections.Generic;

MyFunc([ 1, 2, 3, 4, 5]);

IEnumerable<int> MyFunc(ICollection<int> source) => [..source];

As always, this case is somewhat between the int[] and IEnumerable<T> cases; we're not guaranteed to have contiguous memory, but we do know the size of the collection, so we can pre-allocate a collection with the correct size.

int num = 0;
int[] array = new int[source.Count];
foreach(var current in source)
{
    array[num] = current;
    num++;
}
return new <>z__ReadOnlyArray<int>(array);

In this case the compiler creates a T[] of the correct final size. It then enumerates all of the elements in the ICollection<T> and assigns them to the array elements. Finally, it wraps the array in the compiler-generated <>z__ReadOnlyArray type.

Creating T[] using spread elements

This post is getting very long, so we're going to pick up the pace now! 😄

Creating a T[] from a spread collection is one of the easiest options. As we've seen repeatedly, a T[] is often the chosen "backing" collection for types where possible, so in most cases, the solution is trivial. For example, spreading a list into an array

int[] MyFunc(List<int> source) => [..source];

is simply

source.ToArray();

Similarly, spreading an array into another array:

int[] MyFunc(int[] source) => [..source];

uses the ReadOnlySpan<T>.ToArray() trick we saw earlier

new ReadOnlySpan<int>(source).ToArray();

Meanwhile, spreading an IEnumerable<T> into a T[]

int[] MyFunc(IEnumerable<int> source) => [..source];

uses a List<T>, adds the elements with AddRange(), and then calls ToArray() to get the contents as an array:

List<int> list = new List<int>();
list.AddRange(source);
return list.ToArray();

Finally, ICollection<T> and similar interfaces where the collection length is known use a foreach to write the array elements, exactly as shown in the previous section when returning an IEnumerable<T>.

Creating ReadOnlySpan<T> using spread elements

For the most part, creating a ReadOnlySpan<T> from spread collections is identical to the T[] case, so we'll gloss over them a bit.

In the following examples you'll notice there are multiple methods in the sharplab.io source code. That's to stop the compiler eliding the ReadOnlySpan<T> creation entirely!

Going through the simple cases again, creating a ReadOnlySpan<T> from a List<T> creates an array from the List<T> and wraps it directly:

List<int> source = //...
new ReadOnlySpan<int>(source.ToArray())

Similarly, creating a ReadOnlySpan<T> from a T[] creates a copy of the source array by wrapping it in a ReadOnlySpan<T> and then calling ToArray():

int[] source = //...
new ReadOnlySpan<int>(new ReadOnlySpan<int>(source).ToArray())

Using an IEnumerable<T> source uses the List<T> trick to create an array, and then wraps that with ReadOnlySpan<T>

IEnumerable<int> source = //...

List<int> list = new List<int>();
list.AddRange(source);
new ReadOnlySpan<int>(list.ToArray());

and finally, ICollection<T> uses an array with a foreach:

ICollection<int> source = // ...
int num = 0;
int[] array = new int[source.Count];
foreach(var current in source)
{
    array[num] = current;
    num++;
}
new ReadOnlySpan<int>(array);

For the finale, we'll briefly consider the case where you're creating a ReadOnlySpan<T> from a mixture of fixed elements and a spread array, something like this:

int[] source = // ...
ReadOnlySpan<int> result = [1, ..source, 6, 7];

The compiler still knows the final required length of the ReadOnlySpan<T> so it can preallocate an array of the right size, but the generated code is kind of interesting for showing how Span<T> really makes it easier to copy blocks of data around:

int element0 = 1; // The value to store in element 0.
int index = 0; // Element index
int[] array = new int[3 + source.Length]; // calculate the final array size

//set the first element and increment indexer
array[index] = element0; 
index++;

// Wrap a ReadOnlySpan<T> around the source array
ReadOnlySpan<int> readOnlySpan = new ReadOnlySpan<int>(source);

// 👇 This is the meat of the spread. It
// - Wraps a Span<T> around the destination array
// - Slices that span to the correct offset and length
// - Copies the source ReadOnlySpan<T> into the sliced destination Span<T>, which
//   writes the data to the underlying array
// - Increments the index by the number of elements written
readOnlySpan.CopyTo(new Span<int>(array).Slice(index, readOnlySpan.Length));
index += readOnlySpan.Length;

// Set the remaining fixed elements
array[index] = 6;
index++;
array[index] = 7;
index++;

// Wrap a ReadOnlySpan<T> around the destination array
new ReadOnlySpan<int>(array);

It's nothing particularly complex as long as you've taken the time to understand Span<T>, but it's nice to think that this is about as efficient as it could be and you didn't have to write it. That won't always be the case, particularly if you're spreading multiple collections, but as new versions of .NET are released, the compiler can continue to improve, and your code just gets faster.
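
If you want to try this yourself, here's a runnable sketch of the same steps. BuildLikeGeneratedCode() is a hypothetical helper that approximates the annotated code above (it isn't the literal compiler output), and the program compares its result with the collection expression:

```csharp
using System;

int[] source = [2, 3, 4, 5];

ReadOnlySpan<int> viaSpread = [1, ..source, 6, 7];
int[] viaSketch = BuildLikeGeneratedCode(source);

Console.WriteLine(string.Join(", ", viaSpread.ToArray())); // 1, 2, 3, 4, 5, 6, 7
Console.WriteLine(string.Join(", ", viaSketch));           // 1, 2, 3, 4, 5, 6, 7

// Hypothetical helper that follows the same steps as the annotated code above
static int[] BuildLikeGeneratedCode(int[] source)
{
    int index = 0;
    int[] array = new int[3 + source.Length]; // 3 fixed elements plus the spread

    array[index++] = 1;

    // Copy the whole source block into the matching slice of the destination
    ReadOnlySpan<int> sourceSpan = source;
    sourceSpan.CopyTo(array.AsSpan(index, sourceSpan.Length));
    index += sourceSpan.Length;

    array[index++] = 6;
    array[index++] = 7;
    return array;
}
```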

We've covered a lot in this deep dive behind the scenes of collection expressions. In the final post of this series I'll show how you can add support for collection expressions to your own types, even if they don't support collection initializers in general.

Summary

One of the big features of C# 12 collection expressions is the spread element, which lets you use all the elements of an existing collection when creating a new one. In this post we looked at examples of the code the compiler generates when you use the spread element. The generated code varies based on both the source collection type and the destination type, but in general the compiler performs the copying as efficiently as it can.

