Channel: Andrew Lock | .NET Escapades

A brief look at StringValues

In this post I take a brief look at one of the core types of ASP.NET Core, the humble StringValues. I look at where StringValues is used in the framework, what it's used for, how it's implemented, and why.

Duplicate HTTP headers

If you're an ASP.NET Core developer, you may have come across StringValues in various places, especially when working with HTTP headers.

One of the features of HTTP is that you can include the same header key multiple times for certain headers (from the spec):

Multiple message-header fields with the same field-name MAY be present in a message if and only if the entire field-value for that header field is defined as a comma-separated list [i.e., #(values)]. It MUST be possible to combine the multiple header fields into one "field-name: field-value" pair, without changing the semantics of the message, by appending each subsequent field-value to the first, each separated by a comma.

I'm not going to worry about whether this is something you should do; the fact is you can, so ASP.NET Core has to support it. Essentially it means that for every header name in a request (or response), you could have 0, 1, or many string values:

GET / HTTP/1.1
Host: localhost:5000    # No MyHeader

GET / HTTP/1.1
Host: localhost:5000
MyHeader: some-value    # Single header

GET / HTTP/1.1
Host: localhost:5000
MyHeader: some-value    # Multiple headers
MyHeader: other-value   # Multiple headers

So let's say you're on the ASP.NET Core team, and you need to create a "header collection" type. How do you handle it?

A naïve implementation using arrays

One obvious solution would be to always store all the values for a given header as an array. An array can easily handle zero ([]), one (["val1"]), or more (["val1", "val2"]) values without any complexity. A pseudo implementation would effectively be:

public class Headers : Dictionary<string, string[]> { }

If you want the values for a given header, say MyHeader, then you can get the values something like this:

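Headers headers = new(); // populated automatically from HTTP request
// The Dictionary indexer throws if the key is missing, so fall back to an empty array instead
string[] values = headers.GetValueOrDefault("MyHeader") ?? [];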

So the good part of this API is that it doesn't hide the fact that there are multiple values for the same header.

Unfortunately, there are several downsides to this naïve approach:

  • In the vast majority of cases, there will be a single header value, but you always have to handle cases when there technically may be many values, even if those cases aren't actually valid.
  • Storing a single value in an array increases the allocations, so hurts performance.

System.Web back in the old ASP.NET days solved this by using a NameValueCollection for HttpRequest.Headers. This old type looks a little bit like a Dictionary<string, string> from its public API, but it actually stores values in an array and then automatically combines them on the way out:

using System.Collections.Specialized;

var nvc = new NameValueCollection();

nvc.Add("Accept", "val1");
nvc.Add("Accept", "val2");

var header = nvc["Accept"];
Console.WriteLine(header); // prints "val1,val2"

Combining headers this way using ',' is technically the "correct" way to combine the headers, according to the HTTP specification.

The nice thing about this API from a consumer point of view is that you don't have to worry about whether there were multiple headers or not, as they're automatically joined together for you, and you always get a single string. You can also use GetValues() to get the values as a string[].

Unfortunately, there are still several downsides to this approach:

  • The values are still stored as a string[] (actually as an ArrayList), so you're still "paying" for the allocations even when there's only a single value.
  • When you retrieve the values using GetValues(), another string[] is allocated.

Finally, with NameValueCollection there's no way to know how many values are contained for a given header before you extract it. So you either play it "safe" and use GetValues(), guaranteeing an extra string[] allocation even if it's not necessary, or you use the indexer (or Get()), and risk multiple values being joined together into a single string.
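To make that trade-off concrete, here's a small sketch using only NameValueCollection from the base class library. The header values are made up for illustration; the point is that the indexer can't distinguish one comma-containing header from two separate headers, while GetValues() can, but hands back a fresh array on every call.

```csharp
using System;
using System.Collections.Specialized;

var single = new NameValueCollection();
single.Add("Accept", "text/html,application/json"); // one header whose value contains a comma

var multiple = new NameValueCollection();
multiple.Add("Accept", "text/html");
multiple.Add("Accept", "application/json"); // two separate headers

// The indexer joins the values with ',', so both collections look identical
Console.WriteLine(single["Accept"] == multiple["Accept"]); // True

// GetValues() reveals the difference...
Console.WriteLine(single.GetValues("Accept")!.Length);   // 1
Console.WriteLine(multiple.GetValues("Accept")!.Length); // 2

// ...but each call copies the values into a new string[]
Console.WriteLine(ReferenceEquals(
    multiple.GetValues("Accept"),
    multiple.GetValues("Accept"))); // False
```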

All these extra allocations are anathema to framework teams, which is where StringValues comes in.

The solution: StringValues

What we would really like is:

  • To store (and extract) a string when there's only one value, so we don't allocate an extra unnecessary array.
  • To store (and extract) a string[] when there's more than one value.
  • No extra allocation on storage or retrieval (if possible).

The solution to that problem in ASP.NET Core, and the focus of this post, is StringValues.

StringValues is a readonly struct type which, as it says in the source code:

Represents zero/null, one, or many strings in an efficient way.

StringValues achieves this efficiency goal by storing a single object? field which can take one of 3 values:

  • null (representing 0 items)
  • string (i.e. 1 item)
  • string[] (any number of items)

In some earlier implementations, StringValues stored the string and string[] values as separate fields, but they were merged into a single object field in this PR as it makes the whole struct single-pointer sized, which gives various performance gains as described in this issue.
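As a rough illustration of the "single-pointer sized" point, the sketch below compares two hypothetical structs, OneField and TwoFields. These are stand-ins I've made up for the two layouts, not the real StringValues source. It uses Unsafe.SizeOf<T>() to show that merging the fields halves the size of the struct.

```csharp
using System;
using System.Runtime.CompilerServices;

// One reference field: the struct is the size of a single pointer
Console.WriteLine(Unsafe.SizeOf<OneField>() == IntPtr.Size);      // True

// Two reference fields: twice the size, so twice as much to copy around
Console.WriteLine(Unsafe.SizeOf<TwoFields>() == 2 * IntPtr.Size); // True

// Hypothetical layout with a single merged field
readonly struct OneField
{
    private readonly object? _values;
    public OneField(object? values) => _values = values;
}

// Hypothetical layout with separate string and string[] fields
readonly struct TwoFields
{
    private readonly string? _value;
    private readonly string[]? _values;
    public TwoFields(string? value, string[]? values)
    {
        _value = value;
        _values = values;
    }
}
```

A pointer-sized struct can be passed and returned in a single register, which is one plausible reason the smaller layout helps.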

From a user API point of view, StringValues acts somewhere between a string and a string[]. It has methods such as IsNullOrEmpty(), but it also implements a raft of collection-based interfaces and related methods:

public readonly struct StringValues : IList<string?>, IReadOnlyList<string?>, IEquatable<StringValues>, IEquatable<string?>, IEquatable<string?[]?>
{
}

You can create a StringValues object using one of the constructors:

public readonly struct StringValues
{
    private readonly object? _values;
    public StringValues(string? value)
    {
        _values = value;
    }

    public StringValues(string?[]? values)
    {
        _values = values;
    }
}

Being a readonly struct, StringValues doesn't require any additional allocations on the heap in addition to the string or string[] it contains.

As always, whether or not a readonly struct will be allocated on the heap depends on exactly how it's used. The .NET team are careful to avoid boxing, but you have to be careful too!
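For example, here's a minimal sketch of how boxing can sneak in. It assumes the Microsoft.Extensions.Primitives package is referenced (it's already available in ASP.NET Core projects), and the value is made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.Extensions.Primitives;

StringValues value = new StringValues("some-value");

// No boxing: foreach binds to the struct's own public GetEnumerator(),
// which returns a struct enumerator
foreach (string? s in value)
{
    Console.WriteLine(s); // some-value
}

// Boxing: converting the struct to one of the interfaces it implements
// copies it into a heap-allocated box
IList<string?> boxed = value;
Console.WriteLine(boxed.Count); // 1
```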

Depending on how you need to use the StringValues, you have various options available for extracting the values from the StringValues instance. For example, if you need the value as a string you could do something like this:

StringValues value = new StringValues("some-value"); // e.g. a value read from the request headers
if (value.Count == 1)
{
    // only one value, so can implicitly convert directly to string (or call ToString())
    string? extracted = value;
}

// Alternatively, you can automatically concatenate all the array values, similar to NameValueCollection, by calling ToString()
string joined = value.ToString();

Alternatively, if you expect multiple values, or generally want to enumerate all the values to be safe you can simply use a foreach loop:

StringValues value = new StringValues("some-value"); // e.g. a value read from the request headers
foreach (string? str in value)
{
    // may be one or many strings
}

StringValues uses a custom struct Enumerator that just returns the _values field if it contains a single string, and otherwise enumerates the string?[] values.

You can also call ToArray(), but this will allocate if you only have a single string value, so should be avoided if possible.
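Putting those options side by side, the following sketch shows how the zero, one, and many cases behave. It assumes the Microsoft.Extensions.Primitives package is referenced, and the values are made up for illustration.

```csharp
using System;
using Microsoft.Extensions.Primitives;

StringValues none = StringValues.Empty;
StringValues one = new StringValues("val1");
StringValues many = new StringValues(new[] { "val1", "val2" });

Console.WriteLine(none.Count); // 0
Console.WriteLine(one.Count);  // 1
Console.WriteLine(many.Count); // 2

// The indexer works whether a single string or an array is stored
Console.WriteLine(one[0]);  // val1
Console.WriteLine(many[1]); // val2

// ToString() joins multiple values with ','
Console.WriteLine(many.ToString()); // val1,val2

// The implicit conversion to string gives the same joined result
string? joined = many;
Console.WriteLine(joined); // val1,val2
```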

There's not much more to worry about with StringValues, but some of the implementation details are kind of interesting, so I'll take a look at some of those in the next section.

Behind some implementations of StringValues

The implementation of IsNullOrEmpty epitomises the general patterns used inside StringValues: pattern matching to check for null and either string or string[], and then using Unsafe.As<> to cast to the other Type once we know for sure what _values contains.

public static bool IsNullOrEmpty(StringValues value)
{
    // Take local copy of _values so type checks remain valid even if the StringValues is overwritten in memory
    object? data = value._values;
    if (data is null)
    {
        return true;
    }
    if (data is string[] values)
    {
        return values.Length switch
        {
            0 => true,
            1 => string.IsNullOrEmpty(values[0]),
            _ => false,
        };
    }
    else
    {
        // Not array, can only be string
        return string.IsNullOrEmpty(Unsafe.As<string>(data));
    }
}
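Reading through the branches above, the behaviour for each kind of stored value works out as in the following sketch. It assumes the Microsoft.Extensions.Primitives package is referenced. Note the slightly surprising last-but-one case: an array of two empty strings is not considered empty.

```csharp
using System;
using Microsoft.Extensions.Primitives;

// null stored (the default struct)
Console.WriteLine(StringValues.IsNullOrEmpty(default)); // True

// a single empty string
Console.WriteLine(StringValues.IsNullOrEmpty(new StringValues(""))); // True

// an empty array
Console.WriteLine(StringValues.IsNullOrEmpty(new StringValues(Array.Empty<string>()))); // True

// an array containing one empty string
Console.WriteLine(StringValues.IsNullOrEmpty(new StringValues(new[] { "" }))); // True

// an array with more than one element is never "empty", whatever it contains
Console.WriteLine(StringValues.IsNullOrEmpty(new StringValues(new[] { "", "" }))); // False

// a single non-empty string
Console.WriteLine(StringValues.IsNullOrEmpty(new StringValues("val1"))); // False
```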

You can see a similar pattern in the implementation of Count:

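public int Count
{
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    get
    {
        // Take local copy of _values so type checks remain valid even if the StringValues is overwritten in memory
        object? value = _values;
        if (value is null)
        {
            return 0;
        }
        if (value is string)
        {
            return 1;
        }
        else
        {
            // Not string, not null, can only be string[]
            return Unsafe.As<string?[]>(value).Length;
        }
    }
}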

The final method we'll look at is GetStringValue(). This is a private method called by ToString() (among others) which converts the value to a string, no matter what the stored value is. The string and null cases are trivial, but even the string[] version shows a good example of performance-related C#, using string.Create().

I've simplified this code slightly to only show the .NET Core version. There's also a .NET Framework/.NET Standard version that uses a custom StringBuilder which I've omitted for the sake of brevity.

private string? GetStringValue()
{
    // Take local copy of _values so type checks remain valid even if the StringValues is overwritten in memory
    object? value = _values;
    if (value is string s)
    {
        return s;
    }
    else
    {
        return GetStringValueFromArray(value);
    }

    static string? GetStringValueFromArray(object? value)
    {
        if (value is null)
        {
            return null;
        }

        // value is not null or string, so can only be string[]
        string?[] values = Unsafe.As<string?[]>(value);
        return values.Length switch
        {
            0 => null,
            1 => values[0],
            _ => GetJoinedStringValueFromArray(values),
        };
    }

    static string GetJoinedStringValueFromArray(string?[] values)
    {
        // Calculate final length of the string
        int length = 0;
        for (int i = 0; i < values.Length; i++)
        {
            string? value = values[i];
            // Skip null and empty values
            // I'm not sure why !string.IsNullOrEmpty() isn't used, but seeing
            // as Ben Adams wrote it, I'm sure there's a good reason 😅
            if (value != null && value.Length > 0)
            {
                if (length > 0)
                {
                    // Add separator
                    length++;
                }

                length += value.Length;
            }
        }

        // Create the new string
        return string.Create(length, values, (span, strings) => {
            int offset = 0;
            // Skip null and empty values
            for (int i = 0; i < strings.Length; i++)
            {
                string? value = strings[i];
                if (value != null && value.Length > 0)
                {
                    if (offset > 0)
                    {
                        // Add separator
                        span[offset] = ',';
                        offset++;
                    }

                    value.AsSpan().CopyTo(span.Slice(offset));
                    offset += value.Length;
                }
            }
        });
    }
}
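One visible consequence of the "skip null and empty values" checks above is that the joined string can contain fewer entries than Count reports. A quick sketch, assuming the Microsoft.Extensions.Primitives package is referenced and using made-up values:

```csharp
using System;
using Microsoft.Extensions.Primitives;

var values = new StringValues(new string?[] { "val1", null, "", "val2" });

// Count reports the length of the underlying array, including null and empty entries
Console.WriteLine(values.Count); // 4

// The joined string skips null and empty entries, so there are no doubled separators
Console.WriteLine(values.ToString()); // val1,val2
```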

And that's it for this short post. StringValues is a good example of how ASP.NET Core is carefully optimised for performance but without sacrificing ease of use in the API. You can work with StringValues almost as easily as you would with a string or a string[].

Summary

In this post I looked briefly at the common HTTP problem of handling headers that appear more than once. I discussed how this was solved imperfectly in ASP.NET using the NameValueCollection type, and how ASP.NET Core handles it more gracefully with StringValues. Finally, I showed how StringValues is implemented to reduce allocations compared to the naïve string[] approach by using a single field to hold either a string or string[] object, and implementing various collection interfaces.

