Channel: Andrew Lock | .NET Escapades

Reading JSON and binary data from multipart/form-data sections in ASP.NET Core


In this post I describe how to read both JSON and binary data from a multipart/form-data request in ASP.NET Core. A colleague at work needed this functionality, and we couldn't find a way to do it using the "normal" mechanisms in ASP.NET Core. This post describes the approach we ended up with.

What is multipart/form-data?

Before we get too far into the weeds, we should establish what multipart/form-data looks like, and how it differs from other request types. In the modern web, there are two main approaches to sending data in traditional HTTP requests:

  • JSON data
  • Form data

Yes, I know, there's plenty of other formats, XML, binary data over websockets, gRPC etc. But I would wager that most people are using JSON or form data in their day-to-day apps.

JSON data is typically used when sending data programmatically to APIs. That could be a JavaScript or Blazor client sending requests to a backend app, or it could be a server-side app making HttpClient requests to another API.

On the other hand, form data is the common data format for "traditional" server-rendered applications, like you might build with MVC, Razor Pages, or just plain HTML.

When you send data in an HTTP request, you should specify the type of the data. For JSON requests that's application/json, but for form data there's two possibilities:

  • application/x-www-form-urlencoded
  • multipart/form-data

By default, if you create an HTML <form> element and use the built-in browser capabilities to POST it to the server, the form will be sent as application/x-www-form-urlencoded data. For example, a form that looks something like this:

<form action="/handle-form" method="POST">
  <input type="text" name="Name" />
  <input type="date" name="DueDate" />
  <input type="checkbox" name="IsCompleted" />
  <input type="submit" />
</form>

when submitted, would send an HTTP request something like this:

POST /handle-form HTTP/2
Host: andrewlock.net
accept: text/html,*/*
upgrade-insecure-requests: 1
content-type: application/x-www-form-urlencoded
content-length: 50

Name=Andrew+Lock&DueDate=2024-08-01&IsCompleted=on

Note that the data is included in the body in "URL-encoded" format, so spaces are encoded as +, and the fields are concatenated using &. This is a pretty compact format for sending data, but it has some limitations. One of the most obvious missing features is the ability to submit a file using a form. To submit a file in the body of the request you'll need to switch to the multipart/form-data encoding instead.

You can switch to this format in an HTML form by specifying the enctype attribute, for example:

<!-- Encode the POST using multipart/form-data 👇 -->
<form action="/handle-form" method="POST" enctype="multipart/form-data">
  <input type="text" name="Name" />
  <input type="date" name="DueDate" />
  <input type="checkbox" name="IsCompleted" />
  <input type="file" name="myFile" /> <!-- 👈 Send a file too -->
  <input type="submit" />
</form>

This produces an HTTP request that looks something like this:

POST /handle-form HTTP/2
Host: andrewlock.net
accept: text/html,*/*
upgrade-insecure-requests: 1
content-type: multipart/form-data; boundary=----WebKitFormBoundary8Vq2TDi66aYi2H5d
content-length: 3107

------WebKitFormBoundary8Vq2TDi66aYi2H5d
Content-Disposition: form-data; name="Name"

Andrew Lock
------WebKitFormBoundary8Vq2TDi66aYi2H5d
Content-Disposition: form-data; name="DueDate"

2024-08-01
------WebKitFormBoundary8Vq2TDi66aYi2H5d
Content-Disposition: form-data; name="IsCompleted"

on
------WebKitFormBoundary8Vq2TDi66aYi2H5d
Content-Disposition: form-data; name="myFile"; filename="some-file-to-send.json"
Content-Type: application/octet-stream

<binary data>

------WebKitFormBoundary8Vq2TDi66aYi2H5d--

This is obviously much more verbose than the simple application/x-www-form-urlencoded request but it's also much more flexible. There's a few important features to notice here:

  • Each "part" is separated by a boundary marker: ------WebKitFormBoundary8Vq2TDi66aYi2H5d. The boundary to be used is declared in the request content-type header.
  • Each "part" includes a Content-Disposition header, which has a type of form-data, and specifies the field name the part defines
  • Each "part" can optionally include a Content-Type header. If it's not specified, text/plain is assumed. For the file, you can see that it's submitted as binary data (application/octet-stream).
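
If you want to see these features without a browser, the following is a minimal sketch that builds a small multipart/form-data body using MultipartFormDataContent and prints it. The boundary "demo-boundary" and the field values are arbitrary choices for this illustration, not something a browser would generate:

```csharp
using System;
using System.Net.Http;

// "demo-boundary" is an arbitrary boundary chosen for this sketch so the output
// is predictable; if you don't pass one, a random GUID is generated instead.
using var content = new MultipartFormDataContent("demo-boundary");
content.Add(new StringContent("Andrew Lock"), "Name");
content.Add(new StringContent("2024-08-01"), "DueDate");

// The content-type header declares the boundary...
string contentType = content.Headers.ContentType!.ToString();

// ...and each part in the body starts with "--" followed by that boundary.
// The final marker has an extra "--" suffix to close the body.
string body = await content.ReadAsStringAsync();

Console.WriteLine($"content-type: {contentType}");
Console.WriteLine();
Console.WriteLine(body);
```

Note that StringContent adds an explicit Content-Type: text/plain; charset=utf-8 header to each part, whereas the browser request above omits it for the simple text fields.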

For normal HTTP requests coming from HTML form submission, these two formats are pretty much exactly what you will encounter. But if the request is coming from app code, there's nothing stopping you sending other types of data in a multipart/form-data request…

Sending JSON and binary data as multipart/form-data

Let's say you want to send a request to a server. You need to include a whole load of binary data, and also some JSON metadata about the binary. How would you handle it?

If the JSON data was small enough, perhaps you could encode it in the request headers, or in the querystring, and then send the binary in the request body. That approach might work, but there are various potential issues with this scheme. A simple reason not to do this is that headers and querystring values often end up in logs (which could potentially leak sensitive information). Additionally, headers and querystrings are limited in size, which would restrict how large the JSON could be.

The approach we settled on in this case was to send a multipart/form-data request. This included the JSON data in one part, and the binary data in the other part, something like this:

--73dc24e0-b350-48f8-931e-eab338df00e1
Content-Type: application/json; charset=utf-8
Content-Disposition: form-data; name=myJson

{"name":"Andrew","type":"Engineer"}
--73dc24e0-b350-48f8-931e-eab338df00e1
Content-Type: application/octet-stream
Content-Disposition: form-data; name=binary

<binary data>
--73dc24e0-b350-48f8-931e-eab338df00e1--

Note how we have a content-type for both parts: the first part is application/json and the second part is our binary, application/octet-stream.

You can send an HTTP request like this using the following HttpClient code. This is more involved than a typical HttpClient request, because you need to build up the multipart/form-data piece-by-piece, but I've annotated the code to explain the important steps:

var client = new HttpClient();

// This is the JSON data we're going to send
var myData = new MyData("Andrew", "Engineer");
// Create a byte array with random data to send
var myBinaryData = new byte[100];
Random.Shared.NextBytes(myBinaryData);

// Create the "top-level" content. This sets the content-type
// to multipart/form-data and acts as a container
using var content = new MultipartFormDataContent();

// Create the JsonContent part, and add it to the multipart/form-data
// with the name "myJson"
content.Add(JsonContent.Create(myData), "myJson");

// Create the Binary part, and add it to the multipart/form-data
// with the name "binary". You have to explicitly specify a
// Content-Type, otherwise it has the default (text/plain)
var binaryContent = new ByteArrayContent(myBinaryData);
binaryContent.Headers.ContentType = new("application/octet-stream");
content.Add(binaryContent, "binary");

// Log the data we're sending to confirm we have received it correctly
// on the other side!
app.Logger.LogInformation("Sending Json {Data} and binary data {Binary}", myData, Convert.ToBase64String(myBinaryData));

// Send the request
var response = await client.PostAsync("https://localhost:8080/", content);
response.EnsureSuccessStatusCode();

// The JSON record definition
record MyData(string Name, string Type);

When we run this, we log something like the following:

Sending Json MyData { Name = Andrew, Type = Engineer } and binary data qe8gFK3/PdDIZQ3MrpQBB0o9ymSs8Azk6ALo0raP2qn0mB2BKqlB0DkXuJHG79OyvdwabLgMCdr2a8U1txABVo3pxN0ik8oT6P4zIlfgAwDH/ZcV118tqdqITvE8B2NmMdayIA==

and the HTTP request has a body similar to the one shown above. So now we can send data, we need to write the other part, the ASP.NET Core endpoint that receives this data.

Difficulties reading complex, non-file, multipart/form-data in ASP.NET Core

.NET 8 added support for binding forms to minimal API endpoint parameters. For most simple cases, this binding provides an easy way to access form files and form data, such as the multipart/form-data I showed in the first part of this post:

app.MapPost("/handle-form", (
    [FromForm] string name,  // 👈 Bind to the Name part
    [FromForm] DateOnly dueDate, // 👈 Bind to the DueDate part
    IFormFile myFile) // 👈 Bind to the myFile part, which exposes a Stream
    => Results.Ok(name));

The [FromForm] parameters bind trivially to the form-data parts like this:

------WebKitFormBoundary8Vq2TDi66aYi2H5d
Content-Disposition: form-data; name="Name"

Andrew Lock

while the IFormFile parameters bind to the file parts that look like this:

------WebKitFormBoundary8Vq2TDi66aYi2H5d
Content-Disposition: form-data; name="myFile"; filename="some-file-to-send.json"
Content-Type: application/octet-stream

<binary data>

The IFormFile parameter provides a Stream for reading the binary file data, which is exactly the functionality we need.

Unfortunately, it doesn't work for our request.😢

Our JSON plus binary request doesn't have any simple form values, so we can't bind using [FromForm]. Also, IFormFile won't bind to the binary part of our body; it's always null, whether you try to bind IFormFile or IFormFileCollection 🙁

More on why this was the case later!

OK…there's always the fallback of HttpRequest.ReadFormAsync() right? Unfortunately not.

Image showing the result of running IFormCollection form = await request.ReadFormAsync() on a request

As you can see in the image above, calling ReadFormAsync() returns an IFormCollection which contains the multipart/form-data data. However, this image also shows that each part of the request has been converted to a StringValues object (i.e. a string). There's no way to get the "raw" binary data of the file through this API 🙁

After testing this, we were a bit stuck. There didn't appear to be any way to get hold of the "raw" data from HttpRequest.

Manually reading multipart/form-data with MultipartReader

It was at this point that I decided to go spelunking through the ASP.NET Core source code. ASP.NET Core is clearly able to parse the request and grab a Stream for "files", so maybe there's something there that we can reuse.

Sure enough, after a small amount of searching I found the FormFeature implementation which parses the request body. And it does that using the MultipartReader class which is public!🥳

As you might expect, the MultipartReader facilitates reading the request body. It's a lower-level API than the IFormCollection or IFormFile types that you typically interact with in your endpoints; rather it's typically used to populate those types.

Adding support for our request body was mostly a case of copying the broad approach used in FormFeature, but tailoring it to our use case. The following endpoint shows how we used MultipartReader to parse the JSON and binary sections of the request, deserializing from the request stream directly. There's quite a lot of code here, but I've commented as best I can!

// Create a minimal API endpoint that handles a POST
// We use the default configured JsonOptions for deserializing the JSON
app.MapPost("/", async (HttpContext ctx, IOptions<JsonOptions> jsonOptions) =>
{
    // make sure we have the correct header type
    if (!MediaTypeHeaderValue.TryParse(ctx.Request.ContentType, out MediaTypeHeaderValue? contentType)
        || !contentType.MediaType.Equals("multipart/form-data", StringComparison.OrdinalIgnoreCase))
    {
        return Results.BadRequest("Incorrect mime-type");
    }

    // Variables for holding the data parsed from the request
    MyData? jsonData = null;
    byte[]? binaryData = null;

    // Get the boundary parameter from the content-type header
    // Content-Type: multipart/form-data; boundary="73dc24e0-b350-48f8-931e-eab338df00e1"
    // The spec says 70 characters is a reasonable limit.
    string boundary = GetBoundary(contentType, lengthLimit: 70);
    var multipartReader = new MultipartReader(boundary, ctx.Request.Body);
    
    // Use the multipart reader to read each of the sections
    while (await multipartReader.ReadNextSectionAsync(ctx.RequestAborted) is { } section)
    {
        // Make sure we have a content-type for the section
        if(!MediaTypeHeaderValue.TryParse(section.ContentType, out MediaTypeHeaderValue? sectionType))
        {
            return Results.BadRequest("Invalid content type in section " + section.ContentType);
        }

        if (sectionType.MediaType.Equals("application/json", StringComparison.OrdinalIgnoreCase))
        {
            // If the section is JSON, deserialize directly from the section stream
            // using the default JSON serialization options configured for the app
            jsonData = await JsonSerializer.DeserializeAsync<MyData>(
                section.Body,
                jsonOptions.Value.JsonSerializerOptions,
                cancellationToken: ctx.RequestAborted);
        }
        else if (sectionType.MediaType.Equals("application/octet-stream", StringComparison.OrdinalIgnoreCase))
        {
            // If the section is binary data, deserialize into an array
            // there are potentially more efficient things we could do here 
            // depending on how you need the data
            using var ms = new MemoryStream();
            await section.Body.CopyToAsync(ms, ctx.RequestAborted);
            binaryData = ms.ToArray();
        }
        else
        {
            return Results.BadRequest("Invalid content type in section " + section.ContentType);
        }
    }

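    // Both sections are required, so reject the request if either is missing
    if (jsonData is null || binaryData is null)
    {
        return Results.BadRequest("Expected both a JSON and a binary section");
    }

    // Just printing it out for debugging purposes
    app.Logger.LogInformation("Receive Json {JsonData} and binary data {BinaryData}",
        jsonData, Convert.ToBase64String(binaryData));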

    return Results.Ok();
    
    // Retrieves the boundary marker from the content-type, handling quotes etc 
    // Taken from https://github.com/dotnet/aspnetcore/blob/4eef6a1578bb0d8a4469779798fe9390543d15c0/src/Http/Http/src/Features/FormFeature.cs#L318-L320
    static string GetBoundary(MediaTypeHeaderValue contentType, int lengthLimit)
    {
        var boundary = HeaderUtilities.RemoveQuotes(contentType.Boundary);
        if (StringSegment.IsNullOrEmpty(boundary))
        {
            throw new InvalidDataException("Missing content-type boundary.");
        }
        if (boundary.Length > lengthLimit)
        {
            throw new InvalidDataException($"Multipart boundary length limit {lengthLimit} exceeded.");
        }
        return boundary.ToString();
    }
});

Phew. That seems like a lot of code, but it works, and it's the only way I could find to deserialize the payload we are receiving.

info: testApp[0]
      Sending Json MyData { Name = Andrew, Type = Engineer } and binary data JtlpOBcTftaEIUSfXO1X3K5ubjE09ewqsappxBj6ok0n0K8dUIexc7RXLhymJsb9eErSoCnZ7+ZLsooe9cQ5gWAG3wKJFpfIRP7qnUceJpes45hJksBTA91J5bqLJIZfVnWagQ==
info: testApp[0]
      Receive Json MyData { Name = Andrew, Type = Engineer } and binary data JtlpOBcTftaEIUSfXO1X3K5ubjE09ewqsappxBj6ok0n0K8dUIexc7RXLhymJsb9eErSoCnZ7+ZLsooe9cQ5gWAG3wKJFpfIRP7qnUceJpes45hJksBTA91J5bqLJIZfVnWagQ==

But it made me wonder… could we tweak that payload slightly so that we can read the request automatically using ASP.NET Core's built-in features? 🤔

Tweaking the multipart/form-data request to keep ASP.NET Core happy

The key to using more of ASP.NET Core's built-in support for reading the body is to make sure that each multipart/form-data part/section is sent with a Content-Disposition that declares both a name and a filename for the part:

--883a97cb-5025-47c0-8d5f-e238687cfc5e
Content-Type: application/json; charset=utf-8
Content-Disposition: form-data; name=myJson; filename=some_json.json

{"name":"Andrew","type":"Engineer"}
--883a97cb-5025-47c0-8d5f-e238687cfc5e
Content-Type: application/octet-stream
Content-Disposition: form-data; name=binary; filename=my_binary.log

<binary data>
--883a97cb-5025-47c0-8d5f-e238687cfc5e--

If you compare that to the form data I showed previously, you'll notice that we've added a filename to the Content-Disposition in both cases. It's easy to update our HttpClient code to add that filename; we just need to specify the filename when adding each part to the MultipartFormDataContent:

content.Add(
    content: JsonContent.Create(myData),
    name: "myJson",
    fileName: "some_json.json"); //👈 Add this parameter

Taking that approach, we can adjust our "sending" code accordingly. The code below is the same as the original, but with the Content-Disposition filename set:

var client = new HttpClient();

var myData = new MyData("Andrew", "Engineer");
var myBinaryData = new byte[100];
Random.Shared.NextBytes(myBinaryData);

using var content = new MultipartFormDataContent();
content.Add(JsonContent.Create(myData), "myJson", "some_json.json"); // 👈 added filename

var binaryContent = new ByteArrayContent(myBinaryData);
binaryContent.Headers.ContentType = new("application/octet-stream");
content.Add(binaryContent, "binary", "my_binary.log"); // 👈 Added filename

app.Logger.LogInformation("Sending Json {Data} and binary data {Binary}", myData, Convert.ToBase64String(myBinaryData));

var response = await client.PostAsync("https://localhost:8080/", content);
response.EnsureSuccessStatusCode();

So how does that make things easier in ASP.NET Core?

The key difference is that the FormFeature checks for the presence of filename in the Content-Disposition header of a section. If ASP.NET Core finds a filename, it automatically buffers the file body in memory (or to disk for big files) and constructs an IFormFile instance, which is available in the HttpRequest.Form.Files property and can also be injected directly into your endpoints.
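
To make that distinction concrete, here's a small sketch of the kind of check described above. It uses the IsFileDisposition() and IsFormDisposition() extension methods on ContentDispositionHeaderValue from Microsoft.Net.Http.Headers. The Classify helper is a hypothetical name for this illustration, and I'm assuming the code runs in a project that references the ASP.NET Core shared framework:

```csharp
using System;
using Microsoft.Net.Http.Headers;

// Hypothetical helper: classifies a Content-Disposition header value.
// A form-data part with a filename counts as a file;
// a form-data part without one counts as a plain form value.
static string Classify(string contentDisposition)
{
    if (!ContentDispositionHeaderValue.TryParse(contentDisposition, out var disposition))
    {
        return "invalid";
    }

    if (disposition.IsFileDisposition()) return "file";
    if (disposition.IsFormDisposition()) return "form value";
    return "other";
}

// The binary part from the original request (no filename)
string withoutFilename = Classify("form-data; name=binary");
// The same part from the tweaked request (with a filename)
string withFilename = Classify("form-data; name=binary; filename=my_binary.log");

Console.WriteLine($"Without filename: {withoutFilename}");
Console.WriteLine($"With filename: {withFilename}");
```

This also explains the earlier behaviour: without a filename, the binary part is treated as an ordinary form value and read as a string, so there's never an IFormFile to bind to.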

That means we can dramatically simplify our minimal API. We no longer need to use the MultipartReader to parse the request ourselves. Instead, we can use IFormFile and grab the data directly from there:

// Directly reference the files in the minimal API endpoint
// The name of the parameters should match the name of the 
// files in the content 
app.MapPost("/", async (IFormFile myJson, IFormFile binary, IOptions<JsonOptions> jsonOptions) =>
{
    // Deserialize JSON data from the 'myJson' file
    var jsonData = await JsonSerializer.DeserializeAsync<MyData>(
        myJson.OpenReadStream(), jsonOptions.Value.JsonSerializerOptions);
    
    // Copy binary data from the 'binary' file
    var binaryData = new byte[binary.Length]; // 👈 We know the length because it's already buffered
    using var ms = new MemoryStream(binaryData);
    await binary.CopyToAsync(ms);

    // For demonstration purposes
    app.Logger.LogInformation("Receive Json {JsonData} and binary data {BinaryData}",
        jsonData, Convert.ToBase64String(binaryData));

    return Results.Ok();
}).DisableAntiforgery(); // .NET 8 automatically adds CSRF protection, disable it for this example.

That's significantly less code to achieve essentially the same thing!🎉 I cheated a little bit, as I didn't check the content-type etc was correct for each file, but we could always add that back if we wanted.
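
If you did want to add the content-type check back, a minimal sketch might look like the following. HasMediaType is a hypothetical helper name, and I'm again assuming an ASP.NET Core project so that Microsoft.Net.Http.Headers is available. Parsing the header (rather than comparing strings) matters because the JSON part is sent as application/json; charset=utf-8:

```csharp
using System;
using Microsoft.Net.Http.Headers;

// Hypothetical helper: returns true when the Content-Type value has the
// expected media type, ignoring parameters such as "; charset=utf-8".
static bool HasMediaType(string? contentTypeHeader, string expected) =>
    MediaTypeHeaderValue.TryParse(contentTypeHeader, out var parsed)
    && parsed.MediaType.Equals(expected, StringComparison.OrdinalIgnoreCase);

// In the endpoint you would pass IFormFile.ContentType, for example:
//   if (!HasMediaType(myJson.ContentType, "application/json"))
//       return Results.BadRequest("Expected JSON in the myJson part");
bool jsonMatches = HasMediaType("application/json; charset=utf-8", "application/json");
bool textMatches = HasMediaType("text/plain", "application/json");

Console.WriteLine($"JSON part accepted: {jsonMatches}");
Console.WriteLine($"Text part accepted: {textMatches}");
```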

Note that there are some subtle differences between the two approaches. With this approach, ASP.NET Core automatically reads the body, buffers it in memory, and creates IFormFile instances for you to work with. With the original, lower-level, approach using MultipartReader you're working directly with the request stream, so I think you may be able to bypass some of that buffering and object creation, depending on exactly what you're doing with the data afterwards.
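
As a rough illustration of what "bypassing the buffering" could look like, the sketch below copies a section's body straight to a file instead of into a MemoryStream. To keep it self-contained, it reads a tiny hand-written body from a MemoryStream rather than from ctx.Request.Body. The SaveSectionAsync helper, the "demo" boundary, and the temp-file destination are all assumptions for this example:

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.AspNetCore.WebUtilities;

// Hypothetical helper: streams one section to disk without holding
// the whole payload in memory, and returns the number of bytes written.
static async Task<long> SaveSectionAsync(MultipartSection section, string path, CancellationToken ct)
{
    await using var file = File.Create(path);
    await section.Body.CopyToAsync(file, ct);
    await file.FlushAsync(ct);
    return file.Length;
}

// A hand-written body standing in for the real request stream
var raw = "--demo\r\n" +
          "Content-Type: application/octet-stream\r\n" +
          "Content-Disposition: form-data; name=binary\r\n" +
          "\r\n" +
          "hello\r\n" +
          "--demo--\r\n";
var reader = new MultipartReader("demo", new MemoryStream(Encoding.UTF8.GetBytes(raw)));

// Placeholder destination; a real app would choose its own location
var path = Path.GetTempFileName();
long bytesWritten = 0;
while (await reader.ReadNextSectionAsync() is { } section)
{
    bytesWritten += await SaveSectionAsync(section, path, CancellationToken.None);
}

Console.WriteLine($"Wrote {bytesWritten} bytes to {path}");
```

In the endpoint from earlier, you would pass ctx.RequestAborted as the cancellation token and call the helper in place of the MemoryStream copy.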

You may also be wondering if you can request the MyData object directly in the minimal API endpoint, so that you don't need to do the JSON deserialization yourself, something like this:

// ⚠ This doesn't work
app.MapPost("/", async (MyData myJson, IFormFile binary) => { /* */ });

Unfortunately, that's not possible. The "JSON-in-form" in the body of the request is not standard, and ASP.NET Core doesn't support it. Trying to take the above approach will lead to an error, as ASP.NET Core tries to bind the MyData JSON object to the whole of the request body, as well as reading the body as form data.

If you want to learn the details of how minimal APIs decide what to bind, you can read my series on binding in minimal APIs.

You also can't use the [FromForm] attribute to try to "convince" ASP.NET Core to bind the JSON:

// ⚠ This doesn't work
app.MapPost("/", async ([FromForm] MyData myJson, IFormFile binary) => { /* */ });

If you take this approach, ASP.NET Core tries to bind each individual property of the MyData object to separate sections of the form data, in the format that HTML <form> requests send data. You won't get an error this time, your myJson parameter just won't have any data in it!

So there we have it. Two different approaches to reading multipart/form-data. Hopefully you never need to use my first approach, and can stick to the standard file approach instead!

Summary

In this post I discussed the multipart/form-data format, how it differs from application/x-www-form-urlencoded form data, and what the requests look like. I showed how HTML <form> elements can use the multipart/form-data type to send files, and then showed how you can use HttpClient to send any data in a multipart/form-data request. In my example I sent both JSON and binary data in the same request.

The difficulty is that ASP.NET Core can't automatically read multipart/form-data requests unless they're sent as files or as "normal" form data. There's no way to automatically read either the JSON or binary data. I showed how you can read this data using the MultipartReader type (which ASP.NET Core uses behind the scenes). This gives you complete control over the reading of each section in the request body.

Finally, I showed how you can tweak the data sent in the request to include a filename in the Content-Disposition. After making this change, ASP.NET Core was able to read the request body using the standard IFormFile mechanism. This simplified the code significantly, though you still had to deserialize the JSON data manually.
