
Avoiding CDN supply-chain attacks with Subresource Integrity (SRI)


There's been a lot of tech in the news recently. There was the xz-utils backdoor in March, the CrowdStrike outage in July, and in the middle there was the Polyfill.io supply-chain attack. It's the Polyfill.io attack that I'm going to talk about in this post. I'll give a brief explanation of what happened, why it was bad, and what you can do to protect yourself from similar attacks in the future.

What was Polyfill.io?

Polyfill.io was a legitimate open-source project. In its own words:

Polyfill.io is a service which makes web development less frustrating by selectively polyfilling just what the browser needs. Polyfill.io reads the User-Agent header of each request and returns polyfills that are suitable for the requesting browser.

In other words, old browsers (such as Internet Explorer) have far fewer features and APIs available than modern browsers. However, some of these APIs can be emulated (polyfilled) by implementing equivalent functionality in JavaScript, which means you can still code against modern browser APIs. On modern browsers, where the APIs already exist, your app uses the APIs directly, while on older browsers you can rely on the polyfilled versions.

This technique was practically a requirement for many years when we had a greater diversity of browser engines, but also questionable adherence to specifications. It's one of the reasons jQuery became so popular!

The need for general-purpose polyfill libraries has reduced significantly these days, with most browsers adopting an "evergreen" approach (i.e. automatically updating) and IE usage continually dropping.

Polyfill.io operated a Content Delivery Network (CDN) which hosted the polyfill.io code, so you could simply add a <script> tag to your application, and your app would automatically pull in the JavaScript and run it. The "neat" part of the polyfill.io service was that it sniffed the User-Agent of the request and returned a different script depending on the requesting browser. As we'll see later, that functionality actually made it very hard to use safely.

What happened?

A lot has been written about the polyfill supply-chain attack, so I'll just give a quick overview here.

  • The polyfill.io GitHub repository and polyfill.io domain were sold to the Chinese company Funnull.
  • The code hosted on the polyfill.io CDN was changed to inject malicious behaviour.
  • When executed, the polyfill.io JavaScript redirected users to gambling and adult websites.
  • The domain was suspended 2 days later, mitigating the issue.
  • However, nearly 400,000 sites were found to still be linking to the malicious domain, including some big names.

So yeah, pretty bad.

The thing I found interesting reading about the attack was that this has been a known attack vector for a long time, and we've had protections against it available in browsers for almost 10 years.

Protecting against CDN supply-chain attacks with Subresource Integrity

The main protection against this sort of attack is called Subresource Integrity (SRI). This was proposed (and implemented) way back in 2015, with the specific intention of protecting against malicious CDNs serving files that are different from the files you expect. Technically it doesn't protect specifically against malicious changes to files; it protects against any change to the files.

How it works is pretty simple:

  • In your <script> or <link> tags you add an integrity attribute, which contains a hash of the content you expect, encoded using base64.
  • When the browser receives the file, it calculates a hash of the contents of the received file.
    • If the hash of the file matches the value in the integrity attribute, it's loaded as normal.
    • If the hashes don't match, the browser rejects the file, reports an error, and does not load the content.

It's basically as simple as that. You can choose between multiple hash algorithms (sha256, sha384, and sha512), and you can list multiple hashes if you expect several different files.
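If you have a local copy of the file, generating the value for the integrity attribute is straightforward. As a minimal sketch, assuming Node.js and a local copy of jquery.validate.min.js (the file name is just for illustration):

import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";

// Read the exact bytes you expect the CDN to serve
const file = readFileSync("jquery.validate.min.js");

// SRI hashes are base64-encoded and prefixed with the algorithm name
const digest = createHash("sha384").update(file).digest("base64");
console.log(`sha384-${digest}`);

You can get the same value on the command line with openssl dgst -sha384 -binary jquery.validate.min.js | openssl base64 -A.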

The result is something that looks like the following, which loads jquery.validate.min.js from the Cloudflare CDN:

<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery-validate/1.17.0/jquery.validate.min.js"
    crossorigin="anonymous"
    integrity="sha384-rZfj/ogBloos6wzLGpPkkOr/gpkBNLZ6b6yLy4o+ok+t/SAKlL5mvXLr0OXNi1Hp">
</script>

Note that the integrity attribute includes the hash algorithm as a prefix, sha384 in this case. The hash value is everything from rZfj/ogB onwards.

The crossorigin="anonymous" attribute is required because the request is a cross-origin request (a request to a different origin). This enables Cross-Origin Resource Sharing (CORS); the browser can only verify the integrity of responses it is allowed to read, so without this attribute the integrity check would always fail.

If you make a request and the CDN returns a file that does not match the value in the integrity attribute, you'll get an error something like this:

Failed to find a valid digest in the 'integrity' attribute for resource 'https://cdnjs.cloudflare.com/ajax/libs/jquery-validate/1.17.0/jquery.validate.min.js' with computed SHA-384 integrity 'rZfj/ogBloos6wzLGpPkkOr/gpkBNLZ6b6yLy4o+ok+t/SAKlL5mvXLr0OXNi1Hp'. The resource has been blocked.

So if you're serving files from a CDN you should really be pinning to a specific version and adding the integrity attribute to your <script> and <link> tags. If you do, then you know that a malicious CDN won't be able to run malicious JavaScript on your page. Of course, a rejected file would likely break your website, but that's better than malicious JavaScript running on your site.

If you're running ASP.NET Core and using Razor files, you can use the asp-fallback-src and asp-fallback-test Tag Helper attributes to provide a fallback location to load from, so that your site doesn't break, as I describe in a previous post.
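As a sketch of what that might look like for the script above (the local fallback path is just an assumption for illustration):

<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery-validate/1.17.0/jquery.validate.min.js"
    crossorigin="anonymous"
    integrity="sha384-rZfj/ogBloos6wzLGpPkkOr/gpkBNLZ6b6yLy4o+ok+t/SAKlL5mvXLr0OXNi1Hp"
    asp-fallback-src="~/lib/jquery-validate/jquery.validate.min.js"
    asp-fallback-test="window.jQuery && window.jQuery.validator">
</script>

If the asp-fallback-test expression evaluates to something falsy after the CDN script loads (or fails to), the Tag Helper injects a script pointing at the local fallback instead.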

So given SRI has been available in Firefox and Chrome since 2015 (and in Edge and Safari since 2018), could the polyfill.io attack have been largely avoided?

Would SRI have avoided the polyfill.io attack?

In theory the answer should be obvious—a malicious CDN file is exactly what SRI was designed to protect against. So could SRI have prevented the polyfill.io attack? Unfortunately, it's somewhat hard to tell.

Fundamentally, polyfill.io acted more as a service than as a simple CDN. CDNs typically serve strictly versioned files, such as 1.17.0 in the previous example. In contrast, polyfill.io used major-only versions, for example /v3/polyfill.min.js. That meant that the polyfill.io service could update the served file at any time, fixing bugs, adding features, and so on. On the face of it, that might seem like a good thing; who doesn't want bug fixes automatically, right?
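A typical include looked something like the following (treat the exact query parameters as illustrative):

<script src="https://polyfill.io/v3/polyfill.min.js?features=default"></script>

Note that there's nothing here that pins the exact contents: the path only pins the major version of the service, not the bytes returned.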

The problem is that if you were including a dependency on polyfill.io, you had no idea what code it was serving. You couldn't test updates to make sure newer versions of polyfill.min.js didn't break your site. And you ultimately had no control over, or visibility into, what the polyfill.io service was returning.

What's more, by design, the polyfill.io service returned different values depending on the browser making the request. If you're running Firefox 1.0, Chrome 45, or Safari 18, you'll get completely different responses from the endpoint. That seems quite smart on the face of it; it keeps the response as small as possible by ensuring it only returns the code required by the browser. For modern browsers, the service likely returned essentially nothing.

A rough diagram of how the polyfill.io service used to work
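To make the problem concrete, here's a hypothetical TypeScript sketch of the kind of User-Agent branching the service performed. This is not the real polyfill.io code; it just shows why every branch returns different JavaScript, so no single SRI hash can match them all:

// Placeholder bundles standing in for generated polyfill code
const legacyBundle = "/* polyfills for Promise, fetch, Array.from, ... */";
const partialBundle = "/* polyfills for the handful of APIs this browser lacks */";

function polyfillBundleFor(userAgent: string): string {
  if (/MSIE|Trident/.test(userAgent)) {
    return legacyBundle;   // old Internet Explorer: a large bundle
  }
  if (/Chrome\/4\d\./.test(userAgent)) {
    return partialBundle;  // an older Chrome: just a few missing APIs
  }
  return "";               // modern browser: essentially nothing
}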

Unfortunately, again, this design makes it difficult to understand exactly what your site expects. Fundamentally, every browser expects to receive different JavaScript. That's a problem for securing your software supply chain. But could SRI have worked here?

I suspect the answer, unfortunately, is: probably not; or at least, not while continuing to support very old browsers.

As already described, the SRI integrity attribute allows you to provide multiple valid hashes for a script. So it might be possible to enumerate all the browser versions that you support, and add all the corresponding hashes to your site. But given the diversity of browsers out there that you might want to support, that seems impractical.
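For illustration, the integrity attribute accepts multiple space-separated hashes, and the browser loads the file if any of them match the received content. A sketch with placeholder values (the hashes below are not real digests):

<script src="https://polyfill.io/v3/polyfill.min.js"
    crossorigin="anonymous"
    integrity="sha384-HASH_OF_LEGACY_BUNDLE sha384-HASH_OF_MODERN_BUNDLE">
</script>

You'd need one entry per distinct response the service could ever return, which is exactly what makes this unmanageable.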

Additionally, the fact that the polyfill.io service might suddenly change (due to the major-only references) is a fundamental problem for SRI. If you add integrity attributes, and polyfill.min.js suddenly changes, then your site is (potentially) going to immediately break, as the file is rejected.

This will likely vary depending on the site and the browser; if polyfill.min.js was essentially a no-op because it was requested by a modern browser, then potentially there would be no adverse impact on the site. On the other hand, old browsers are likely going to break.

So it seems like it really wasn't practical to use SRI with the polyfill.io service, due to its design. Unfortunately, that means that polyfill.io was inherently a cross-site scripting (XSS) attack waiting to happen, and we know how that turned out 😬 That's part of the reason Cloudflare started running their own instance of polyfill.io at https://cdnjs.cloudflare.com/polyfill/ back in February!

Are CDNs worth it?

I've seen some discourse online decrying the use of CDNs in their entirety. The argument is that you just shouldn't host files that are critical to your app on "someone else's server". That's fair enough, but it ignores the fact that there's a reason CDNs exist; it's not just developers being reckless. Technology moves on, though, so it's worth examining the reasons we used CDNs in the first place, and whether those reasons still apply.

Historically, there were a variety of reasons to use a CDN:

  1. Reduced latency. CDNs are typically globally distributed, so can give very low latencies for downloading files, wherever in the world your users are. That can make a big difference if your application is only hosted in one region, and users are sending requests from the other side of the world!
  2. Reduced bandwidth. The CDN offloads network traffic from your servers, reducing the load on your server and reducing your outbound traffic, which may also have monetary benefits when hosting in the cloud.
  3. Shared loading. Other applications may have already downloaded common libraries from the CDN. If the file is already cached by the browser, it may not need to make a request at all, significantly speeding up your application.
  4. Fewer connections. By sending requests for client-side assets to a CDN, you may see higher overall network throughput for your application. Browsers limit the number of simultaneous connections they make to a server (commonly 6). If you host your files on a CDN, the connections to the CDN don't count towards your server limit, leaving more connections to download in parallel from your app.

So the question is, do these still apply?

I believe reasons 1 and 2 still clearly apply. It's still common for CDNs to be more widely distributed than your core app, so you will likely benefit from improved latencies when delivering these files to your users. Similarly, any file that a CDN serves is one less hit on your server; often more importantly, you're not paying egress bandwidth costs for it.

Reason 3 is no longer a benefit. The original idea was that if everyone loads jQuery from the official CDN, then browsers will likely have the library cached when users hit your site. However, back in 2020, Google Chrome started partitioning its HTTP cache by the requesting site, for privacy reasons. Safari made a similar change back in 2013, and Firefox followed in 2021.

With cache partitioning, reason 3 no longer applies at all. Even if site A downloads jQuery from the CDN and caches it, site B can't use that cached data and must re-download it.

Reason 4 still technically applies, but with HTTP/2, it has become much less relevant. With HTTP/2, multiple requests can be made in parallel, multiplexed over a single connection. This reduces the benefit of CDNs as a way of increasing the number of parallel requests.

The main downsides with CDNs (which remain unchanged) are:

  • You need to trust the CDN to deliver the files you request. You can (and obviously should) enforce this with a good Content Security Policy (CSP) and with SRI integrity attributes (see the example after this list).
  • If you don't want your site to break if/when a CDN is unavailable or is compromised, then you need to provide alternative hosting for the files (on your server for example), and add fallback code to detect this situation.
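As a minimal illustration of the first point, a CSP response header can restrict where scripts may be loaded from at all; the allowed origin here is just an example:

Content-Security-Policy: script-src 'self' https://cdnjs.cloudflare.com

Combined with integrity attributes, this means a script has to come from an origin you allow and hash to a value you expect before the browser will run it.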

So, with that all in mind, are CDNs worth it? The answer is almost certainly "it depends". Some of the benefits still remain, but there's much less of a value proposition than there used to be, so bear that in mind when making your decision.

Summary

In this post, I discussed the recent polyfill.io supply-chain attack that resulted in malicious JavaScript running on vulnerable sites. I then described how you can protect against similar attacks using the Subresource Integrity (SRI) feature, by adding the integrity attribute, which contains a hash of the expected file contents. Finally, I examined whether SRI would have worked in the polyfill.io case, and whether CDNs are still worth using these days.

