XML External Entity Injection

Security is hard to get right. Between Cross-Site Scripting (XSS) and SQL Injection (SQLi) alone, there are more ways to make mistakes than any developer can possibly be expected to keep track of manually -- and those are just the two most well-known types of vulnerabilities. Most developers have never even heard of more obscure attacks, like XML External Entity Injection (XXE), and yet a well-placed attack can be just as devastating as the most egregious XSS injection.

We’ll explain what exactly an XXE attack is later, but first it’s important to have a basic understanding of the anatomy of an XML document. XML, or Extensible Markup Language, is a format used to describe the structure of documents, such as web pages. For example, the following XML document might describe a blog post:

<?xml version="1.0" ?>
<post>
  <title>Smashing the Stack for Fun and Profit</title>
  <author>Aleph One</author>
  <content>
    Over the last few months there has been a large increase of buffer overflow vulnerabilities being both discovered and exploited...
  </content>
</post>

In the above document, there are a few key pieces of terminology to keep track of. Firstly, a tag is a name surrounded by a pair of angle brackets. Both <author> and </title> are examples of tags. More important are the logical components of the document, known as elements. One such element above is <author>Aleph One</author>.
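To make the tag/element distinction concrete, here is a minimal sketch that parses the blog post document with Python's standard xml.etree.ElementTree module and lists each element inside <post>. Python, the variable names, and the abbreviated content text are our own choices for illustration; any XML parser would expose the same structure.

```python
import xml.etree.ElementTree as ET

# The blog post document from above, with the <content> text abbreviated.
POST_XML = """<?xml version="1.0" ?>
<post>
  <title>Smashing the Stack for Fun and Profit</title>
  <author>Aleph One</author>
  <content>Over the last few months...</content>
</post>"""

root = ET.fromstring(POST_XML)

# Each child of <post> is an element: a start tag, its content, and an end tag.
elements = {child.tag: child.text.strip() for child in root}

for tag, text in elements.items():
    print(f"<{tag}> element contains: {text}")
```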

A slightly more complicated document might look like the following:

<?xml version="1.0" ?>
<!DOCTYPE author [
  <!ELEMENT author ANY>
  <!ENTITY author "Shane Wilton">
]>
<author>&author;</author>

In this case we’ve defined an entity: essentially a mapping from some name to a value. When this XML document is processed, any instances of “&author;” are going to be expanded to “Shane Wilton”. This is known as internal entity processing, and it is typically used to allow for the modular design of XML documents.
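As a quick illustration of internal entity processing, the sketch below hands the same document to Python's standard xml.etree.ElementTree parser and reads back the element's text. The choice of Python and the names AUTHOR_XML and expanded_text are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# The document from above: an internal entity named "author".
AUTHOR_XML = """<?xml version="1.0" ?>
<!DOCTYPE author [
  <!ELEMENT author ANY>
  <!ENTITY author "Shane Wilton">
]>
<author>&author;</author>"""

root = ET.fromstring(AUTHOR_XML)

# By the time we see the element, the parser has already replaced
# the &author; reference with the entity's replacement text.
expanded_text = root.text
print(expanded_text)
```

The application code never sees "&author;" at all; the substitution happens inside the parser, which is what makes entity handling easy to overlook.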

An XXE attack works by taking advantage of a little-known feature of XML -- external entities. The concept is the same as in internal entity processing, but the attack vector lies in being able to use external resources as the replacement text. For example, consider the following document:

<?xml version="1.0" ?>
<!DOCTYPE passwd [
  <!ELEMENT passwd ANY>
  <!ENTITY passwd SYSTEM "file:///etc/passwd">
]>
<passwd>&passwd;</passwd>

When the above document is parsed by a parser that resolves external entities, the “&passwd;” reference is replaced with the contents of “/etc/passwd”, so the “passwd” element ends up containing that file.
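Whether the expansion actually takes place depends on the parser and how it is configured. As a hedged illustration, the sketch below feeds an equivalent document to Python's xml.etree.ElementTree, which the Python documentation describes as not expanding external entities. Rather than touching /etc/passwd, it points the entity at a temporary file; SECRET and parse_with_external_entity are hypothetical names invented for this example.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical stand-in for the contents of a sensitive file.
SECRET = "root:x:0:0:root:/root:/bin/sh"


def parse_with_external_entity(target: Path) -> str:
    """Parse a document whose entity points at `target`.

    Returns the element text, or "" if the parser rejects the document.
    """
    document = f"""<?xml version="1.0" ?>
<!DOCTYPE passwd [
  <!ELEMENT passwd ANY>
  <!ENTITY passwd SYSTEM "{target.as_uri()}">
]>
<passwd>&passwd;</passwd>"""
    try:
        return ET.fromstring(document).text or ""
    except ET.ParseError:
        # The parser refused to deal with the external entity reference.
        return ""


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "passwd"
    target.write_text(SECRET)
    result = parse_with_external_entity(target)

print("leaked" if SECRET in result else "not leaked")
```

A parser that does resolve external entities would return the file's contents here instead, which is exactly the behaviour an XXE attack relies on.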

If a web application accepts user-created XML documents as input, or input which is otherwise used in the creation of XML documents, an attacker is able to use XML entity expansion to load files or other URI-referenceable resources into the web application. If this information is then displayed back to the attacker at a later point, then they’ll find themselves able to exfiltrate possibly privileged information.
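To see which resources a submitted document would ask the parser to load, one option is to inspect its entity declarations without resolving them. The sketch below does this with the EntityDeclHandler callback of Python's xml.parsers.expat module; the function name declared_external_entities is hypothetical, and this is an illustration of the idea rather than a complete defence.

```python
import xml.parsers.expat


def declared_external_entities(document: str) -> dict:
    """Map each external entity declared in `document` to its system identifier."""
    found = {}

    def on_entity_decl(name, is_parameter, value, base, system_id, public_id, notation):
        # Internal entities carry a literal value; external ones carry
        # a system identifier (a URI) and no value.
        if value is None and system_id is not None:
            found[name] = system_id

    parser = xml.parsers.expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    # No external entity handler is installed, so nothing is fetched.
    parser.Parse(document, True)
    return found


SUBMITTED = """<?xml version="1.0" ?>
<!DOCTYPE passwd [
  <!ELEMENT passwd ANY>
  <!ENTITY passwd SYSTEM "file:///etc/passwd">
]>
<passwd>&passwd;</passwd>"""

external = declared_external_entities(SUBMITTED)
print(external)
```

An application that never expects external entities could treat any non-empty result as a reason to reject the input before it reaches a more permissive parser.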

Furthermore, by loading a stream of infinite data, like /dev/urandom, an attacker is able to consume all of a system’s resources, denying access to other users.

In some rare cases, it may be possible to gain remote code execution by loading executable code (such as PHP), or by using the XXE attack as a beachhead to access other, less secure, internal services. This was exactly the case last year, when a Brazilian engineer used an XXE attack to gain remote code execution against Facebook, earning their largest bug bounty payout to date. His impressive write-up can be read here.

XML External Entity Processing is by no means a complicated bug, but it is difficult to test for. There are so many variables involved in launching a successful attack that software engineers simply don’t have the time to invest in a full audit of their XML parsing capabilities, if they’re even aware of the possibility of XXE in the first place. That’s why we’re proud to announce that Tinfoil Security now supports automated scanning for XXE attacks, and for the next month, we'll also be scanning all of our free members, at no charge.

Sign up today, so your engineers can spend their time building your product, and we can spend our time worrying about the minutiæ of XML parsing.


Shane Wilton

Shane Wilton is the Grand Magistrate of Security at Tinfoil Security, and the company's resident programming language theorist. When he isn't coding in a functional language like Elixir, he's probably hacking on an interpreter for an esolang of his own, or playing around with dependent types in Idris. Security is always at the forefront of his thoughts, and he enjoys building tools which make it easy for other engineers to write secure code. His love for security is matched only by his love for bad movies - and does he ever love bad movies.

Tinfoil Security Blog
