Cross-Site Scripting (XSS) in Plain English

Welcome to my weekly series where I explain different types of website attacks in plain English, steering clear of heavy security jargon commonly found in articles of this nature. Today, I’d like to tackle Cross-Site Scripting, more commonly known by the much scarier acronym XSS.

Modern websites are far more complex than the static pages that used to rule the internet. These days, it is more accurate call them web applications, due to the growing trend of replacing server-side logic with client-side Javascript. While Javascript as a programming language has evolved over the years, the ways that Javascript code is meant to be added to a web page have not. This is why we can still use <script> and </script> tags inside of HTML documents and put any Javascript we want inside of them, and this is the main reason why XSS is still rampant today.

XSS allows malicious users to inject client-side code (mainly Javascript) into web pages to be run by other unsuspecting users. It may be easier to understand with an example. Suppose I’m a web developer creating a hot new search engine: example.com. At its basic level, the search engine requires two pages. The first page, http://www.example.com, only contains a search box.

<form action="/search" method="get">
  <input type="text" name="query" />
</form>

The second page contains the list of search results. As a friendly reminder to the user, it also includes their search term. The server-side code that generates that piece of HTML, here implemented using Sinatra, may look something like this.

require 'sinatra'

get '/search' do
  html = ""
  # ...
  html += "Here are the results found for: #{params[:query]}"
  # ...
  return html
end

The Danger

Using typical string interpolation here presents a problem to the user’s browser because it cannot differentiate between HTML intended by my code and any HTML entities that may exist inside the query parameter. As a result, it is easy for an attacker to exploit this by typing the following into the search box:

<script>alert('hacked!');</script>

Our original intent was to remind the user of what her search term was, so we want everything inside the paragraph tags to be treated as plain text:

...
<p>Here are the results found for: <script>alert('hacked!');</script></p>
...

Unfortunately, the script tags here get parsed just like any other script tag, and the Javascript code between them gets executed. The browser does not know the difference between the script tag inserted via user input and a script tag inserted by us.

...
<p>Here are the results found for: <script>alert('hacked!');</script></p>
...

At this point you might be thinking, “So what? Javascript is client-side, so the attacker only managed to accomplish hacking himself.” Unfortunately, this is not the whole story. At this point the attacker’s URL bar reads http://www.example.com?query=<script>alert('hacked!');</script>, and she could easily copy this URL and paste it somewhere in an effort to get potential victims to click on it. She could post it to public forums, send e-mails to example.com users that include this link (with a tempting title like “Check out these cat pictures!”) or embed this page on her own site using an invisible iframe. In any case, the malicious Javascript code then runs on the unsuspecting victim’s computer. Notice how this differs from another popular attack, SQL injection, in that XSS is aimed at users of the website, not the website itself.

The worst part is that because Javascript is designed to be a powerful tool to manipulate a web page, this kind of attack can be devastating. An attacker can use XSS to steal users’ cookies and use those to impersonate them at example.com, steal their credit card information, or even trick them into installing and downloading malware. Anything that HTML and Javascript can do, the attacker can do.

The Answer

The main defense against XSS is to escape all user input. Escaping user input is the technique of replacing certain characters with other equivalent characters to remove ambiguity for a browser’s parsers. Doing this properly is a solid defense against XSS, because escaped characters signal to a parser that they are to be treated as text and never as code. To do this properly, we have to identify which characters are safe to display without being mistaken for characters can switch out of the current context. Every character not in this safe list needs to be escaped, so that the browser does not treat them as executable code.

Unfortunately, there is no single tool or algorithm to do this, due to the variety of contexts in which one could insert user input, and the different requirements each of those contexts have for properly escaping text. Typically, however, modern web programming frameworks have libraries devoted to escaping user input in a variety of contexts. I recommend strictly using those libraries and not implementing your own. If you’re curious about how these libraries work, in the following sections I discuss the most common contexts in which you would want to insert user input, and the proper ways to use escaping to prevent XSS.

Between Opening and Closing HTML Content Tag

Inside standard content elements is the safest place to insert user input. HTML content elements include tags such as <p>, <div>, and <li>, essentially any element meant to contain other content elements or plain text. In this case, we want to use HTML escaping to ensure user input is never mistaken for an HTML tag or attribute. This means that we have to convert certain dangerous characters into the form &X;, where X is either a number (preceded by a #) or, in certain cases, a name. These constructs are called HTML entities, and they tell the HTML parser that they should be interpreted and displayed as text, and never treated as HTML tags. Below is a complete list of the characters that need to be escaped.

Dangerous Character Named HTML Entity Numerical HTML Entity (in hex)
& &amp; &#38;
< &lt; &#60;
> &gt; &#62;
&quot; &#34;
'   &#39;

 

In our search engine example above, we wanted to place user input inside of <p> tags, even if the input is an attempt at XSS. This can safely be accomplished by using the HTML escaping technique. The raw HTML with proper escaping looks like this:

...
<p>Here are the results found for: &lt;script&gt;alert(&#39;hacked!&#39;);&lt;/script&gt;</p>
...
HTML Attribute Values

While it is possible to allow user input in HTML tag attributes, it is significantly more dangerous than allowing user input between content tags. Because HTML attribute values don’t have to be quoted, there are many more ways for attackers to escape out of them and inject malicious code. In the following contrived example, we construct a page uses a get parameter to set the width of an image.

require 'sinatra'
get '/image' do
  html = ""
  # ...
  html += "<img src=image.jpg height=300 width=#{params[:w]}>"
  # ...
  return html
end

Here, if an attacker constructs the URL http://example.com/image?w=400%20onload=alert('hacked!'), the resulting HTML will cause the malicious Javascript to run with the image is loaded.

...
<img src=image.jpg height=300 width=400 onload=alert('hacked!')>
...

To ensure safety, we have to escape all non-alphanumeric characters in the user input using HTML entities, not just the five characters listed in the previous table. A complete list HTML entities can be found here. In the above example, properly escaped user input would look like this:

...
<img src=image.jpg height=300 width=400&#32;onload&#61;alert&#40;&#39;hacked&#33;&#39;&#41;>
...
JSON String Values

If you want to allow user input to be embedded in your JavaScript code, the only safe place is inside of a quoted string, either as a regular string variable or within a JSON string value. Even here, it is still dangerous to allow user input to be inserted unescaped, as the example below illustrates.

<script>
var string = "</script><script>alert('hacked!');"
</script>

Even though the red </script> is inside of a Javascript string, it closes the Javascript context and starts a new one. This is because browsers have their HTML parsers run before the Javascript parsers, so HTML elements get highest priority. Even my text editor gets this wrong.

The best solution here is to escape every non-alphanumeric character using unicode escaping. The following table has some examples.

Dangerous Character Unicode escape
< \u003C
> \u003E
" \u0022

 

There are other dangerous places to allow user input to be inserted, such as CSS property values and URL get parameters, but the solutions for all of them are the same: always escape user input in every context. Rather than trying to remember all of the escaping rules for each context, it’s much safer to use a library for the job. Read the documentation of your favorite web framework and use its built-in tools to ensure you don’t make any mistakes.

As you’ve seen in the examples above, it is all too easy to expose your site to XSS, and these types of vulnerabilities can be incredibly hard to detect for even trained human eyes. As an added level of security, I highly recommend using an automated tool to scan for and detect XSS vulnerabilities in your site. Tinfoil provides the best web application security solution on the market, and it detects XSS vulnerabilities on your website along with many other types of web vulnerabilities.


Angel Irizarry

Angel Irizarry is the Software Samurai of Tinfoil Security, and a self-proclaimed software purist. All he needs to do his best work is a plain Linux machine with Git and Emacs installed. He loves everything about front-end development, like making pages interactive and super fast, even if that means digging in and optimizing some SQL. When he's not writing code, which isn't very often, you'll find him on his iPad scouring his RSS feeds for news and rumors of cool new gadgets.

Tags: XSS plain english

Tinfoil Security Blog

Tinfoil Security provides the simplest security solution. With Tinfoil Security, your site is routinely monitored and checked for vulnerabilities using a scanner that's constantly updated. Using the same techniques as malicious hackers, we systematically test all the access points, instantly notifying you when there's a threat and giving you step-by-step instructions, tailored to your software stack, to eliminate it. You have a lot to manage; let us manage your website's security.