Exploits: SQL Injection

Created by: Peter A. H. Peterson and Dr. Peter Reiher, UCLA {pahp, reiher}@ucla.edu
Contents
  1. Overview
  2. The Structured Query Language
    1. Introduction to SQL
    2. SQL Examples
    3. SQL Injection
    4. Input Validation and Sanitization
    5. The Old Way -- Escaping Input for SQL
    6. Prepared Statements
    7. Additional Reading on SQL Injection
  3. Software Tools
    1. diff and patch: see differences and create source patches
    2. mysql: command line mysql client
  4. Assignment Instructions
    1. Setup
    2. Tasks
      1. FrobozzCo Community Credit Union
  5. Submission Instructions
    1. submit.sh and restore.sh

Overview

The purpose of this exercise is to introduce you to SQL Injection attacks and give you a first-hand opportunity to see them in source code, exploit them, and patch them. After successfully completing this exercise, you will be able to:

  1. Accurately identify and describe SQL Injection attacks
  2. Identify SQL Injection vulnerabilities in a preexisting PHP/MySQL application
  3. Understand how vulnerabilities can lead to unauthorized access to private data
  4. Repair simple examples of these security flaws
  5. Author memos describing in detail your findings and code changes

You should be familiar with the Unix command line, POSIX permissions, and basic programming. The exercise will use PHP and SQL, but at introductory levels.

The Structured Query Language

Introduction to SQL

SQL -- or the Structured Query Language -- "is a computer language designed for the retrieval and management of data in relational database management systems, database schema creation and modification, and database object access control management"[2]. Put simply, it's a database query language. SQL (pronounced "sequel" or S-Q-L) has been around since the 1970s, and was standardized by ANSI in 1986. SQL is ubiquitous -- practically all current relational databases in use today speak some vendor-specific variant of SQL, and most enterprise web applications use a relational database on the back end. Furthermore, scripting languages like Python, Perl, Ruby, and PHP that are often used for web development and all have robust, easy to use SQL modules. In fact, the combination of Linux servers, the Apache http daemon, the MySQL relational database, and the PHP scripting language are so popular for web development today that it has its own acronym -- LAMP.

Dedicated database servers are valuable because they free programmers from the task of creating customized data storage systems from scratch (which are likely to have bugs and shortcomings). SQL serves as a common language for many database systems, allowing programmers to ignore most details of the specific database system in use (e.g., MySQL, Oracle, PostgreSQL, etc.). Overall, this modularity improves performance and makes maintenance and portability easier.

Additionally, general-purpose database systems like MySQL are a win because the performance critical code is written in a fast language like C or C++, while applications using the database can be created with a slower, user-friendly scripting language. This is especially important when databases contain millions of entries because a database written in a relatively slow scripting language would be prohibitively expensive from a computational and time perspective. SQL servers can run on the same machine that serves the application, or they can be accessed via the network, allowing a web frontend and database backend to operate on separate machines. This flexibility can enable more efficient use of computing resources and can add other other benefits, such as easier backups.

SQL Examples

For example, an online retailer might have a database table for all their inventory with the columns product number, name, price, and number in stock. The programmer can craft a database query requesting all products whose names' second letter is an "x" like so:

SELECT * FROM inventory WHERE name LIKE '_x%';

In SQL, an underscore ( _ ) matches any single character, while a percent ( % ) matches any sequence of characters (like * in DOS and UNIX)

We could also perform a query filtering out the 10 items with the cheapest price:

SELECT * FROM inventory ORDER BY price ASC LIMIT 10;

In SQL, "ORDER BY foo ASC" sorts by the values in the column 'foo', ascending (least to greatest). Using DESC sorts from greatest to least. "LIMIT N" limits the result to N items (after the ordering clause).

We could also ask for just the names of items whose price is greater than 100:

SELECT name FROM inventory WHERE price > 100;

SQL results are typically returned to the programming language as a list (array) or similar data structure. The programmer can then work with the list in her favorite language, make further queries, or display the data in some way. In executing queries like this, the SQL database -- not the application -- performs the selection, filtering, and alphabetization. Not having to write new code to do this (and more) saves application developers a tremendous amount of time.

SQL Injection

Because small flaws can often be leveraged into much larger exploits, simple programming mistakes and omissions often result in unexpected negative security effects. For example, PHP is every bit as vulnerable to failures of input validation such as filesystem and directory traversal exploits as are Perl and Python because these vulnerabilities are a result of system semantics and the necessity of programs to be capable of performing general purpose tasks like reading files.

Poorly written applications that interface with SQL are no different. A common class of attacks are called SQL injection attacks which -- like directory traversal and buffer overflow vulnerabilities -- are the result of not properly validating input and implicitly granting applications privilege they do not require. In this case, the non-validated input actually contains SQL statements and relies on the application to naively insert the user input into the application's own request.

For example, imagine a fatally flawed web interface for the Social Security Administration where you enter your Social Security number (123006789) and the system displays a summary of your account and personal information. Using MySQL and PHP, the application might include code like this:

$ssn = $_POST['ssn'];                                      // get ssn from web form POST data
$query = "SELECT * FROM personal WHERE ssn LIKE '$ssn'";   // construct query (notice embedded $ssn variable)
$result = mysqli_query($query);                             // execute query
echo "$result\n";                                          // output result

If the value for $ssn contained 123006789, the application would construct the following SQL query and execute it:

SELECT * FROM personal WHERE ssn LIKE '123006789'

The result would be your personal data.

Now imagine that instead of entering your Social Security number, you enter % -- the SQL wildcard matching anything. The application foolishly takes your input as valid and blithely plugs it in to the SQL statement:

SELECT * FROM personal WHERE ssn LIKE '%'

This would return the personal information of everyone in the table with a Social Security number!

Even worse, if the user entered the following input into the SSN field

'; DROP TABLE personal

... the constructed query would be:

SELECT * FROM personal WHERE ssn LIKE ''; DROP TABLE personal

The semicolon ( ; ) character separates individual SQL statements and "DROP TABLE tablename" is how database tables are deleted. Thus -- if not for a restriction in PHP applications -- this could result in the deletion the entire personal table from the database! (XKCD had a funny comic about this very issue.)

Input Validation and Sanitization

As we can see, this is a malicious SQL statement and a dangerously negligent application. Using the LIKE operator is a subtle example of unnecessary privilege. Because everyone has a unique Social Security number, there is no chance that a glob match (enabled by LIKE in SQL) will be required. At the very least, the above code should have used the exact match test = (equals sign) instead of LIKE, which would have returned an error when it was given a wildcard to match. The database also has a permissions problem, because the application accessing the database does not need to be able to use the DROP TABLE command. (Database users have permissions similar to users on a filesystem -- but the permissions still need to be configured properly.)

More fundamentally, this application is fatally flawed due to the total lack of input validation. Since all Social Security numbers are 9 digits, there is no chance that a legitimate user would ever need to enter anything but 9 digits into the SSN field. It is trivial to first check the contents of a variable before using it -- making sure that it contains only numbers or letters, etc. -- this is called input validation. In this way, punctuation, letters, and everything but numbers would cause an error before the SQL statement was even assembled.

However, making sure that the SSN contains just numbers or letters only works if you have purely numerical or alphabetical data. Unfortunately, it's not always possible to use this approach, because many fields require letters, letters and numbers, or even arbitrary characters that have special meaning in SQL like the single quote ( ' ) and semicolon ( ; ) that we used for our DROP TABLES statement. For example, consider a web forum where users' posts are stored in a database. Since posts can contain virtually any input, the database must be able to safely handle arbitrary input. Instead of validating our input, we want a way to sanitize it -- to make it safe.

The Old Way -- Escaping Input for SQL

Note: While manual validation and escaping of input is the classic approach to this problem, it's no longer considered a best practice. It's better to use prepared statements (see the next section). It is possible to bypass basic string escaping by using an input that can be dangerous and does not require escaping to be valid, such as 1 OR 1=1, or fiddling with character encoding. Prepared statements also give a performance boost, because escaping input is fairly inefficient!

Again, this material is left here for informational purposes only (do not use it for this assignment).

The classic mitigation to this problem used to be validation in conjunction with something called "string escaping." Validation was performed using functions to test data for various properties, such as being integers, floating point numbers, or alphanumeric (perhaps with various allowed punctuation). This is one of the sources of strange restrictions on what characters can be used in passwords for various services.

If a string contains special characters, we need a way to inform a parser (in this case the SQL server) that the characters should be treated simply as regular characters, not as characters with special meaning in SQL. Traditionally, this is done by putting a backslash ( ) in front of the each special character. For example, in most languages, to include a double quote character ( " ) in a double-quoted string, we can usually use the following syntax:

escaped = "here is a double quote: \" and now i'll end the string"

In essence, the backslash says "don't end the string here -- just treat it as a regular double quote character".

Most languages have functions that will sanitize or "escape" strings being used in MySQL requests. In PHP, the function is mysqli_real_escape_string(foo). It will take the string foo and return an escaped version. For example, if we did the following:

$ssn_escaped = mysqli_real_escape_string($ssn);

... $ssn_escaped would contain:

\'\; DROP TABLE personal

The function mysqli_real_escape_string() will only function properly ''after'' a connection to a mysql server has been made. If mysqli_real_escape_string() seems to be returning empty strings, make sure that its invocations follow a connection to a mysql server.

Since the special characters are now escaped with the backslash, they'll be treated as regular letters -- not special SQL tokens: Thus, our original query would become:

SELECT * FROM personal WHERE ssn LIKE '\'\; DROP TABLE personal'

...and of course, there is no ssn like \'\; DROP TABLE personal. Running this query wouldn't drop the table, it would simply return an error.

Errors due to lack of input validation and sanitization happen constantly, not just with SQL. The only response is to write code with the least possible privilege and to perform comprehensive and correct validation (or sanitization) on all input.

For this assignment, you'll use Prepared Statements instead of string escaping. Read on!

Prepared Statements

Prepared statements are an improved approach for protecting against SQL injection that, by changing how SQL queries are constructed, make validation and sanitization part of the query process. Thanks to prepared statements, manual validation and sanitization are not necessary.

Prepared statements work by telling the database what a query will look like, using placeholders for inputs that are then bound to the query later. SQL injection works when the database can't differentiate between intended operations in the query and malicious input. Because prepared statements send the query and inputs to the database separately, SQL injection becomes completely mitigated with this approach. Also, since the database knows the types of the various table items, it can validate and escape data automatically -- minimizing the chance some edge case is overlooked and leads to a security leak. Prepared statements can also improve the performance of queries that are run multiple times. We will use the mysqli PHP library to use prepared statements.

We need to connect to our database before starting:

$mysqli = new mysqli("localhost", "username", "password", "database");

The first step is to create the statement. This is where we specify what our query will look like, with placeholders for the inputs. Looking at the example above, the only input is $ssn, so we can prepare the statement like this:

$stmt = $mysqli->prepare("SELECT * FROM personal WHERE ssn LIKE (?)"); 

The second step is binding the input to the query. We need to specify the expected type for each input (as a single string), along with each of the inputs. Types are specified with "d" for decimal, "i" for integer, "s" for string, and "b" for boolean. Assuming ssn is is a string:

$stmt->bind_param("s", $ssn);

If we had an additional parameter, such as age, we bind the parameters like this:

$stmt->bind_param("si", $ssn, $age);

The third step is to execute the query:

$stmt->execute();

If we want to run the query multiple times with different input, we only need to re-run steps two and three.

To read output from the statement, we use the get_result() method, which reads one result at a time.

$res = $stmt->get_result();

To get an associative array with the row names as keys:

$row = $res->fetch_assoc();

If we have multiple results, we can iterate over them like this:

$result = $stmt->get_result();
while($row = $result->fetch_assoc())

Then we can read the result:

printf("Name: %s, DoB: %s\n", $row['name'], $row['dob']);

Note: the existing code returns results as an array, while mysqli returns results as an associative array (sometimes called a dictionary or in Perl a hash).Arrays are indexed by integers, but associative arrays are indexed by keys, which can be arbitrary string values. So, with the old code you might do something like this:

$id = $row[0]; 

... but in the new code you would do something like this:

$id = $row['id'];

You can also embed associative arrays in strings by surrounding them with curly braces:

echo "The ID is: {$row['id']}";

The above examples are simplified -- make sure to check for errors! See the mysqli documentation for more complete examples.

Additional Reading on SQL Injection

For more information, see these articles:

... for additional information, search online -- numerous resources and SQL tutorials exist.

Software Tools

This section will describe some tools you may need to complete this exercise.

diff and patch: see differences and create source patches

In this exercise, you'll be fixing security vulnerabilities in a few simple programs. However, instead of your whole program, we only want the differences between your new, fixed, program, and the original. A file which contains only the changes between two revisions of a program is called a "patch." Fortunately, creating patch files for single-file source programs is easy.

To see the differences between two files on Unix, you use the diff utility:

$ diff one.txt two.txt

Another useful tool is called patch. patch takes properly-formatted diff output, and applies the changes to the original file. diff can generate this output with a few options:

$ diff -u oldcode.c newcode.c > fixed.patch

diff has many options to modify its behavior (see man diff for more information).

This above options for diff will create a patch with the filenames and all necessary information that the patch program requires. This makes patching as simple as executing:

$ patch oldcode.c -i fixed.patch -o new-patched-file.c

... and this will create a patched version of the program that you can test.

When submitting a patch file, it is highly recommended that you create the patch and then test it before submitting it to make sure that it works. You will not get any points for code that does not execute or compile in the exercise environment.

If you're having permissions problems, consider switching to root by executing sudo su - or change the permissions of the source directory in question.

mysql: command line mysql client

When attackers try to create SQL injection attacks, they often know very little about the the database schema. In our case, we have hands-on access to the database, so this should make the job of developing injection attacks easier. This is where the MySQL command-line client comes in.

To use the MySQL command line on the server, run a command like this:

$ mysql -uroot -pzubaz99

... root is the user and zubaz99 is the password -- and the lack of spaces is important.

This is essentially an SQL "shell" and gives you root access to the entire database. Once you get logged in to the database, you need to select the database to use. Then, you can make selections from the database:

MariaDB> use fccu; # selects fccu database
MariaDB> select * from accounts; # print all accounts information

Something that you may notice out of the ordinary is that the command line will prompt you with MariaDB instead of MySQL. Newer versions of Debian/Ubuntu are resorting to MariaDB instead of MySQL, which is a fully backwards compatible, community-made version of MySQL. MariaDB is faster and safer than MySQL, which is why MySQL was overwritten by MariaDB in newer versions of Ubuntu/Debian.

The name "MariaDB" in the command line will be the only change you will notice with this lab. All SQL commands will remain compatible, but this lab will continue to use mysql on the command line instead of mariadb.

If you scroll up, or limit the query, you'll see that mysql very nicely adds a title to each column -- this is the column name. So you could create a query like this to display all account information for accounts with an id greater than 50:

MariaDB> select * from accounts where id > 50;

There are many online SQL tutorials -- including some on SQL Injection, so we won't cover more here.

When trying to develop an SQL injection attack, you must consider the queries hard-coded into the script -- they are your starting point. You need to find a way to bend the query to your will -- you can't just start over with a query you like better. A good idea is to copy the SQL query from the script (or create one like it) into the mysql command line client and play around with changing it in controlled ways until you can get the result you want.

Assignment Instructions

You are the security administrator for FrobozzCo, a large corporation with a great many secrets. You have just come back from a much-needed four week vacation in West Shanbar, only to find that FrobozzCo has been having some serious security issues! In order to do everything you need, you've prepared a test environment on SPHERE with the necessary software installed.

Setup

    1. If you don't have an account, follow the instructions here.

    2. Create an instance of this exercise by following the instructions here, using sqli as Lab name. Your topology will look like below:

      .

    3. After setting up the lab, access your sqli node.

Make sure that you save your work as you go. See the instructions in the submission section of this exercise for information about save and restore scripts. Save any changes you make to the sourcecode, your patches, memos, etc. in your home directory.

You will probably want to set up port forwarding for tunelling HTTP over ssh so you can test the web applications with a browser on your own desktop.

Tasks

FrobozzCo Community Credit Union

FrobozzCo has its own internal company credit union, FrobozzCo Community Credit Union (FCCU). FCCU has an Internet-accessible web-based application that allows employees to access their paychecks and pay bills via a money wiring system. There are very few bank employees, and they use a a special administrative interface that runs on a different system that is not network accessible. In true FrobozzCo fashion, the public banking software was written in house by the CTO's nephew (who is a nice kid but not the brightest candle on the cake).

As it turns out, a lot of money has been disappearing from the credit union while you've been gone. It looks like someone has figured out how to force other accounts to wire money... to an anonymous bank account in the Cayman Islands! Worse yet, several employees have had serious identity theft problems of late -- clearly someone has access to personal information and you have a hunch it's all coming from this server. To top it all off, the company itself is showing a deficit of $32,767 and it looks like it was somehow drawn through FCCU.

In a surprising display of foresight, your predecessor installed a network monitor watching the FCCU server. However, you are shocked to find out (from the network monitor and server logs) that nobody has logged into the server directly -- in fact, the only interaction that anyone has had with the server has come through the Internet facing web interface. It looks like insecure software is to blame, again.

You assume that there must be one or more vulnerabilities in the code that interfaces with the SQL database -- in the FCCU software, the directory, or both -- and that somehow a malicious user is able to make the system do something it's not supposed to, like write checks. Worse yet, it seems like the attacker has managed to view private information like social security numbers, birthdates, and so on. You've heard about a class of attacks called "SQL Injection," and it seems likely that this is the kind of exploit being used.

Surprisingly, your boss agrees with you and instructs you to produce a one page memo, a detailed transcript demonstrating the exploit, and a patch for the software. Additionally, he also wants to know how to clean up this mess -- how severe is the compromise? How can we restore the system to a safe state?

Tasks

  1. Load your Exploits image in SPHERE.
  2. Exercise a remote SQL-Injection vulnerability to perform these unauthorized tasks on the SQL server:
    1. Show how you can log into a single account without knowing any id numbers ahead of time.
    2. Show how you can log into every account account (one at a time) without knowing any id numbers ahead of time.
    3. Make some account (your choice) wire its total balance to the bank with routing number: 314159265 and account number: 271828182845
    4. Explain why you can't create a new account or arbitrarily update account balances (or show that you can).
  3. Create an exploit transcript in the file /root/submission/exploit.txt, which should include your SQL injections (in order), short answers and any other information you think I should know.

This is not an executable script, but should be a step-by-step "walkthrough" of the attack that a colleague could follow without assistance.

  1. Fix the vulnerability in the FCCU application by adding input validation using prepared statements.
  2. Create a patch against the original source.
  3. Quoting as little sourcecode as possible, write a ~1 page memo, including:
    1. A description of the security flaw in the FCCU application
    2. A description of how you fixed the flaw. How does your fix solve the problem?
    3. Considering the FCCU application alone, describe a recovery plan for the server, answering:
      1. How serious was this breach? Could attackers gain root access through this vulnerability?
      2. What should be done with the server in order to secure it?
      3. Include any other observations or thoughts you might have.
  4. Store the following files in /root/submission:
    1. your memo
    2. your exploit walkthrough (exploit.txt)
    3. your patch
  5. Use the scripts described in the section for creating a submission tarball.

Submission Instructions

For this exercise, you will submit a tarball containing your patch, memo, and exploit code. Use the script submit.sh in /root on the server host for creating and restoring those tarballs.

submit.sh and restore.sh

Note: do not run submit.sh and restore.sh as sudo!

submit.sh will back up:

restore.sh will restore those files to their original locations, automatically overwriting whatever is there.

Note: do not run submit.sh and restore.sh as sudo!

Submit your tarball to your instructor.