Why bark when you have a dog? Using libraries to add security and reliability plus avoid SQL injection attacks

Woof Woof!

Why Bark when you have a dog?

This article is about how to boost security and avoid SQL injection attacks with an example showing how development with SQL databases such as PostgreSQL and MySQL can be easier and more robust when using the right tools.

Sometimes it's easier to just get things done, for example inserting a record into a database, we're using a php fragment here but most languages are affected in a similar fashion.

// MySQL
$link = mysqli_connect("localhost", "username", "password", "dbname");
// PostgreSQL
$link = pg_connect("host=localhost user=username password=password dbname=dbname");

$location = 'London';
$query = "INSERT INTO location (name) VALUES($location);"
if (!mysqli_query($link, $query)) {
    printf("Error: %s\n", mysqli_sqlstate($link));
}

This works with location = London.  If the place were Hadrian's Wall however this would fail unless we escaped the quote in some way

In MySQL adding slashes works so it would be easy (if wrong) to do this.
$location = 'Hadrian\'s Wall';

In PostgreSQL for example doubling the quotes is the preferred way. It also works in MySQL, Oracle, MSSql, SQLite and many others, but don't do that either.
$location = "Hadrian''s Wall";

So why is this wrong when you have decided on MySQL or PostgreSQL? You're unlikely to change databases so this is easier than adding the following additional code.

Use native database quoting:
$location = mysqli_real_escape_string($link, "Hadrian's Wall");
$location = pg_escape_string($link, "Hadrian's Wall");

Using the right database library brings the following benefits:

  • Maintain database independence;
  • Reduces mistakes as often more is escaped than just quotes;
  • Setting a good example, fragments of code are often copied and adjusted as necessary;
  • Clarity of communication.

Once a project has started with a coding style or set of libraries, it is often impractical to change usually because the project is maintained by less experienced staff and the general inertia of a successful project.  Fear is also a factor, usually the fragility of a project is recognised and so any change is harder without a good understanding of the many factors involved.

Perhaps a real world example might clarify how a project can get into or avoid difficulty.

In order to test for a string using a regular expression (regex), the following text was used as part of a query string
"src\\s?=\\s?[\\'\"]http:"

By not adding slashes it is easier to decipher the regex.
'src\s?=\s?[\'\"]http:'

The deciphered regex.
src\s?=\s?['"]http:

In English, Test for a HTML tag with a source attribute of http: using either single ' or double " quotes and allowing for space before and/or after the equal sign.

So why all the confusion and how did we get here anyway?

Firstly too many things are being done at once, adding slashes in php to escape slashes and quotes, in addition to leaving some slashes as part of the regex.  If there is something to learn then do one thing at a time and do it well.

Secondly, and this is subtler and more of a cultural thing, habits are hard to change.  By doing easy things, it's easy to get into the habit of doing them even when things get harder as in this example; by starting with good habits and using library functions where appropriate it's easier to keep things simple and maintain easier to read code both for ourselves and those who will maintain or improve the code at a future date long after we are gone.

If you were in a rush and only one database to code for then adding slashes for mysql would produce "src\\s?=\\s?[\\'\"]http:", it would work and get the job done, unfortunately it would also make it much harder to either understand what the regex does or to modify it to be database agnostic.

Far better to use a native database function to escape the string and extract the different ways of querying the database into a seperate library or class.  For convenience the two queries are shown below for comparison.

The regex is src\s?=\s?['"]http:

Because of the use of both single and double quotes this cannot easily be represented as a single string and so here the single and double quotes are extracted as variables for clarity.

// src\s?=\s?['"]http:
$quote = "'";
$doublequote = '"';
$regex = 'src\s?=\s?['.$quote.$doublequote.']http:';

// MySQL
$query = 'htmlcache REGEXP ' . $quote . mysqli_real_escape_string($link, $regex) . $quote ;

// PostgreSQL
$query = 'htmlcache' . " ~* " .$quote. pg_escape_string($link, $regex) . $quote ;

By using library functions:

  • the code is more readable - Clarity of communication;
  • database independence is maintained;
  • bugs and improvements are easier to fix/implement;
  • reduced expertise required as often more is escaped than just quotes;
  • by setting good examples early on, they are likely to be maintained as fragments of code are often copied and adjusted.

Adding slashes just to 'get it to work' is almost always a bad idea.

The cost of maintaining a project often outweighs the starting cost, as a rule of thumb by a factor of ten.  By starting a project and using library functions where appropriate, the cost of developing a project can be considerably reduced along with reduced maintenance and increased flexibility.  Doing it Right First Time is a good habit to get into.  Just some of the many ways to do more with less.

by Alan Hicks