Searching for emails an exercise in optimising Django and PostgreSQL
 
After reading this article you should be able to improve email search efficiency using Django, including:
- Understanding why emails are case sensitive;
- How to inspect SQL generated by Django;
- How to add case insensitive lookups to Django QuerySets;
- How to automate adding an index with Django's built in migrations framework;
- How to check the query sent to the database is working as expected.
Doing the right thing improves usability and avoids surprises. Using Django with PostgreSQL makes doing the right thing easier and faster including bespoke customisation. Developer productivity matters.
Are emails case sensitive?
Resources
Yes, at least the local part is, the domain is case insensitive. This means that alan@example.com is not the same as Alan@example.com, although the following are equal alan@example.com and alan@Example.com.
This also means you should store the original case of an email address and not a lower (or upper) case version. It may appear to work if you test it on your own email server, and if your own server is implemented in a case insensitive way, but this isn't the standard and will break those that conform to the standard and have mixed case addresses.
Many databases including PostgreSQL and Oracle are case sensitive meaning that Alan is not the same as alan. By the end of this article you should be able to update a Django view and the PostgreSQL data store to successfully perform a case insensitive search for an email.
Inspecting Django generated SQL
Resources
Before changing anything, it's useful to inspect the SQL that Django generates so we can check Django is generating the SQL we expect, and to use the generated SQL to test the query plan.
By adding 'loggers': {'django.db': {'level': 'DEBUG'}} to settings.py views will output any queries sent to the database.
LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'simple': {
            'format': '%(levelname)s %(message)s'
        },
    },
    'handlers': {
        'console':{
            'level': 'DEBUG',
            'class': 'logging.StreamHandler',
            'formatter': 'simple'
        },
    },
    'loggers': {
        'django.db': {
            'handlers': ['console'],
            'level': 'DEBUG',
        },
    }
}
The resulting output from a page view might be something like the following:
DEBUG (0.020) SELECT "example_info"."id", "example_info"."name", "example_info"."email" FROM "example_info" WHERE ("example_info"."email" = 'ahicks@p-o.co.uk') ORDER BY "example_info"."email" DESC; args=('ahicks@p-o.co.uk')
Using psql we can use EXPLAIN to check the query plan:
QUERY PLAN -------------------------------------------------------------------- Sort (cost=6.62..6.63 rows=1 width=38) Sort Key: example_info.email DESC -> Seq Scan on example_info (cost=0.00..3.49 rows=1 width=38) Filter: ((email)::text ~~ 'ahicks@p-o.co.uk'::text)
We have successfully inspected Django to show the SQL the view is using to fetch our email address.
How to add case insensitive lookups to Django QuerySets
Resources
Views are where Django turns requests into responses, using QuerySets to ask the database questions then returning results for display. It's very efficient. Views are also flexible when we want to do more than the basics, in our case to perform a lower case email search.
First import the following:
from django.db.models import CharField from django.db.models.functions import Lower
Then in the view, register the Lower search type allowing an easy way to perform a lower case search, then use the search type to find an email:
CharField.register_lookup(Lower) list = ExampleInfo.objects.filter(email__lower=search)
We can see the SQL generated now includes LOWER("example_info"."email")
DEBUG (0.020) SELECT "example_info"."id", "example_info"."name", "example_info"."email" FROM "example_info" WHERE (LOWER("example_info"."email") = 'ahicks@p-o.co.uk') ORDER BY "example_info"."email" DESC; args=('ahicks@p-o.co.uk')
Using Django's built in migrations framework to automate adding an index
Resources
Django's migrations framework is straightforward to manage index creation.
It's essential to automate index creation in order to be repeatable for testing, staging and production.
The two manage.py subcommands are makemigrations and migrate.
First create a blank migration:
manage.py makemigrations --empty
Then add the sql to add the index and reverse sql to drop it. It's important to name the index carefully so that it's unique and easy to identify.
example/migrations/0002_auto_20201104_1320.py
from django.db import migrations
class Migration(migrations.Migration): dependencies = [ ('example', '0001_initial'), ] operations = [ migrations.RunSQL( sql=r'CREATE INDEX "example_info_email_lower_idx" ON "example_info" (LOWER("email"));', reverse_sql=r'DROP INDEX "example_info_email_lower_idx";' ), ]
We can see the SQL to successfully create the index:
DEBUG (0.673) CREATE INDEX "example_info_email_lower_idx" ON "example_info" (LOWER("email"));; args=None
Check the database query path is working as expected
It's essential to check that the index will get used by the query path.
Inspecting the database using the same SQL with real data can reveal what the database is doing.
Using psql we can check the query plan using EXPLAIN.
Before:
QUERY PLAN -------------------------------------------------------------------- Sort (cost=6.62..6.63 rows=1 width=38) Sort Key: example_info.email DESC -> Seq Scan on example_info (cost=0.00..3.49 rows=1 width=38) Filter: ((email)::text ~~ 'ahicks@p-o.co.uk'::text)
After:
QUERY PLAN -------------------------------------------------------------------- Sort (cost=6.41..6.42 rows=1 width=98) Sort Key: example_info.email DESC -> Seq Scan on example_info (cost=0.00..3.27 rows=1 width=98) Filter: (lower((email)::text) = 'ahicks@p-o.co.uk'::text)
But there's no difference, the query plan isn't using the index!
It's important to test with real data, otherwise the database query planner will use test data, and if there's insufficient data to make using an index cheaper, then it won't use it.
In PostgreSQL the Planner Method Configuration can be influenced, and as we can see after asking the planner to avoid a sequential scan the plan now uses the new index:
set enable_seqscan=false;
QUERY PLAN
------------------------------------------------------------------------------------------------ Sort (cost=6.41..6.42 rows=1 width=98) Sort Key: example_info.email DESC -> Bitmap Heap Scan on example_info (cost=4.30..6.60 rows=20 width=0) Recheck Cond: (lower((email)::text) = 'ahicks@p-o.co.uk'::text) -> Bitmap Index Scan on example_info_email_lower_idx (cost=0.00..4.29 rows=20 width=0) Index Cond: (lower((email)::text) = 'ahicks@p-o.co.uk'::text)
Here the planner has decided to use a two-step plan: the child plan node visits an index to find the locations of rows matching the index condition, and then the upper plan node actually fetches those rows from the table itself. Fetching rows separately is much more expensive than reading them sequentially, but because not all the pages of the table have to be visited, this is still cheaper than a sequential scan. (The reason for using two plan levels is that the upper plan node sorts the row locations identified by the index into physical order before reading them, to minimize the cost of separate fetches. The “bitmap” mentioned in the node names is the mechanism that does the sorting.)
You should now be able to improve user experience and email search efficiency using Django, including: Understanding why emails are case sensitive; How to inspect SQL generated by Django; adding case insensitive lookups to Django QuerySets; automating adding an index with Django's built in migrations framework; and checking the query sent to the database is working as expected.
Doing the right thing improves usability and avoids surprises. Using Django with PostgreSQL makes doing the right thing easier and faster including bespoke customisation. Developer productivity matters.