Optimizing Django ORM Queries for Best Performance

By: summonagus ● at Aug. 25, 2017, 9:34 p.m. ● and modified at Aug. 28, 2017, 7:39 p.m.
Posted under: #Tricks, #Problem, #Database, #Django, #Solution ● Your ip address: 54.224.184.33 ● views: 37 times.

AD Space Available, Book Now!

Django framework already comes with lot of ORM (Object Relational Mapping) functionalities. Once the models are created, you can create, update, retrieve and delete objects using its database abstraction api. Any changes in the models can also be handled using migrations which is pretty awesome. Making queries in django is very simple and straight forward, but it can cause performance issues if the queries are not written properly. Using the techniques listed below, I cut down nearly 92% database queries and around 80% query execution time. I will show and discuss the statistics in detail below but first lets go through the database access optimization techniques.


Understand how Querysets work

One of the important things you need to understand is that django querysets are lazy i.e. they won’t be evaluated unless you ask for the data. Also, if you try to read the query-set again, the results will come from the fetched values and there won’t be any database hits.

For example:

# No database interaction will be done here
blogs = Blog.objects.filter(category='django')
    
# The below statement will fire the database query if blogs: print 'yes'
# No database hit randomBlog = blogs[0]

Given that you have good understanding of querysets and making queries, following are some ways to drastically improve django orm performance.


Use select_related or prefetch_related

Retrieve everything at once if you know you will need it. Most of the model definitions contain foreign key relationships. If the queries are not written properly, data will be fetched in multiple hits on the database which should be avoided. This is particularly important if you have a query that is executed in a loop, and could therefore end up doing many database queries, when only one was needed.


Lets take a use case when you have to query the table with foreign keys and also need the data referenced via foreign key. Assuming the following models definitions

class Blog(models.Model):
    title = models.CharField(max_length=32)
    description = models.CharField(max_length=255)
    author = models.ForeignKey(Author)

class Author(models.Model):
    name = models.CharField(max_length=32)
    email = models.EmailField(unique=True)
    city = models.CharField(max_length=32)

# Hits the database
blog = Blog.objects.get(id=124)

# hits the database again
author = blog.author

To optimize the database hits, we can use select_related() modifier which follows the foreign key relationships and fetches the result in a single complex query. You can refer any Foreign Key or a list of foreign keys and pass it to select_related(). Below is the example illustrating the effect.

# Hits the database
blog = Blog.objects.select_related('author').get(id=124)

# doesn't hit the database as value is already fetched in the previous query author = blog.author

Apart from select_related, there is a similar method prefetch_related(). Both these methods have similar purpose but the strategy to retrieve data is different.


select_related performs a sql join while prefetch_related does a separate lookup which allows it to prefetch many-to-many and many-to-one objects, which cannot be done using select_related. prefetch_related can also be used with generic relations and generic foreign keys. You can read more about them here.


When to use count vs len vs exists

There can be cases where you want to know if particular records exists in the database, number such records or full data. There are various methods to get the correct result, however we should always be aware as to which method works best for which situation. Lets take these use cases with example


If you want to only know if there a blog exists with category ‘django’, then use exists instead of len or count.

# Bad query
blogs = Blog.objects.filter(category='django')  
exists = len(blogs)>0

# Bad Query
exists = Blog.objects.filter(category='django').count() > 0

# Good query
exists = Blog.objects.filter(category='django').exists()

Now if you want to only know the count of blogs with category as ‘django’ and you are not bothered about the content, then;

# Bad query
blogs = Blog.objects.filter(category='django')  
count = len(blogs)

# Good Query
count = Blog.objects.filter(category='django').count()

However if you need the data of blogs as well, then its better to get full data and use len(blogs) whenever you require the number.


Use foreign key values directly

If you only want to know the id of the foreign key, then its better to use fk__id instead of fk.id. This will save additional database lookup for contents of child table. For example, if you want to know the id of author of a particular blog

# Bad query, additional db lookup in author table
blog = Blog.objects.get(id=2)
author_id = blog.author.id  

# A better version of above query but still a bad query
blog = Blog.objects.select_related('author').get(id=2)
author_id = blog.author.id  

# Good Query
blog = Blog.objects.get(id=2)
author_id = blog.author_id

Don’t fetch data you don’t need

If you know what data you require, then there is no need to retrieve all columns of the table, some of the ways to achieve this are listed below

Use values or values_list

Using queryset.values() or values_list() can be very useful if you have tables with lot of columns and you just want the dict or list of values of specific columns.


Consider a use case when you want to retrieve blog title and author name of all blogs with category django. Using values or values_list can significantly cut down the data transfer required from the database and can make queries run faster.

# Bad query
blogs = Blog.objects.filter(category='django').select_related('author')
mylist = []
for blog in blogs:
   mylist.append({'title':blog.title, 'author_name':blog.author.name}) 

# Good Query
mylist = Blog.objects.filter(category='django').values('title', 'author__name')

Use Queryset.only and defer

Using queryset.only() or defer() in your queries can also improve the query performance. If you want only specific columns from the database, you can use only parameter. Or if you don’t want some columns, you can remove them using defer parameter. More details here. Following example illustrates the same.

# If you want only author and blog title 

# Bad query, this will retrieve all details
blogs = Blog.objects.filter(category='django').select_related('author')

# Good Query
blogs = Blog.objects.filter(category='django').only('author','title').select_related('author')

# If you want everything apart from description

# Bad query, this will retrieve all details
blogs = Blog.objects.filter(category='django').select_related('author')

# Good Query
blogs = Blog.objects.filter(category='django').defer('description').select_related('author')

Analyze and index your tables

Apart from the various techniques to improve the ORM performance, one of the most important thing you need to do is understand your data. You should have a solid understanding of your table structures and type of queries used on that table. Once you know that, do the proper indexing of appropriate columns. Keep in mind that Indexing slows down the write operations, so you need to factor in those things as well and then come to conclusion.


Install django-debug-toolbar/analyze ajax calls

A very powerful tool to debug your sql queries is django-debug-toolbar which shows the sql queries executed by the database api along with the time each query took to execute. This is a very helpful tool to analyze which calls are taking too much time/ executing lot of queries. It can also be used with ajax calls by installing an additional app django-debug-panel and a chrome extension. To install both of them, follow the steps mentioned below

pip install django-debug-toolbar
pip install django-debug-panel

# Add these to installed apps 
INSTALLED_APPS = (
    .....
    'debug_toolbar',
    'debug_panel',
    .....
)

# Add a middleware class
MIDDLEWARE_CLASSES += ('debug_panel.middleware.DebugPanelMiddleware',)

# Install this chrome extension
chrome extension

In one of my calls, there were about 305 queries happening which was taking 23.2 seconds of execution time.

Performance before django orm query optimization


Performance after django orm query optimization


This is huge gain in performance as the execution time went down from 26.2 seconds to 3.8 seconds (85.4% reduction). The main culprit was foreign key access within a loop which was firing lot of queries. This statistic is for a small development database. In production, the number went down from 3.6 seconds to 0.65 seconds (82% reduction).


Keep in mind that sometimes it might actually be faster to do some extra queries vs using joins or doing complex queries, but its worth checking the query time and improving on it.


I hope you find this article helpful. Let me know if you have any suggestions/ feedback in the comments section below.


Fun Fact: Game of thrones season 6 is back, and its episode 4 is also titled as the book of stranger.


Original Source: http://bookofstranger.com/optimizing-django-orm-qu...

Author

summonagus

Python Developer, Linux Enthusiast, and Senior writer at python.web.id
Website: https://github.com/agusmakmun
blog comments powered by Disqus