Serving Images from the DB in Django

I recently got to work on prototyping a small Django-based website that was somewhat like a message board. One of the features requested was allowing users to upload a profile picture, which necessitated storing said pictures somewhere. Django is very opinionated about storing such files in a filesystem, but I prefer storing them in a DB. Here are my reasons:

  1. These files aren’t very big, and modern DBs are perfectly capable of handling blob-type fields
  2. I already have a DB set up. Setting up another storage directory that is shared between the servers necessitates more work: creating more objects, permissions, monitoring…
  3. My DB already has a backup plan set up, and I like having a single file representing a consistent state of my site
  4. SQL’s cascading deletes allow me to ensure I have no orphaned files

Because the framework assumes files are stored on the filesystem, I had to engineer some things on my own, and here they are:

The model

I prefer storing files in their own table. Although the current relation is 1 picture per user (1:1 relation), a separate table keeps things flexible for later (profile collection? versioning? different sizes?). In addition to the actual data, the file table contains some metadata that’ll be useful later:

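# Assumes the default auth User; swap in your own user model if you have one
from django.contrib.auth.models import User
from django.db import models

class ProfilePicture(models.Model):
  # The key lives on the picture, so deleting a user deletes the picture
  user = models.OneToOneField(User, on_delete=models.CASCADE, related_name="profile_picture", db_index=True)
  last_updated = models.DateTimeField(auto_now=True)
  mime_type = models.CharField(max_length=50)
  data = models.BinaryField()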

Things to note:

  • data is the obvious field - it contains the picture itself in binary form
  • mime_type is saved separately. Some image serialization schemes (e.g. base64 urls in JavaScript) keep the mimetype as a sort of header, but in the spirit of keeping things simple, I’m just splurging on another column
  • last_updated is useful as generic server metadata (in fact, it’s considered best practice to add this field to your objects), and for image serving it’s extra useful since it lets us work with the browser’s cache (see later)
    • auto_now sets it to the current time on every save(), which is just the behavior we want
  • the key is saved on this table to keep the ON DELETE CASCADE in the right direction - we want to delete pictures when a user is deleted, but we don’t want to delete a user when a picture is deleted.
    • Defining the related_name directly saves on guesswork later
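    • We want to enable db_index, since 99% of the SELECTs for this table will look for a picture for a specific user (a OneToOneField is already unique, which implies an index, so the flag mostly documents the intent)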
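
To make the direction of the cascade concrete, here is a small sketch in plain SQL, run through Python's built-in sqlite3 module. The table and column names are made up for illustration and are not what Django generates. Note also that Django, as far as I know, performs on_delete=CASCADE in the ORM rather than through a database constraint, so treat this as the SQL-level equivalent of the behavior, not as what the model above emits.

```python
import sqlite3

# Illustrative schema only; names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.execute("CREATE TABLE app_user (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    """
    CREATE TABLE profile_picture (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL UNIQUE
            REFERENCES app_user (id) ON DELETE CASCADE,
        mime_type TEXT NOT NULL,
        data BLOB NOT NULL
    )
    """
)

def count(table):
    """Number of rows currently in the given table."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

conn.execute("INSERT INTO app_user (id, name) VALUES (1, 'alice'), (2, 'bob')")
conn.execute(
    "INSERT INTO profile_picture (user_id, mime_type, data) VALUES (?, ?, ?), (?, ?, ?)",
    (1, "image/png", b"\x89PNG", 2, "image/png", b"\x89PNG"),
)

# Deleting a picture leaves its user alone...
conn.execute("DELETE FROM profile_picture WHERE user_id = 2")
users_after_picture_delete = count("app_user")

# ...while deleting a user takes the picture with it.
conn.execute("DELETE FROM app_user WHERE id = 1")
pictures_after_user_delete = count("profile_picture")
```

Because the foreign key sits on the picture table, the cascade can only ever flow from user to picture.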

Setting

mime_type = uploaded_file.content_type
data = uploaded_file.read()
user = ...

ProfilePicture(user=user, mime_type=mime_type, data=data).save()

Note we’re not validating the image the way ImageField does. ImageField relies on Pillow (PIL) for that, and we can run the same kind of check manually if needed.
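
If pulling in Pillow feels heavy for a prototype, a cheaper first-line check is to compare the leading bytes of the upload against the MIME type the browser claimed. This is not what ImageField does and it does not prove the file decodes correctly; it is only a sketch using the standard library, and the function names are hypothetical.

```python
# Leading-byte signatures for a few common formats; extend as needed.
_SIGNATURES = (
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
)

def sniff_image_type(data):
    """Return the MIME type implied by the leading bytes, or None if unknown."""
    for signature, mime_type in _SIGNATURES:
        if data.startswith(signature):
            return mime_type
    # WebP is a RIFF container: "RIFF", four size bytes, then "WEBP".
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    return None

def check_upload(claimed_mime_type, data):
    """Raise ValueError unless the content matches the claimed MIME type."""
    actual = sniff_image_type(data)
    if actual is None or actual != claimed_mime_type:
        raise ValueError(f"expected {claimed_mime_type}, content looks like {actual}")
```

Calling check_upload(mime_type, data) just before saving would reject, for example, a text file uploaded with an image/png content type.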

Displaying

Here we have two interesting gotchas:

  1. We want to default to a ready placeholder if the user has no profile picture for some reason, rather than 404-ing (or something worse).
  2. We want to utilize the browser’s cache, and avoid sending the picture if the browser has a recent copy of it.

While I usually like Django’s class-based views, I ended up writing this one as a function since it reads more easily.

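from django.http import HttpResponse
from django.shortcuts import redirect
from django.templatetags.static import static
from django.views.decorators.http import condition

from .models import ProfilePicture  # assumes the model lives in this app's models.py

def profile_picture(request, pk):
  # defer('data') postpones loading the blob until it is actually read
  query = ProfilePicture.objects.filter(user__pk=pk).defer('data')
  try:
    picture = query.get()
  except ProfilePicture.DoesNotExist:
    # No picture: send the browser to the static placeholder
    return redirect(static('missing.png'))

  def last_modified(request, *args, **kwargs):
    return picture.last_updated

  # Replies 304 when If-Modified-Since is fresh; otherwise runs _get
  @condition(last_modified_func=last_modified)
  def _get(request):
    response = HttpResponse(content_type=picture.mime_type)
    response.write(picture.data)
    return response

  return _get(request)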

Things to note here:

  • In a classic “it’s better to ask forgiveness than permission”, we’re hitting the query’s get() and handling the error, rather than checking with exists first. If we get a “no such object” error, we redirect to the static url of the placeholder and leave it. We can now assume we have a picture going forward
  • The condition decorator requires a function that calculates the request’s “last modified” property. It uses the result in two ways:
    • Determine a request’s cache freshness (comparing to the If-Modified-Since header on the request)
    • Set the cache’s date header for next time, if the cache is determined stale (setting the Last-Modified header on the response)
  • For the sake of minimizing DB queries, I’m reading the object from the DB at the start of the request. Since @condition asks for a function for last_modified, I’m creating a dummy one that just returns the field from the object
  • To utilize @condition which is designed as a decorator, I’m creating a tiny function for it to decorate, then calling that
  • Since data is a big field and is needed only when the request is determined not to be stale, I’m using defer to make it lazily load
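
The freshness check described in the notes above can be sketched with the standard library alone. This is a simplified illustration of the If-Modified-Since / Last-Modified exchange, not Django's actual implementation; is_fresh and respond are hypothetical names, and last_updated is assumed to be a timezone-aware UTC datetime.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime

def is_fresh(if_modified_since, last_updated):
    """True when the browser's copy is at least as new as the stored picture."""
    if not if_modified_since:
        return False  # no cached copy, so send the full response
    try:
        cached_at = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return False  # unparseable header, ignore it
    if cached_at.tzinfo is None:
        # Treat a date without a usable zone as UTC so the comparison is valid.
        cached_at = cached_at.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop the microseconds.
    return last_updated.replace(microsecond=0) <= cached_at

def respond(if_modified_since, last_updated):
    """Return (status, headers) the way a conditional GET would."""
    if is_fresh(if_modified_since, last_updated):
        return 304, {}
    return 200, {"Last-Modified": format_datetime(last_updated, usegmt=True)}
```

The first request gets a 200 with a Last-Modified header; when the browser echoes that value back as If-Modified-Since, the second request gets a 304 until the picture is saved again.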

The above script works. It bails out early when there’s no relevant picture, handles the browser’s cache (returning 304 not modified when possible) and otherwise writes the binary data from the DB to the browser.

Linking

To get to the view, urls.py looks like this:

...
  path('profile/<pk>', views.profile_picture, name='profile_picture')
...

And inside a template, you can use it like this:

<img src="{% url 'profile_picture' pk=user.pk %}" />

Future thoughts

I’m not implementing the ETag, as I think that Last-Modified is enough. If you’d like to implement it, you’ll need to add a hash field to the model, calculate it (maybe using GeneratedField), and then create a small function in the view to expose it.
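
For what it's worth, the hash itself is the easy part. Here is a minimal sketch of computing it in Python at upload time; content_hash is a hypothetical helper, and SHA-256 is just one reasonable choice of digest.

```python
import hashlib

def content_hash(data):
    """Return a hex digest that changes whenever the picture bytes change."""
    return hashlib.sha256(data).hexdigest()
```

The result would be stored in the extra hash field and handed back by the small view function mentioned above.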

As your site scales, you might notice that the write path is not completely optimized:

  1. Django model fields don’t support streams, so the entire picture is loaded to userland memory, and then written to kernel memory (as part of response.write)
  2. Browsers like parallelizing requests, and while web servers can scale out seamlessly, DBs are less easy and might choke on the many different requests
  3. While having full control on every request (e.g. requiring authentication) is nice, we might allow some laxity in image requests (e.g. if the url for the image is random enough, we can reuse them without being worried about enumeration)
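
As a sketch of the last point: a URL is "random enough" when it is keyed on an unguessable token rather than on the user's primary key. The helper below uses Python's secrets module; storing the token in an extra column on the picture table and routing on it are assumptions of this example, not something the code above does.

```python
import secrets

def new_picture_token():
    """Return an unguessable, URL-safe identifier built from 32 random bytes."""
    return secrets.token_urlsafe(32)
```

A fresh token would be generated whenever a picture is uploaded, so old URLs stop resolving once the picture changes.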

If you run into these kinds of performance issues, it might be time to invest in a fancier solution like an S3 bucket with presigned URLs.
On the other hand, you could just slap a bandaid on it in the shape of a CDN.