Processing Videos and Serving Requests Concurrently in a Bottle Application with Threads

Intro

Like many people, I have a lot of videos stored on various devices across my home network. Sometimes I want to watch one of these videos on a different device, and my solution for this for a long time had been to just transfer the files from one computer to another using a USB drive. This got old, so I decided to centralize my video storage onto an old Raspberry Pi and develop a small application for serving these video files to any client on my home network, including my television’s built-in web browser (just like Tim Berners-Lee intended).

There are out-of-the-box solutions for this kind of problem, like Plex, which I could have used and thereby avoided some of the obstacles I came across in implementing this application. However, I was more interested in implementing my own solution to gain some insight into how web-based video streaming applications work at least at a rudimentary level, and because such challenges make for an interesting way to pass the time.

Technical Overview

I decided to use bottle.py for my backend, due to its small footprint and suitability for rapid prototyping. Some HTML templates are included in a public directory, and the application itself runs from a single file, app.py. Nginx serves as a reverse proxy and (as detailed later) serves my video content. I use gunicorn as the WSGI server, running the application with gunicorn app:app -w 1 --threads 2 -b localhost:5000. This provides one worker with two threads, which is enough for demonstration purposes (but definitely not ideal in a different context).
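For context on the app:app argument: gunicorn imports the module named before the colon and looks up a WSGI callable named after it. The sketch below is a bare WSGI callable built with only the standard library. It is not the Bottle application from this post, and the response text and the call_once helper are made up for illustration. A Bottle application object exposes the same callable interface.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """A bare WSGI callable: the kind of object `gunicorn app:app` looks up."""
    body = b"hello from a WSGI app\n"
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def call_once(wsgi_app, path="/"):
    """Invoke a WSGI app directly (no server) and return (status, body)."""
    environ = {}
    setup_testing_defaults(environ)  # fills in a minimal valid WSGI environ
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    body = b"".join(wsgi_app(environ, start_response))
    return captured["status"], body
```

Calling call_once(app) exercises the callable the same way a WSGI server would, just without any networking.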

After importing our relevant modules, we define some routes. Included in these routes is /video/<filename>, which sends a simple template to the client containing a <video> and <source> tag with a src attribute pointing to the route /videos/<filename>:

@route('/video/<filename:path>')
def video_page(filename):
    return template('<video controls><source type="video/mp4" src="/videos/{{filename}}"/></video>', filename=filename)

# serve a single video to the browser (not used in production, for prototyping only)
@route('/videos/<filename:path>')
def video(filename):
    return static_file(filename, root=VIDEOS_DIRECTORY)

Also included is a route for uploading files, /upload, which uses a simple HTML form to pass files to the server. At first, this route simply grabbed any uploaded file in the POST request, checked if it was an allowed file type, and saved it if so:

@route('/upload', method=['POST', 'GET'])
def upload():
    if request.method == 'POST':
        f = request.files.get('upload')
        name, ext = os.path.splitext(f.filename)
        if ext in ['.webm', '.mp4']:
            f.save(VIDEOS_DIRECTORY)
            return HTTPResponse(status=200, body='<h1>File uploaded</h1>')

    return static_file('upload.html', root='public')
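One detail worth noting about the check above: it compares the extension exactly, so a file named clip.MP4 would be skipped. A minimal sketch of a case-insensitive check follows. The helper name is_allowed and the ALLOWED_EXTENSIONS set are hypothetical and not part of the original app.py.

```python
import os

# Assumed allow-list; the route above accepts .webm and .mp4 at this stage.
ALLOWED_EXTENSIONS = {".webm", ".mp4"}


def is_allowed(filename, allowed=ALLOWED_EXTENSIONS):
    """Return True if the filename's extension is in the allow-list, ignoring case."""
    _, ext = os.path.splitext(filename)
    return ext.lower() in allowed
```

In the route, the condition could then read is_allowed(f.filename) instead of comparing ext against a list literal.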

Asynchronous Video Processing and Other Improvements

Sometimes the videos I want to upload use media container formats, like Matroska (.mkv), that are not supported by web browsers. In this case, I’ll want to upload the file and change the container format. For this, I’ll use a system call to ffmpeg to copy the video into a web-friendly container, like MPEG-4 (.mp4).

Unfortunately, Python web applications are synchronous by default, so a worker can be held up while it processes this kind of request. Assuming you have only one worker, if you upload a file and perform some operation on it (such as changing the container format), the application cannot respond to further requests until the operation completes. So if you upload a large file and then try to navigate to the site’s index, you’ll have to wait until the video is uploaded, processed, and saved, because the app is still busy with the previous request and has yet to return a response to that client.

In order to upload and process a file while continuing to serve requests, the two operations should be handled concurrently. There are many ways to handle concurrent programming in Python, but I’ll be using the native threading library for this task.

Implementing threads in a Bottle application

Threads are easy to implement in Python. Simply import the threading library and define a task:

import threading
…
def handleUpload(f):
    name, ext = os.path.splitext(f.filename)
    # save only allowed file types
    if ext in ['.mkv', '.webm', '.mp4']:
        f.save(VIDEOS_DIRECTORY)
        # copy Matroska files into an MP4 container
        if ext in ['.mkv']:
            print("Beginning conversion")
            os.system('ffmpeg -i '+VIDEOS_DIRECTORY+name+ext+' -codec copy '+VIDEOS_DIRECTORY+name+'.mp4')
            print("Conversion complete")

@route('/upload', method=['POST', 'GET'])
def upload():
    if request.method == 'POST':
        f = request.files.get('upload')
        task_thread = threading.Thread(target=handleUpload, args=(f,), daemon=True)
        task_thread.start()
        return HTTPResponse(status=202, body=...)

    return static_file('upload.html', root='public')

Our handleUpload function is passed f (representing a file) as an argument and begins to run concurrently with the main thread when the file upload is complete. The user is then informed that the file is processing, with the operation continuing in the background. Because the process is running concurrently on a separate thread, the user is able to make requests to the application’s main thread while the video processes, without having to provide more than one worker.
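The same pattern can be tried outside of Bottle. The sketch below uses only the standard library: slow_task is a hypothetical stand-in that sleeps instead of calling ffmpeg, and handle_request plays the role of the upload route by starting the thread and returning right away.

```python
import threading
import time


def slow_task(done, results):
    """Stand-in for a long-running conversion (sleeps instead of calling ffmpeg)."""
    time.sleep(0.2)
    results.append("conversion complete")
    done.set()


def handle_request(results):
    """Start the slow task in the background and return immediately."""
    done = threading.Event()
    worker = threading.Thread(target=slow_task, args=(done, results), daemon=True)
    worker.start()
    return "202 Accepted", done


results = []
status, done = handle_request(results)
# The caller has its response before the background work has finished.
finished_at_response = done.is_set()
done.wait(timeout=5)  # block here only so the demo can inspect the result
```

One thing to keep in mind with daemon=True: daemon threads are stopped abruptly when the process exits, so a conversion that is still running when the worker shuts down would be cut off.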

I use a system call to ffmpeg for simplicity’s sake. There are safer ways to do this, like using an ffmpeg wrapper library for Python. Because the uploaded filename is concatenated directly into a shell command, input validation and sanitization would be necessary to make this safe for use beyond my own production context.
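One way to avoid the shell entirely, using the standard library rather than a wrapper library, is to pass ffmpeg its arguments as a list via subprocess.run. This is a sketch and not the code the project uses. The function names build_remux_command and remux are hypothetical, and the flags simply mirror the os.system call above.

```python
import os
import subprocess


def build_remux_command(videos_dir, filename):
    """Build an ffmpeg argument list that copies streams into an .mp4 container."""
    name, _ = os.path.splitext(filename)
    source = os.path.join(videos_dir, filename)
    target = os.path.join(videos_dir, name + ".mp4")
    # Same flags as the os.system call: -i <input> -codec copy <output>
    return ["ffmpeg", "-i", source, "-codec", "copy", target]


def remux(videos_dir, filename):
    """Run ffmpeg without a shell; raises CalledProcessError on a non-zero exit."""
    subprocess.run(build_remux_command(videos_dir, filename), check=True)
```

Because no shell parses the command, spaces or characters like ; in a filename reach ffmpeg as part of a single argument. The filename should still be validated, since this does nothing about names that contain path separators.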

As a disclaimer, threads may not be the best-suited tool for this task in a different production context, where coroutines may make more sense. Threads in Python run concurrently but not necessarily in parallel on separate cores. In fact, with CPython, only one thread executes Python bytecode at a time due to the Global Interpreter Lock (GIL), although the lock is released while a thread waits on I/O or on an external process such as ffmpeg. Threads are also more expensive than coroutines, without much of an apparent benefit in this case. This is still okay for my purposes, but it might not scale the way coroutines would. Because this project is running on an old Raspberry Pi on my home network, and is not intended to be accessed from the public internet or by more than two users at a time, I was not overly concerned about scaling.
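For comparison, here is roughly what the coroutine approach looks like with asyncio from the standard library. This is only a sketch and not a drop-in change for the Bottle application, which does not run an event loop under the gunicorn setup described earlier. The command is a hypothetical stand-in for ffmpeg (a Python child process that sleeps briefly), and serve_other_requests merely simulates other work.

```python
import asyncio
import sys


async def convert(cmd):
    """Run an external command without blocking the event loop; return its exit code."""
    proc = await asyncio.create_subprocess_exec(*cmd)
    return await proc.wait()


async def serve_other_requests(log):
    """Stand-in for handling other requests while the conversion runs."""
    for i in range(3):
        await asyncio.sleep(0.01)
        log.append(f"request {i} served")


async def main():
    log = []
    # Stand-in for ffmpeg: a child process that sleeps for a moment.
    cmd = [sys.executable, "-c", "import time; time.sleep(0.2)"]
    exit_code, _ = await asyncio.gather(convert(cmd), serve_other_requests(log))
    return exit_code, log


exit_code, log = asyncio.run(main())
```

Both coroutines share a single thread: while the child process runs, the event loop keeps servicing the simulated requests.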

Serving content with Nginx

Serving static content from within the application itself works fine in the prototyping stage, but in production (in my case, a Raspberry Pi on my home network), the performance is subpar. After successfully implementing threaded video processing, I went to watch a previously uploaded video, and was dismayed by the periodic lag the application was facing while streaming, even while no processing was occurring in the background.

A cheap way to improve performance is to stop serving static content (like videos) from the Python application itself, shifting this burden to a specialized web server. Incidentally, I am already using Nginx for this project, so some added configuration is all that’s necessary to make this happen.

The simplest way to serve static files through Nginx is by defining an appropriate location within the site’s configuration file and issuing the alias directive along with the directory from which files are being served:

location /videos/ {
    alias /some/directory/;
}

And that’s it. There are ways to further optimize the transfer of video data to web browsers as well as to implement features like seeking via query parameters, such as by using Nginx’s mp4 module, but for the purposes of streaming over my local network, this accomplished the task just fine. This lets me disable the /videos/<filename> route and offload the task of serving files from the application to the Nginx server. Video streaming is now seamless, with no buffering interrupting playback.

The full project can be found here, but keep in mind that the user interface is still very utilitarian. Future updates are forthcoming, which I hope to chronicle on this blog.