CodeNewbie Community

rwalroth

Threads, Locks, and Queues

Everyone at some point has a very dangerous thought: What if I used multithreading to speed up my app? This is dangerous because it requires thinking about your code in a whole new way. Fortunately, Python makes it very easy to get started with multithreading.

What is a thread in Python?

One very important thing to keep in mind is what a thread actually is in the context of Python. Modern CPUs are frequently multi-core architectures capable of executing code in parallel, and you may see this referred to as a PC executing multiple "threads". But in the standard CPython interpreter, only one thread executes Python code at any given moment, because of something called the Global Interpreter Lock (GIL). So if your code is busy computing the whole time, threads will not make it faster, and you are better off looking for a more efficient way to organize the work. But what if there is a waiting step in your code, such as waiting on a network response, a file, or user input? A thread that is waiting releases the GIL, so other code can run in the background. Now you have a golden opportunity to make your computer work while you wait.
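As a rough illustration of that last point, here is a minimal sketch in which time.sleep stands in for a waiting step. The function names (wait_step, run_sequential, run_threaded) and the delays are made up for this example, and the exact timings will vary from machine to machine.

```python
import time
from threading import Thread

def wait_step(seconds):
    # time.sleep stands in for a waiting step (network, disk, user input).
    time.sleep(seconds)

def run_sequential(n, seconds):
    # Run the waiting step n times, one after another.
    start = time.perf_counter()
    for _ in range(n):
        wait_step(seconds)
    return time.perf_counter() - start

def run_threaded(n, seconds):
    # Run the waiting step n times, each in its own thread.
    start = time.perf_counter()
    threads = [Thread(target=wait_step, args=(seconds,)) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    print(f"sequential: {run_sequential(3, 0.2):.2f}s")
    print(f"threaded:   {run_threaded(3, 0.2):.2f}s")
```

The sequential version should take roughly the sum of the waits, while the threaded version should take roughly as long as a single wait, because the threads sleep at the same time. If wait_step did pure computation instead, the two timings would likely be about the same.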

Side note: You can run more than one Python process at a time using the multiprocessing module, but that's beyond the scope of this post.

How to think about threads

The challenge in creating threads is that you no longer know the execution order of your program. For example, let's say my code looks like the following:

func1(a, b, c)

func2(d, e)

func3(a, b)

These three functions will execute in the order that they are called. func2 will not start executing until func1 has finished. But what if func1 takes a long time with lots of waiting steps? We can offload it to a thread, like so:

from threading import Thread

def func1(a, b, c):
    # Do stuff
    pass

thread = Thread(target=func1, args=(a, b, c)) # a, b, and c should be defined before this line

thread.start()

func2(d, e)

thread.join()

func3(a, b)

Let's break that down. The Thread class is imported from the threading module, part of the standard library (docs). To create a thread, you pass the function you want the thread to run as target, along with its arguments as a tuple in args. The start method begins executing the function in a new thread and returns immediately. You then call func2, and that's it! You now have two threads running: the main thread executing func2, and the new thread executing func1.

Now, here's where it gets tricky. Notice that func3 also takes variables a and b. Assuming these are modified by func1, if you just call func3 right after func2 is done, it's possible that func1 will still be running. If you put all three into threads, you have no idea which one runs when. If a and b are meant to be constants, that's fine (here is where some of Python's most glaring flaws turn up: the lack of constants and the passing of all arguments by object reference). But what if you want a and b to be modified? That is where the join() method comes in. When called, it blocks the calling thread until the thread it was called on has finished. That way, you make sure that func1 finishes before func3 starts.
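To make the effect of join() concrete, here is a small sketch. fill_results is a hypothetical stand-in for func1: it waits a little and then modifies the list it was given.

```python
import time
from threading import Thread

def fill_results(results, n):
    # Hypothetical stand-in for func1: waits, then modifies its argument.
    for i in range(n):
        time.sleep(0.01)
        results.append(i * i)

results = []
worker = Thread(target=fill_results, args=(results, 5))
worker.start()

# The main thread is free to do other work here (func2 in the example above).

worker.join()  # Block the main thread until fill_results has returned.

# After join(), every append made by the worker has already happened.
print(results)  # [0, 1, 4, 9, 16]
```

Without the join() call, the print could run while the worker is still appending, and the list might be empty or only partly filled.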

Long-lived threads

The previous code is great for simple functions, and there are more complicated things like thread pools that you can play with. But I want to instead get into a more complicated case: having two long-lived threads that need to talk to each other. For example, what if you want to get user input while also executing some complicated tasks? You don't want to freeze the heavy background computation while waiting for input, and you don't want the user input to feel stuck because the background process is taking a long time. Especially in GUI programming, you will want to offload as much as possible to side threads. The same goes for asynchronous code that needs to wait for inputs from different sources. So how do you "talk" to a thread? Some frameworks, like Qt, have signal/slot mechanisms for passing info back and forth. But outside of these, you can use queues. Here's a simple example:

def thread_target(commandQueue):
    while True:
        if not commandQueue.empty():
            command = commandQueue.get()
            if command == "quit":
                break
        # Do other stuff

The above function would serve as a target for a thread object. Here, you would hand over a Queue object, imported from the queue module:

from queue import Queue
from threading import Thread

my_queue = Queue()
thread = Thread(target=thread_target, args=(my_queue,))
thread.start()
my_queue.put("quit")
thread.join()

Queues are like pipes connecting threads, and bring a few special properties. In this example, I am handing off different commands to a queue, and letting the thread check for new commands on each run through its main loop. You can also have a second queue going the other way, for a thread to dump results of calculations as needed. To add an item to a queue, just use queue.put(val). To get the next item in a queue, call queue.get(). That's it! One very important thing to note: anyone with the queue can add to it and take from it. It's a good idea to make sure each queue has only one consumer. In other words, only one thread should be tasked with calling get() on any one queue; otherwise your program will become unpredictable. Remember, controlling the flow of execution is one of the most important parts of software design, and in multithreading you no longer know what order things will happen in! You can't know who put what command in a queue, and you can't know who it was meant for.

Queues can take more than just strings; they can take just about any Python object. I typically use a dict, which contains some information on what the data is and what I want the thread to do with it.
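Here is one way the two ideas above (a second queue for results, and dicts as messages) might fit together. This is only a sketch: the "action" and "value" keys and the "square" command are invented for the example. Unlike the polling loop earlier, this worker has nothing else to do between commands, so it simply blocks on get() until one arrives.

```python
from queue import Queue
from threading import Thread

def worker(command_queue, result_queue):
    while True:
        # get() blocks until a command is available.
        command = command_queue.get()
        if command["action"] == "quit":
            break
        if command["action"] == "square":
            value = command["value"]
            result_queue.put({"input": value, "output": value ** 2})

commands = Queue()
results = Queue()
thread = Thread(target=worker, args=(commands, results))
thread.start()

for value in (2, 3, 4):
    commands.put({"action": "square", "value": value})
commands.put({"action": "quit"})
thread.join()

# Safe to drain without blocking: the only producer has already exited.
outputs = []
while not results.empty():
    outputs.append(results.get()["output"])
print(outputs)  # [4, 9, 16]
```

Each queue here has exactly one consumer: the worker reads from commands, and the main thread reads from results.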

Locks

Now let's get very complicated, and think about thread safety. "Thread safe" refers to objects that can be safely shared between threads, without fear of two threads trying to modify them at the same time. Queues are essentially just thread-safe lists that make sure only one thread at a time modifies them. But let's say we want to make our own thread-safe object, a string that we can safely append to or change.

from threading import Lock
from copy import deepcopy

class TSString:

    def __init__(self, val=None):
        self._data = val
        self._lock = Lock()

    def get_data(self):
        with self._lock:
            out = deepcopy(self._data)
        return out

    def set_data(self, newVal):
        with self._lock:
            self._data = newVal

    def append_data(self, data):
        with self._lock:
            self._data += data

A Lock object can be thought of as a boolean value: if one thread grabs it, then no other thread can until it is released. Everything inside the with self._lock: block executes while holding the lock, which is released at the end of the block. In the above example, if two threads try to call any of the getters or setters on the object, only one will be able to at a time. The other will wait until the lock is released. Two things are important to note here. First, the _data attribute has the underscore to let others know it is private, but Python does not protect any attributes. In other words, you can completely bypass a lock by just modifying _data directly! As soon as you do, the object is no longer thread safe. Second, because you have no way of guaranteeing order of access, you should be careful how you use this pattern. There might be cases where you just want to check how a variable has changed without wanting a queue that dumps an entire object. Just remember that as soon as the data is accessed, you can no longer know if it's still in that state.
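The same locking pattern can be sketched with a counter, which makes the last point easier to see. SafeCounter and bump are hypothetical names for this example. The read, the addition, and the write all happen inside one method while the lock is held, so no other thread can change the value in between.

```python
from threading import Lock, Thread

class SafeCounter:
    """Hypothetical example class, not part of the standard library."""

    def __init__(self):
        self._value = 0
        self._lock = Lock()

    def increment(self):
        # Read, add, and write back while holding the lock, so no other
        # thread can change _value between the read and the write.
        with self._lock:
            self._value += 1

    def get(self):
        with self._lock:
            return self._value

def bump(counter, times):
    for _ in range(times):
        counter.increment()

counter = SafeCounter()
threads = [Thread(target=bump, args=(counter, 10_000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter.get())  # 40000
```

If the calling code instead did a get(), added one, and wrote the result back in a separate call, another thread could slip in between the two calls, and some increments could be lost even though each call on its own holds the lock.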

Multithreaded thinking

I first used threads two years ago, and now I find myself incorporating them all the time. The biggest hurdle was in rethinking how I think about code, especially flow of execution. In any good coding tutorial, much time will be spent on how to use loops, conditionals, and iterators to control the flow of your program. I spend a decent chunk of time reading through my code and making sure the data will change how I think it will (and then even more time debugging when I realize some conditional is not triggering as it should). When you have a thread, doing that becomes much more difficult. Each thread needs to be debugged on its own, and you have to make sure the points where two threads intersect are carefully managed. Threads are not a silver bullet for faster code; they are a powerful tool for specific needs.

I hope this was helpful for anyone thinking of starting out with threads, and I will again point you to the docs. Let me know what you think!
