Multitasking
Introduction
It’s all about doing multiple things at the same time.
Multitasking involves handling multiple tasks simultaneously. By understanding multitasking, you can optimize software to make efficient use of the available hardware.
In web servers
Traditionally, a server handles one request at a time, leaving others waiting. To scale, servers spawn multiple copies, each still handling only one request at a time.
In the early 2000s, as web traffic surged, engineers encountered the C10K problem: how to handle 10,000 concurrent connections efficiently. They explored two options:
- Asynchronous I/O: use a single thread with asynchronous I/O, relying on Operating System support to trigger I/O operations and be notified once they complete, allowing other clients to be served in the meantime.
- Multi-threading: serve one client per thread, at the cost of increased resource consumption: each thread allocates memory for its own stack and spends CPU cycles on context switching, which was a significant concern on the hardware of that era.
Projects like NGINX, Node.js, and Twisted emerged, implementing the asynchronous I/O approach.
Options
Let’s explore multitasking options in Python.
Threads
Good for 🌐 I/O-bound functions
Threads are akin to multiple lanes on a highway, allowing independent paths for tasks to proceed simultaneously. They are commonly used for multitasking and are particularly suitable for handling I/O-bound functions.
Threads are efficient for tasks involving I/O operations but may not fully utilize multi-core processors for ⚙️ CPU-bound functions due to Python’s Global Interpreter Lock (GIL).
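A minimal sketch of the idea, using the standard library’s `concurrent.futures.ThreadPoolExecutor`; the `fetch` function and the URLs are made up for illustration, with `time.sleep` standing in for network latency:

```python
# Minimal sketch: running I/O-bound work on a thread pool.
# fetch() simulates a slow I/O call with time.sleep; in a real
# application it would be an HTTP request or database query.
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    time.sleep(0.1)  # simulated network latency
    return f"response from {url}"

urls = [f"https://example.com/{i}" for i in range(10)]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(fetch, urls))
elapsed = time.perf_counter() - start

# The ten waits overlap across threads, so this takes ~0.1s
# instead of the ~1s a sequential loop would need.
print(f"{len(results)} responses in {elapsed:.2f}s")
```

Because the threads spend almost all their time waiting, the GIL is released during the sleeps and the work genuinely overlaps.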
Processes
Good for ⚙️ CPU-bound functions
Python processes allow functions to run in a separate process, leveraging all available CPU cores. This approach bypasses the GIL and maximizes CPU utilization, making it suitable for CPU-bound tasks.
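A minimal sketch using `concurrent.futures.ProcessPoolExecutor`; the deliberately naive prime-counting function is invented here purely to have some CPU-bound work to distribute:

```python
# Minimal sketch: spreading a CPU-bound function across processes.
# Each worker runs in its own interpreter, so the GIL does not
# serialize the computation.
from concurrent.futures import ProcessPoolExecutor

def count_primes(limit):
    # Deliberately naive primality check, just to burn CPU.
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

if __name__ == "__main__":
    # Four independent chunks of work, one per worker process.
    limits = [20_000, 20_000, 20_000, 20_000]
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(count_primes, limits))
    print(results)
```

The `if __name__ == "__main__":` guard is required on platforms that spawn worker processes by re-importing the module.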
Async I/O
Good for 🌐 I/O-bound functions
Web applications commonly involve reading and writing files, querying databases, and calling external services via HTTP requests. All the time spent waiting for each call to complete is time not spent processing other work, so this is where async I/O comes in handy to improve an application’s throughput.
Unlike threads, where the Operating System’s scheduler preemptively decides when to run and interrupt functions, in this model functions cooperate: they execute one at a time, and each explicitly yields control to the next when it has completed its work or is waiting for some event to occur, such as I/O completion.
So, in this way of doing things, you get the benefits of multitasking without worrying about issues such as thread safety, but it also comes with its challenges.
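The cooperation described above can be sketched with two coroutines whose waits overlap on a single thread; the `handle_request` name and the 0.1s delay are invented for illustration:

```python
# Minimal sketch of cooperative multitasking with asyncio: each
# coroutine yields control at every `await`, so both "requests"
# make progress on a single thread.
import asyncio
import time

async def handle_request(name):
    await asyncio.sleep(0.1)  # yields to the event loop while "waiting on I/O"
    return f"{name} done"

async def main():
    start = time.perf_counter()
    results = await asyncio.gather(
        handle_request("request-1"),
        handle_request("request-2"),
    )
    elapsed = time.perf_counter() - start
    return results, elapsed

results, elapsed = asyncio.run(main())
# The two 0.1s waits overlap, so the total is ~0.1s, not 0.2s.
print(results, f"{elapsed:.2f}s")
```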
Challenges
The main challenge of cooperative multitasking is its reliance on cooperation from every function within the application.
In this model, each function must voluntarily yield control to other functions when it’s not actively processing work. However, if a function fails to yield control when necessary, it can cause delays in processing other requests or even lead to the entire application server becoming unresponsive.
While these issues can be identified and mitigated, they represent inherent risks in this design. In cases where reliability takes precedence over performance, this model may not be the most suitable choice.
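The failure mode described above can be demonstrated in a few lines; the `uncooperative` and `heartbeat` coroutines are invented for illustration, with `time.sleep` playing the role of a function that fails to yield:

```python
# Minimal sketch of the failure mode: a coroutine that calls a
# blocking function (time.sleep) stalls every other task on the
# event loop, whereas asyncio.sleep would have yielded control.
import asyncio
import time

async def uncooperative():
    time.sleep(0.2)  # blocks the whole event loop; nothing else runs

async def heartbeat(ticks):
    # Intends to tick every 0.01s, cooperatively yielding in between.
    for _ in range(5):
        ticks.append(time.perf_counter())
        await asyncio.sleep(0.01)

async def main():
    ticks = []
    await asyncio.gather(heartbeat(ticks), uncooperative())
    return ticks

ticks = asyncio.run(main())
gaps = [b - a for a, b in zip(ticks, ticks[1:])]
# The largest gap between ticks is dominated by the 0.2s block,
# far above the intended 0.01s interval.
print(f"worst gap: {max(gaps):.2f}s")
```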
FAQ
Why was asyncio chosen for usage in our components?
Asyncio is considered a good approach for applications with numerous 🌐 I/O-bound functions.
Before 2019, our components used synchronous Python, but the decision was made to embrace asyncio to enable performance improvements.
While some components still find threads and processes more suitable for their use case, asyncio offers advantages for I/O-bound tasks.
What should I keep in mind when working on asyncio applications?
- Do not use ⌛️ blocking functions
- You do not use ⌛️ blocking functions
- Avoid using ⌛️ blocking functions
For real, what are some tips to avoid breaking stuff?
- Be aware of ⌛️ blocking functions, and either look for asyncio-compatible alternatives or wrap calls using in_thread to make them non-blocking.
- Many functions in Python’s standard library are ⌛️ blocking, as it pre-dates the asyncio way of doing things.
- If you’re using a third-party library, look for asyncio support in the docs, and if it doesn’t have any, consider opening an issue to let the maintainers know.
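The wrapping tip can be sketched with the standard library’s asyncio.to_thread (Python 3.9+), which I’m assuming behaves like the in_thread helper mentioned here; the blocking_read function is invented for illustration:

```python
# Minimal sketch, assuming in_thread behaves like the standard
# asyncio.to_thread: the blocking call runs on a worker thread,
# so the event loop stays free to serve other tasks meanwhile.
import asyncio
import time

def blocking_read():
    time.sleep(0.1)  # stands in for e.g. urllib.request.urlopen
    return "payload"

async def main():
    # Calling blocking_read() directly here would stall the loop
    # for 0.1s; off-loading it to a thread keeps the loop responsive.
    result = await asyncio.to_thread(blocking_read)
    return result

print(asyncio.run(main()))
```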
So, is in_thread as good as native asyncio? Why don’t we just use it everywhere?
Using threads introduces overhead, so it’s advisable to use them only when necessary for specific 🌐 I/O-bound functions known to be ⌛️ blocking.
But, what exactly is a ⌛️ blocking function?
A blocking function is any operation that takes too long before returning or yielding control (using await). Some commonly used examples include: requests, urllib.request.urlopen, time.sleep, subprocess.run, and open (including file.read, file.write, and file.seek).
Couldn’t we just lint it in the CI pipeline?
Linting for blocking functions can be challenging since any function can be considered ⌛️ blocking if it takes long enough.
One approach would be to have a list of functions that are known to be ⌛️ blocking, and break the build if one of them is used in the code. At the time of writing, the closest tool to a linter for this case would be flake8-async, which is likely better than nothing, but falls short in detecting some cases.
What happens if I use in_process to run 🌐 I/O-bound functions?
Using multiple processes for I/O-bound functions incurs unnecessary overhead, as extra processes only benefit ⚙️ CPU-bound functions. Threads are more suitable for I/O-bound tasks and carry less overhead.
What happens if I declare a function as async def but never use await inside?
The function will still run like a normal function but will have some (usually trivial) overhead, as Python generates additional code and treats it as a coroutine.
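This behavior is easy to observe; the no_awaits name below is invented for illustration:

```python
# Minimal sketch: an `async def` with no `await` still produces a
# coroutine object, and its body only runs when the coroutine is
# driven (awaited, or passed to asyncio.run).
import asyncio
import inspect

async def no_awaits():
    return 42  # runs start-to-finish without ever yielding

coro = no_awaits()
print(inspect.iscoroutine(coro))  # True: calling it did not run the body
print(asyncio.run(no_awaits()))   # 42: the body runs only when driven
coro.close()  # avoid a "coroutine was never awaited" warning
```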