Multitasking
Introduction
It’s all about doing multiple things at the same time.
Multitasking involves handling multiple tasks simultaneously. By understanding multitasking, you can optimize software to make efficient use of the available hardware.
In web servers
Traditionally, a server handles one request at a time, leaving others waiting. To scale, servers spawn multiple copies, each still handling only one request at a time.
In the early 2000s, as web traffic surged, engineers encountered the C10K problem: how to handle 10,000 concurrent connections efficiently. They explored two options:
- Asynchronous I/O: use a single thread with asynchronous I/O, relying on Operating System support to trigger I/O operations and be notified once they complete, allowing other clients to be served in the meantime.
- Multi-threading: serve one client per thread, at the cost of increased resource consumption: each thread allocates memory for its own stack and spends CPU cycles on context switching, which was a significant concern on the hardware of that era.
Projects like NGINX, Node.js, and Twisted emerged, implementing the asynchronous I/O approach.
Options
Let’s explore multitasking options in Python.
Threads
Good for 🌐 I/O-bound functions
Threads are akin to multiple lanes on a highway, allowing independent paths for tasks to proceed simultaneously. They are commonly used for multitasking and are particularly suitable for handling I/O-bound functions.
Threads are efficient for tasks involving I/O operations but may not fully utilize multi-core processors for ⚙️ CPU-bound functions due to Python’s Global Interpreter Lock (GIL).
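A minimal sketch of the idea, using the standard library’s `concurrent.futures.ThreadPoolExecutor`; the `fetch` function and the URLs are made up for illustration, with `time.sleep` standing in for network latency:

```python
# Minimal sketch: running I/O-bound work on a thread pool.
# fetch() simulates a slow I/O call with time.sleep; in a real
# application it would be an HTTP request or database query.
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    time.sleep(0.1)  # simulated network latency
    return f"response from {url}"

urls = [f"https://example.com/{i}" for i in range(10)]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(fetch, urls))
elapsed = time.perf_counter() - start

# The ten waits overlap across threads, so this takes ~0.1s
# instead of the ~1s a sequential loop would need.
print(f"{len(results)} responses in {elapsed:.2f}s")
```

Because the threads spend almost all their time waiting, the GIL is released during the sleeps and the work genuinely overlaps.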
Processes
Good for ⚙️ CPU-bound functions
Python processes allow functions to run in a separate process, leveraging all available CPU cores. This approach bypasses the GIL and maximizes CPU utilization, making it suitable for CPU-bound tasks.
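A minimal sketch using `concurrent.futures.ProcessPoolExecutor`; the deliberately naive prime-counting function is invented here purely to have some CPU-bound work to distribute:

```python
# Minimal sketch: spreading a CPU-bound function across processes.
# Each worker runs in its own interpreter, so the GIL does not
# serialize the computation.
from concurrent.futures import ProcessPoolExecutor

def count_primes(limit):
    # Deliberately naive primality check, just to burn CPU.
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

if __name__ == "__main__":
    # Four independent chunks of work, one per worker process.
    limits = [20_000, 20_000, 20_000, 20_000]
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(count_primes, limits))
    print(results)
```

The `if __name__ == "__main__":` guard is required on platforms that spawn worker processes by re-importing the module.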
Async I/O
Good for 🌐 I/O-bound functions
Web applications commonly involve reading and writing files, querying databases, and calling external services via HTTP requests. All the time spent waiting for each call to complete is time not spent processing other work, so this is where async I/O comes in handy to improve an application’s throughput.
Unlike threads, where the Operating System’s scheduler preemptively decides when to run and interrupt functions, in this model functions cooperate: they execute one at a time, and each explicitly yields control to the next when it has completed its work or is waiting for some event to occur, such as I/O completion.
So, in this way of doing things, you get the benefits of multitasking without worrying about issues such as thread safety, but it also comes with its challenges.
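The cooperation described above can be sketched with two coroutines whose waits overlap on a single thread; the `handle_request` name and the 0.1s delay are invented for illustration:

```python
# Minimal sketch of cooperative multitasking with asyncio: each
# coroutine yields control at every `await`, so both "requests"
# make progress on a single thread.
import asyncio
import time

async def handle_request(name):
    await asyncio.sleep(0.1)  # yields to the event loop while "waiting on I/O"
    return f"{name} done"

async def main():
    start = time.perf_counter()
    results = await asyncio.gather(
        handle_request("request-1"),
        handle_request("request-2"),
    )
    elapsed = time.perf_counter() - start
    return results, elapsed

results, elapsed = asyncio.run(main())
# The two 0.1s waits overlap, so the total is ~0.1s, not 0.2s.
print(results, f"{elapsed:.2f}s")
```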
Challenges
The main challenge of cooperative multitasking is its reliance on cooperation from every function within the application.
In this model, each function must voluntarily yield control to other functions when it’s not actively processing work. However, if a function fails to yield control when necessary, it can cause delays in processing other requests or even lead to the entire application server becoming unresponsive.
While these issues can be identified and mitigated, they represent inherent risks in this design. In cases where reliability takes precedence over performance, this model may not be the most suitable choice.
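The failure mode described above can be demonstrated in a few lines; the `uncooperative` and `heartbeat` coroutines are invented for illustration, with `time.sleep` playing the role of a function that fails to yield:

```python
# Minimal sketch of the failure mode: a coroutine that calls a
# blocking function (time.sleep) stalls every other task on the
# event loop, whereas asyncio.sleep would have yielded control.
import asyncio
import time

async def uncooperative():
    time.sleep(0.2)  # blocks the whole event loop; nothing else runs

async def heartbeat(ticks):
    # Intends to tick every 0.01s, cooperatively yielding in between.
    for _ in range(5):
        ticks.append(time.perf_counter())
        await asyncio.sleep(0.01)

async def main():
    ticks = []
    await asyncio.gather(heartbeat(ticks), uncooperative())
    return ticks

ticks = asyncio.run(main())
gaps = [b - a for a, b in zip(ticks, ticks[1:])]
# The largest gap between ticks is dominated by the 0.2s block,
# far above the intended 0.01s interval.
print(f"worst gap: {max(gaps):.2f}s")
```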
FAQ
Why was asyncio chosen for usage in our components?
Asyncio is considered a good approach for applications with numerous 🌐 I/O-bound functions.
Before 2019, our components used synchronous Python, but the decision was made to embrace asyncio to enable performance improvements.
While some components still find threads and processes more suitable for their use case, asyncio offers advantages for I/O-bound tasks.
What should I keep in mind when working on asyncio applications?
- Do not use ⌛️ blocking functions
- You do not use ⌛️ blocking functions
- Avoid using ⌛️ blocking functions
For real, what are some tips to avoid breaking stuff?
- Be aware of ⌛️ blocking functions, and either look for asyncio-compatible alternatives or wrap calls using in_thread to make them non-blocking.
- Many functions in Python’s standard library are ⌛️ blocking, as it pre-dates the asyncio way of doing things.
- If you’re using a third-party library, look for asyncio support in the docs, and if it doesn’t have any, consider opening an issue to let the maintainers know.
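The wrapping tip can be sketched with the standard library’s asyncio.to_thread (Python 3.9+), which I’m assuming behaves like the in_thread helper mentioned here; the blocking_read function is invented for illustration:

```python
# Minimal sketch, assuming in_thread behaves like the standard
# asyncio.to_thread: the blocking call runs on a worker thread,
# so the event loop stays free to serve other tasks meanwhile.
import asyncio
import time

def blocking_read():
    time.sleep(0.1)  # stands in for e.g. urllib.request.urlopen
    return "payload"

async def main():
    # Calling blocking_read() directly here would stall the loop
    # for 0.1s; off-loading it to a thread keeps the loop responsive.
    result = await asyncio.to_thread(blocking_read)
    return result

print(asyncio.run(main()))
```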
So, is in_thread as good as native asyncio? Why don’t we just use it everywhere?
Using threads introduces overhead, so it’s advisable to use them only when necessary for specific 🌐 I/O-bound functions known to be ⌛️ blocking.
But, what exactly is a ⌛️ blocking function?
A blocking function is any operation that takes too long before returning or yielding control (using await). Some commonly used examples include: requests, urllib.request.urlopen, time.sleep, subprocess.run, and open (including file.read, file.write, and file.seek).
Couldn’t we just lint it in the CI pipeline?
Linting for blocking functions can be challenging since any function can be considered ⌛️ blocking if it takes long enough.
One approach would be to have a list of functions that are known to be ⌛️ blocking, and break the build if one of them is used in the code. At the time of writing, the closest tool to a linter for this case would be flake8-async, which is likely better than nothing, but falls short in detecting some cases.
What happens if I use in_process to run 🌐 I/O-bound functions?
Using multiple processes for I/O-bound functions incurs unnecessary overhead, as extra processes only benefit ⚙️ CPU-bound functions. Threads are more suitable for I/O-bound tasks and carry less overhead.
What happens if I declare a function as async def but never use await inside?
The function will still run like a normal function but will have some (usually trivial) overhead, as Python generates additional code and treats it as a coroutine.
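This behavior is easy to observe; the no_awaits name below is invented for illustration:

```python
# Minimal sketch: an `async def` with no `await` still produces a
# coroutine object, and its body only runs when the coroutine is
# driven (awaited, or passed to asyncio.run).
import asyncio
import inspect

async def no_awaits():
    return 42  # runs start-to-finish without ever yielding

coro = no_awaits()
print(inspect.iscoroutine(coro))  # True: calling it did not run the body
print(asyncio.run(no_awaits()))   # 42: the body runs only when driven
coro.close()  # avoid a "coroutine was never awaited" warning
```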