The dangers of concurrency
[Image: a dragon and a tiger. Dragon model adapted from Chuya Miyamoto's Wyvern; tiger model from Gen Hagiwara. Both models folded by me. Copyright 2023 quantumboar.net]
Alan Cox once said, 'A computer is a state machine. Threads are for people who can't program state machines'. A few years back I was tasked with optimizing some buffer-handling code used for video capture (and a variety of other multimedia-related tasks) to allow high-fps captures, moving from the then-standard 30 fps up to 120 fps or more, limited only by sensor capabilities. To satisfy the requirement I removed all locks, achieving over a fourfold increase in capture frame rate simply by eliminating thread contention. And the implementation still ran in a fully thread-safe manner, using less power per frame.
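The original capture code isn't shown here, but the core trick in designs like this is typically a single-producer/single-consumer ring buffer built on atomic indices, so the producer and consumer never contend on a mutex. Below is a minimal C sketch of that idea, assuming C11 atomics; names like `spsc_ring`, `ring_push` and `ring_pop` are illustrative, not the original API:

```c
/* Minimal lock-free single-producer/single-consumer ring buffer sketch.
 * Illustrative only, not the original capture code.
 * Capacity must be a power of two so indices can be masked. */
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_CAPACITY 64 /* power of two: mask instead of modulo */

typedef struct {
    void *slots[RING_CAPACITY];
    atomic_size_t head; /* advanced only by the consumer */
    atomic_size_t tail; /* advanced only by the producer */
} spsc_ring;

/* Producer side: enqueue a frame; returns false if the ring is full. */
static bool ring_push(spsc_ring *r, void *frame)
{
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (tail - head == RING_CAPACITY)
        return false; /* full: caller drops or recycles the frame */
    r->slots[tail & (RING_CAPACITY - 1)] = frame;
    /* Release ordering publishes the slot write before the new tail. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}

/* Consumer side: dequeue a frame; returns NULL if the ring is empty. */
static void *ring_pop(spsc_ring *r)
{
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head == tail)
        return NULL; /* empty */
    void *frame = r->slots[head & (RING_CAPACITY - 1)];
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return frame;
}
```

The indices grow monotonically and are masked into slots, so the full/empty checks reduce to a single subtraction and comparison; with exactly one producer and one consumer, neither side ever blocks the other.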
A caveat applies to most of the considerations here and below: each job needs the right tools. You wouldn't use a sports car in place of a tractor to plow a field, even though they might have the same horsepower. The same goes for a lock-less design where a locking one might be more convenient for the sake of maintainability and scalability, where resources allow for it. Conversely, a cooperative approach can be more appealing than a preemptive one where low resource usage, minimal latency, and predictable execution times are requirements, at the cost of greater complexity.
Concurrency constructs
The natural question that arises after an experience like my lock-less story is whether locks are essential for concurrency and, stretching our curiosity a bit, whether the same holds true for threads. The best answer I can think of is probably not, for both; or, more specifically, it depends on requirements and on the hw/sw architecture.
Preemptive scheduling, threads and locks
On most modern systems, processes and threads make it possible to share a limited number of CPU cores across a possibly much larger number of independent tasks. This time sharing is typically achieved by letting the scheduler, a low-level sw component, execute and interrupt those tasks for bounded time slices according to specific policies. Embedded systems usually don't need (or can't afford) process separation, so their tasks are mostly mapped onto threads.
But threads (and even more so processes) come at a cost: in memory, primarily for the stack and for per-thread library variables (think errno); in CPU usage, for operations like context switches; and in latency and jitter or, rather, in the loss of predictability introduced each time a preemptive scheduler halts a task's execution.
When threads need to access shared resources, or when synchronization is necessary, locks come into play. Besides an additional memory cost (each lock comes with associated structures), locks can introduce significant latency when collisions occur. Even when no collisions happen, frequent lock handling can consume significant CPU time, especially on hw without native lock support. Are locks simpler to maintain? If you've ever written a multi-threaded app, you likely know how insidious it can be to avoid deadlocks and priority inversions. Furthermore, inside a function call traversing a critical section, how can you be sure that the calling thread is holding the right locks without additional complexity like re-entrant locks?
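To make the deadlock hazard concrete, here's a minimal, hypothetical pthreads example: two threads acquire the same two mutexes in opposite order, and with unlucky timing each ends up waiting on the other forever:

```c
/* Hypothetical illustration of a lock-ordering deadlock:
 * two threads take the same two mutexes in opposite order. */
#include <pthread.h>

static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;

static void *thread_one(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock_a); /* holds A ... */
    pthread_mutex_lock(&lock_b); /* ... then waits for B */
    pthread_mutex_unlock(&lock_b);
    pthread_mutex_unlock(&lock_a);
    return NULL;
}

static void *thread_two(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock_b); /* holds B ... */
    pthread_mutex_lock(&lock_a); /* ... then waits for A: deadlock */
    pthread_mutex_unlock(&lock_a);
    pthread_mutex_unlock(&lock_b);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, thread_one, NULL);
    pthread_create(&t2, NULL, thread_two, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```

A single run may well complete by luck; the point is that nothing in the code prevents the hang, and this kind of inconsistent lock ordering is easy to introduce once locks are acquired across many call sites.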
Cooperative scheduling and finite state machines
On multi-core systems, threads or processes (depending on the OS) are probably still necessary to allow parallel execution. But while you'd still pay the memory cost, scheduling latencies would be much reduced if just one thread were allowed to run on each core, with no locks.
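As a sketch of what this looks like in practice (entirely illustrative, assuming run-to-completion tasks), each task can be modeled as a finite state machine whose step function never blocks, with one plain round-robin loop per core driving them; since only that loop touches the tasks' state, no locks are needed:

```c
/* Minimal sketch of a cooperative, state-machine-driven scheduler.
 * All names are illustrative. One such loop would run per core. */
#include <stdbool.h>
#include <stddef.h>

typedef enum { TASK_IDLE, TASK_CAPTURING, TASK_ENCODING, TASK_DONE } task_state;

typedef struct task {
    task_state state;
    /* One non-blocking step; returns true while work remains. */
    bool (*step)(struct task *self);
} task;

/* Example step function: advances the state machine one transition
 * per call and never blocks, so it can't stall the other tasks. */
static bool capture_step(task *self)
{
    switch (self->state) {
    case TASK_IDLE:      self->state = TASK_CAPTURING; return true;
    case TASK_CAPTURING: self->state = TASK_ENCODING;  return true;
    case TASK_ENCODING:  self->state = TASK_DONE;      return true;
    default:             return false;
    }
}

/* The cooperative scheduler: a plain round-robin loop. Because only
 * this thread touches the tasks, their state needs no locks. */
static void run_tasks(task *tasks, size_t count)
{
    bool busy = true;
    while (busy) {
        busy = false;
        for (size_t i = 0; i < count; i++)
            if (tasks[i].step(&tasks[i]))
                busy = true;
    }
}

int main(void)
{
    task tasks[2] = {
        { TASK_IDLE, capture_step },
        { TASK_IDLE, capture_step },
    };
    run_tasks(tasks, 2);
    return 0;
}
```

The trade-off is exactly the one mentioned earlier: each step must voluntarily return quickly, so latency and predictability are bought at the cost of structuring every task as an explicit state machine.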