What is concurrent programming and how to master it

Last update: 28/02/2026
Author: Isaac
  • Concurrent programming models tasks that overlap in time; it differs from physical parallelism but can take advantage of it.
  • Processes, threads, coroutines, event loops, and actors offer different models for structuring concurrency, each with its own advantages.
  • Problems such as race conditions, deadlocks, and starvation require careful synchronization mechanisms and scheduling policies.
  • Formal models such as Petri nets and CSP, along with tools such as FDR or JCSP, allow complex concurrent systems to be verified and simplified.


Concurrent programming has become a key component of modern software development: operating systems, web servers, mobile apps, video games, databases… almost everything we use daily executes multiple tasks simultaneously, or at least appears to. A solid understanding of how this programming approach works, how it differs from parallelism, and what problems it entails is essential to writing robust and efficient code.

In the following lines we will break down what exactly concurrency is, how it is modeled at the process and thread level, what models exist (threads, coroutines, actors, event loops…), what typical errors arise (race conditions, deadlocks, starvation, false sharing, etc.), and what tools we have to synchronize and verify concurrent systems. You'll see that, beyond the theory, all of this has very practical application in real-world software design.

What is concurrent programming and how does it differ from parallelism?

When we talk about concurrency, we are referring to the ability of a system to manage multiple tasks whose lifecycles overlap, so that their execution steps are interleaved. The important idea is logical overlap: from the programmer's point of view, several activities are "live" at the same time, even though physically a single processor can only execute one instruction at a time.

The execution is organized as an interleaving: the scheduler allocates small slices of CPU time to each task. It performs a portion of task A, then moves on to task B, then to task C, and later returns to task A, always respecting data dependencies. Concurrent programming, therefore, is a way of structuring programs into several almost-independent activities, so the exact order in which they are interleaved is not fully fixed.

Parallelism is another story: here, multiple tasks literally execute at the same time on different cores or processors. A system can be concurrent but not parallel (a single core that keeps switching between tasks), or concurrent and parallel at once (several cores executing some of those threads or processes simultaneously).

In practice, concurrency aims to model problems where multiple things happen at once (web requests, user clicks, disk or network I/O, sensors, etc.), while parallelism focuses on accelerating heavy computations by dividing them into subproblems that run simultaneously. The same concurrent design can run in a merely interleaved manner or exploit true parallelism if the hardware has multiple cores.


Processes, threads, and execution states

In an operating system, the basic concept for supporting concurrency is the process. A process is not just the executable file on disk: it is an active entity with its own execution state. It includes a program counter (a CPU register that points to the next instruction to execute), a process stack (function parameters, return addresses, local variables…), a data section (global variables, static data), and a heap area where dynamic memory is allocated at runtime.

The binary file stored on disk is just a passive program: a sequence of instructions and data. For that program to "come to life," the operating system must create a process, initialize its context (registers, memory, file descriptors, etc.), and schedule it to receive CPU time.

States of a process

While a process exists, it passes through different execution states, which usually include at least the following:

  • New: the system is creating the process and setting up its internal structures.
  • Running: the process holds the CPU and its instructions are being executed.
  • Blocked (waiting): the process is waiting for an event to occur (I/O completion, the arrival of a message…).
  • Ready: it has everything it needs to run, but it is waiting for its turn on the CPU.
  • Finished: its execution has ended and the system will release its resources.

Transitions between these states are decided by the operating system's scheduler according to its policy (round robin, priorities, etc.), balancing CPU usage, latency, and fairness between processes.

Process control block (PCB)

Each process is represented in the kernel by a data structure known as the PCB (Process Control Block). This block is the process's "record card" and stores all the information needed to stop and resume it without losing anything.

The PCB stores the process state (new, ready, running, blocked…), the current program counter, the CPU registers (whose number and type depend on the architecture), scheduling information (priority, pointers to ready queues, usage statistics), and memory-management data (page tables, segment tables, etc.). Thanks to this, the operating system can perform context switches, that is, save the complete state of one process and restore that of another.

Threads and multi-threads

In addition to processes, many operating systems provide threads of execution. A thread is a basic unit of CPU usage: it has its own identifier, program counter, registers, and stack, but shares with the other threads of the same process the code section, the data section, and other resources such as file descriptors.

In a multithreaded program, multiple threads can execute different parts of the same code, in parallel or interleaved. Each thread maintains its own stack (with its local variables and call context), but all threads see the same address space. This facilitates communication (by sharing memory) at the cost of introducing concurrency risks such as race conditions.

In a typical multithreaded design, the process has a single PCB and a single address space, while each thread adds its own stack and register context. The operating system (or the language runtime) implements a preemptive model: it can interrupt a thread at any time and hand the CPU to another, forcing you to program with the assumption that your thread can be paused between two seemingly "atomic" instructions.
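As an illustration (a minimal Python sketch, not tied to any particular system), several threads can run the same function over a shared object while each keeps its own local variables on its own stack:

```python
import threading

results = []  # shared: every thread sees the same list in the common address space

def worker(name, n):
    # `name`, `n`, and `total` are locals on this thread's private stack.
    total = sum(range(n))
    results.append((name, total))

threads = [threading.Thread(target=worker, args=(f"t{i}", 1000)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The order in which threads append is up to the (preemptive) scheduler,
# so we sort before printing.
print(sorted(results))
```

Note that the append order is non-deterministic: it depends on how the scheduler interleaves the threads, which is exactly the behavior the text describes.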

Multiprogramming, multiprocessing, and distributed processing

It is useful to distinguish several levels of concurrent execution. Multiprogramming allows multiple processes to reside in memory and take turns on the CPU of a single processor; they never truly run in parallel, but rather in an interleaved manner.

Multiprocessing uses two or more processors or cores on the same machine to run one or more processes, achieving true parallelism. Each core can execute a different thread simultaneously, multiplying performance across many workloads.

Distributed processing goes a step further: processes execute on different computers connected by a network. Additional problems come into play here (latency, partial failures, network partitions), but the conceptual model remains concurrent: many ongoing activities that must be coordinated.

Concurrent programming models: threads, coroutines, event loops, and actors


Concurrency is not always programmed the same way. There are several concurrent programming models that offer different abstractions and trade off ease of use, performance, and parallelism in different ways. Among the most common are system threads, coroutines, event loops, and the actor model.


Coroutines

A coroutine is a form of cooperative concurrency within the same operating system thread. Unlike traditional threads, where the OS scheduler decides when to interrupt and resume, coroutines relinquish control explicitly at points the programmer designates (e.g., when waiting for I/O or when completing a part of the task).

This model is called cooperative because each coroutine must be designed to "behave well": it must yield execution at appropriate points so as not to block the others. In return, the context of a coroutine is much lighter than that of a thread: many coroutines can live within a single system thread, sharing a stack or using very small stacks.

By default, coroutines model concurrency without parallelism: they all interleave within a single thread. However, many modern runtimes allow combining them with multiple threads to achieve true parallelism when needed. Concepts like promises/futures or async/await integrate naturally with this model.
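A minimal sketch of cooperative coroutines using Python's asyncio (an assumed choice of runtime; the same async/await idea exists in many languages):

```python
import asyncio

async def fetch(name, delay):
    # `await` is the explicit yield point: this coroutine suspends here
    # and the event loop runs other coroutines in the meantime.
    await asyncio.sleep(delay)
    return name

async def main():
    # Both coroutines overlap in time on a single thread: the total wait
    # is roughly the longest delay, not the sum of both delays.
    return await asyncio.gather(fetch("a", 0.02), fetch("b", 0.01))

print(asyncio.run(main()))  # ['a', 'b'] (gather preserves argument order)
```

Even though "b" finishes first, `gather` returns results in the order its arguments were given, which makes the output deterministic.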

Event loop

Another very influential pattern is the event loop. In this approach, there is generally a single thread that runs a central loop where events are handled (data arriving over the network, timers, user input…) and previously registered callbacks are executed to react to each type of event.

It is often described as a deferred-blocking (non-blocking) model because, instead of blocking while waiting for I/O, operations register a handler that will be executed when the event occurs in a later loop iteration. This allows multiple concurrent activities to be described on a single thread, avoiding the complexity of inter-thread synchronization and making very efficient use of the CPU.

By default, this model provides concurrency but not parallelism (everything happens on the same thread), although many platforms allow scaling to multiple threads or processes under the hood. A key aspect is that local memory and variable scope behave very similarly to a sequential program, which reduces the risk of race conditions.

Event-loop-based APIs frequently use closures or callbacks to capture the necessary environment. This can lead to the well-known "pyramid of doom" problem, where nested callbacks make the code difficult to read. Alternatives such as promises/futures or the async/await keywords greatly improve ergonomics.
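The callback-registration style can be sketched with asyncio's low-level loop API (a hypothetical example; `on_timer` and the delays are made up for illustration):

```python
import asyncio

log = []

def on_timer(label):
    # A previously registered callback, executed by the loop when its event fires.
    log.append(label)

async def main():
    loop = asyncio.get_running_loop()
    # Register handlers instead of blocking: the single thread stays free
    # to dispatch whichever event becomes ready first.
    loop.call_later(0.01, on_timer, "first")
    loop.call_later(0.02, on_timer, "second")
    await asyncio.sleep(0.03)  # give the loop time to dispatch both timers

asyncio.run(main())
print(log)  # ['first', 'second']
```

The callbacks run in deadline order on the same thread, so no locking is needed around `log`.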

Actor model

In the actor model, the basic unit of execution is the actor. An actor is an entity that encapsulates its own state and behavior and communicates exclusively by sending messages. Each actor has a message queue and processes its messages sequentially, one by one, thus avoiding directly sharing mutable memory.

Actors are, conceptually, extremely lightweight processes managed by a runtime or virtual machine, not directly by the operating system. This allows thousands or even millions of actors to be created within a single real process, each with its own mailbox, without the cost of one OS thread per actor.

Execution is usually preemptive: the virtual machine decides when to interrupt one actor to allow another to run, based on global criteria. If the environment has multiple cores, the runtime may distribute actors among threads or cores in a relatively transparent manner, achieving effective parallelism.

An important consequence of this model is that actors do not share mutable memory: all communication occurs by message passing (often with immutable messages). This favors functional purity: an actor only "sees" its own state and the data it receives, which makes it easier to reason about behavior and exploit safe parallelism. Furthermore, a typical approach is optimistic error handling: if an actor fails due to a transient condition, it is restarted without bringing down the entire system.
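A toy actor can be approximated with a thread plus a mailbox queue (a simplified sketch; real actor runtimes such as Erlang's BEAM or Akka are far more sophisticated, but the shape is the same):

```python
import queue
import threading

class CounterActor:
    """A tiny actor: private state plus a mailbox, processed one message at a time."""

    def __init__(self):
        self.count = 0                 # state no other thread touches directly
        self.mailbox = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            msg, reply = self.mailbox.get()  # messages are handled sequentially
            if msg == "inc":
                self.count += 1
            elif msg == "get":
                reply.put(self.count)        # answer via a reply channel
            elif msg == "stop":
                return

    def send(self, msg, reply=None):
        self.mailbox.put((msg, reply))

actor = CounterActor()
for _ in range(5):
    actor.send("inc")
reply = queue.Queue()
actor.send("get", reply)
print(reply.get())  # 5
actor.send("stop")
```

Because only the actor's own thread ever reads or writes `self.count`, no lock is needed: sequential mailbox processing provides the mutual exclusion.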

Which parts of a program can run concurrently

Not all code can be happily launched in parallel or concurrently. There are fragments where the order of execution is critical (for example, initializing a variable before using it), and others where the order is irrelevant (two independent calculations on different data).

To decide whether two blocks of instructions can run concurrently, Bernstein's conditions are used. For each instruction block Sk we define:

  • L(Sk): set of variables read during the execution of Sk.
  • E(Sk): set of variables written (updated) during the execution of Sk.

Two blocks Si and Sj can execute concurrently if three conditions hold: E(Si) ∩ L(Sj) = ∅, L(Si) ∩ E(Sj) = ∅, and E(Si) ∩ E(Sj) = ∅. In other words, neither block may write a variable that the other reads or writes. If any of these conditions is violated, executing both blocks simultaneously can change the program's outcome.

Typical problems of concurrent programs

Designing concurrent programs is complicated because the exact order in which actions are performed is not fully determined. This non-determinism leads to subtle errors, sometimes difficult to reproduce, which manifest only under certain instruction interleavings. Let's look at the most common ones.

Race conditions and critical section

A race condition occurs when the result of a computation depends on the relative order in which two or more threads access shared resources, and that order is not under control. If multiple threads modify the same variable without coordination, the final value may be incorrect.

A classic example is the operation x = x + 1 on a shared variable. In a sequential program it is trivial, but in a multithreaded environment each increment consists of several instructions: read x, add 1, write the result. If two threads interleave these instructions without mutual exclusion, both can read the same initial value and overwrite each other's result, losing increments.

In Java, a counter incremented by two threads half a million times each should eventually reach one million, but in the absence of synchronization, different results are observed on each execution. The reason is that there is no total order of operations, only a partial order that depends on the scheduler and the specific interleaving.

To avoid these situations, critical sections are identified: portions of code that access shared resources and must not be executed by more than one thread at a time. Synchronization mechanisms (locks, semaphores, monitors, etc.) ensure that only one thread enters the critical section, just as a traffic light controls access to a narrow intersection.
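A lock around the read-modify-write sequence makes the counter deterministic; a minimal Python sketch (the same idea as the Java counter described above, translated to Python for illustration):

```python
import threading

counter = 0
lock = threading.Lock()

def increment(times):
    global counter
    for _ in range(times):
        # Critical section: the read-modify-write of the shared counter.
        # The lock guarantees the three steps are never interleaved.
        with lock:
            counter += 1

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 200000 on every run; without the lock, often less
```

Removing the `with lock:` line reintroduces the race: each `counter += 1` compiles to several bytecode steps that the scheduler may interleave, so increments get lost.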

Violation of mutual exclusion

We speak of a violation of mutual exclusion when two or more threads manage to enter the same critical section simultaneously. This breaks the guarantee of exclusive access to the shared resource and leads to undesirable results, such as the counter above never reaching the expected value.

This type of failure is usually due to incomplete or incorrect synchronization: forgetting to protect an access with a lock, using non-atomic instructions, or combining several resources without a clear lock-acquisition policy.

Deadlock or interlock

A deadlock occurs when one or more processes wait indefinitely for an event that will never happen. Typically, each process in a cycle waits for a resource held by the next, creating a circular dependency with no way out.

For deadlock to exist in a resource system, four simultaneous conditions are necessary:

  • Mutual exclusion: resources are allocated exclusively (not shared).
  • Hold and wait: processes keep the resources they already hold while requesting additional ones.
  • No preemption: resources cannot be forcibly taken from a process; they are only released voluntarily.
  • Circular wait: there is a circular chain of processes in which each one waits for a resource held by the next.

Techniques for avoiding or breaking deadlocks are based on denying at least one of these conditions: for example, globally ordering resources to avoid cycles, allowing certain forms of preemption, or using detection and recovery algorithms. Dijkstra's banker's algorithm is a classic, as is his "dining philosophers" example.
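Denying circular wait by imposing a global acquisition order can be sketched in Python (ordering locks by `id()` is one arbitrary but consistent convention, used here only for illustration):

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()

def transfer(first, second, log, label):
    # Deny "circular wait": every thread acquires its locks in the same
    # global order (here, by object id), so no cycle of waiters can form.
    for lock in sorted((first, second), key=id):
        lock.acquire()
    try:
        log.append(label)  # the work done while holding both locks
    finally:
        for lock in (first, second):
            lock.release()

log = []
# The two threads name the locks in opposite order — the classic deadlock
# recipe — yet both complete because acquisition order is normalized.
t1 = threading.Thread(target=transfer, args=(lock_a, lock_b, log, "t1"))
t2 = threading.Thread(target=transfer, args=(lock_b, lock_a, log, "t2"))
t1.start(); t2.start()
t1.join(); t2.join()
print(sorted(log))  # ['t1', 't2'] — both finished, no deadlock
```

Without the `sorted(..., key=id)` normalization, t1 could hold lock_a while t2 holds lock_b, each waiting forever for the other's lock.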

Indefinite postponement and unfairness

Indefinite postponement (starvation) occurs when a process that is ready to run or to obtain a resource is delayed indefinitely because the scheduling policy never gives it a turn. It is not blocked by a deadlock; there are simply always other processes that get ahead of it.

This phenomenon is closely linked to unfairness in scheduling algorithms: if the system systematically favors certain processes (by priority, by arrival order, etc.) without compensating for the accumulated waiting time of the others, some processes may easily be starved forever.

Typical solutions involve implementing aging mechanisms (increasing the priority of processes that have been waiting a long time) or serving processes strictly in waiting order. At the design level, the guideline is this: when writing concurrent code, you must consider whether the system guarantees that every thread that can make progress will eventually do so.

Busy waiting

Another common trap is busy waiting: a process enters a loop that constantly checks a condition (for example, the value of a shared variable) and does nothing useful in the meantime. While not functionally incorrect, it wastes CPU and can prevent other processes from receiving enough time.

Ideally, when a process cannot make progress because it is waiting for an event, it should suspend and release the CPU until the event occurs. This is achieved with blocking primitives that put the thread to sleep (wait, sleep, semaphores, monitor conditions, etc.) instead of letting it spin in a loop.
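A condition variable replaces the spin loop with a sleep that ends only when the event is signaled; a minimal Python sketch:

```python
import threading

ready = False
cond = threading.Condition()

def waiter(out):
    with cond:
        # Instead of spinning on `ready`, the thread sleeps inside wait(),
        # releasing the lock and the CPU until it is notified.
        while not ready:          # re-check after waking (spurious wakeups)
            cond.wait()
        out.append("event handled")

def signaler():
    global ready
    with cond:
        ready = True
        cond.notify()             # wake the sleeping waiter

out = []
t = threading.Thread(target=waiter, args=(out,))
t.start()
signaler()
t.join()
print(out)  # ['event handled']
```

The `while not ready` guard is essential: it makes the code correct even if the notification arrives before the waiter goes to sleep, or if the thread wakes spuriously.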

False sharing

In modern architectures with caches, the problem of false sharing appears. It occurs when multiple threads modify different variables that happen to reside in the same cache line. Although they are conceptually independent data, the cache coherence protocol invalidates and synchronizes the entire line every time one of them is written, generating a storm of invalidations that degrades performance.

False sharing does not break the program's correctness, but it causes a significant drop in performance, especially in tight loops. To mitigate it, techniques are used such as aligning variables in memory, adding padding between structure fields, or using compiler and runtime options to separate data used by different threads into different cache lines.

Safety and life properties in concurrent programs

To reason about whether a concurrent program is "well made," two main families of properties are usually distinguished: safety and liveness. Both are fundamental to defining correct behavior.

Safety properties

Safety properties state what the program must never do. They are usually formulated as invariants: conditions that must remain true throughout the entire execution.

  • Mutual exclusion: there is never more than one process in a critical section at a time.
  • Absence of deadlock: no process is blocked waiting for an event that will never happen.
  • Partial correctness: if the program terminates, the output obtained is one of those allowed by the specification.

These properties are usually verified through formal reasoning, systematic testing, or automated verification tools on simplified models of the system.

Liveness properties

Liveness properties, on the other hand, capture what the program should eventually do: "something good" will eventually happen if there are no external failures.

  • Fairness: every process that is able to run will eventually get its chance.
  • Reliable communication: every message sent is eventually received by its recipient (or at least handled according to the error protocol).
  • Total correctness: if the program terminates, it does so with the correct result and does not get stuck mid-execution.

Ensuring liveness properties is often more difficult, as they depend both on the design of the concurrent algorithm and on the scheduling policies of the underlying system.

Classic concurrency problems: readers-writers, producer-consumer, and the dining philosophers

Concurrency theory often relies on canonical problems which serve as a testing ground for synchronization techniques. Several of them have become true classics.

Readers-writers problem

In systems such as databases, shared files, or in-memory structures, it is common to have multiple readers and multiple writers. Readers only read; writers modify. As long as no one is writing, readers could access the content in parallel without any problem, but a writer must not be allowed to modify data while others are reading or writing, to avoid corruption or inconsistent reads.

This problem requires designing algorithms that grant concurrent read access when there are no writes in progress, and exclusive access to writers when they need to update. Depending on the solution, readers or writers may be unfairly favored, so liveness must be protected as well.

Producer-consumer problem

Another classic scenario is that of producers and consumers sharing a buffer (a "warehouse" of items). Producers generate items and deposit them into the buffer; consumers extract those items for processing. The challenge is to coordinate both sides so that producers do not overflow the buffer and consumers do not try to take from an empty one.

A simple solution is to protect the buffer with mutual exclusion (e.g., binary semaphores) and use counting semaphores to block producers when the buffer is full and consumers when it is empty. This approach, while valid, can become cumbersome when there are many participants or the flow grows complicated.

A more structured alternative is to encapsulate the buffer and its synchronization logic in a monitor: an object that offers atomic operations (insert, extract) and internally manages waiting queues, avoiding duplication of synchronization code in producers and consumers.
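Python's `queue.Queue` behaves like such a monitor: `put` and `get` are atomic and internally block on full/empty conditions. A minimal sketch, using a sentinel value to signal the end of production:

```python
import queue
import threading

buffer = queue.Queue(maxsize=3)  # the monitor: locking and full/empty waits live inside

def producer(n):
    for i in range(n):
        buffer.put(i)    # blocks if the buffer is full
    buffer.put(None)     # sentinel: no more items coming

def consumer(results):
    while True:
        item = buffer.get()  # blocks if the buffer is empty
        if item is None:
            return
        results.append(item * 2)  # "process" the item

results = []
p = threading.Thread(target=producer, args=(5,))
c = threading.Thread(target=consumer, args=(results,))
p.start(); c.start()
p.join(); c.join()
print(results)  # [0, 2, 4, 6, 8]
```

Neither side contains any explicit lock or counter: the monitor encapsulates all the synchronization, exactly as the text suggests.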

The dining philosophers

The dining philosophers problem perfectly illustrates deadlocks, race conditions, and starvation. Several philosophers sit around a table and need two forks to eat, but there is only one fork between each pair. If they all try to grab the forks at the same time, each may end up holding a different fork, and all of them will be stuck waiting for the second one: a textbook deadlock.

If the algorithm is not designed well, it can also happen that the same philosopher always goes hungry while others repeatedly hoard the forks, exemplifying unfairness and starvation.

Proposed solutions range from simple round-robin turn-taking (the philosophers eat one at a time), through queues of waiting philosophers where whoever fails to get the second fork goes to the back of the queue, to introducing an arbiter that limits the number of philosophers who can sit at the table simultaneously (for example, at most n−1 if there are n forks), so that at least one philosopher can always eat and free up resources.
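The arbiter variant can be sketched with a counting semaphore that admits at most n−1 philosophers at once (a toy Python sketch; fork counts and round counts are made up):

```python
import threading

N = 5
forks = [threading.Lock() for _ in range(N)]
# The "arbiter": at most N-1 philosophers may try to eat at once,
# so at least one of them can always grab both forks and finish.
table = threading.Semaphore(N - 1)
meals = [0] * N

def philosopher(i, rounds):
    for _ in range(rounds):
        with table:                       # ask the arbiter for a seat
            with forks[i]:                # left fork
                with forks[(i + 1) % N]:  # right fork
                    meals[i] += 1         # eat

threads = [threading.Thread(target=philosopher, args=(i, 10)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(meals)  # [10, 10, 10, 10, 10] — everyone ate every round, no deadlock
```

Removing the `with table:` line restores the possibility of the textbook deadlock: all five philosophers could each grab their left fork and wait forever for the right one.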

Synchronization and communication methods between processes and threads

To keep control over access to critical sections and coordinate the execution of concurrent tasks, we have several synchronization and communication primitives. The most commonly used in practice are semaphores, locks, monitors, and explicit message passing.

Semaphores

A semaphore is a protected integer variable that can only be accessed with two atomic operations: wait/down (decrement and block if the value is zero) and signal/up (increment and, optionally, wake up a blocked process). The operating system guarantees that these operations are indivisible, so they serve to implement mutual exclusion and other synchronization patterns.


A binary semaphore (values 0 or 1 only) functions as a lock; a counting semaphore allows a configurable maximum number of threads into a given section. Although very flexible, semaphores are considered a low level of abstraction: when misused, they are an inexhaustible source of bugs (deadlocks, missed signals, etc.) and force synchronization logic to be mixed with domain logic.

Mutual exclusion through locks

Mutual exclusion is usually implemented with locks (mutexes). A process that wants to enter a critical section tries to acquire the lock: if it is free, the process enters and marks the lock as held; otherwise, it blocks until the lock is released.

Conceptually, it is like a padlock on the resource: only whoever holds the key can enter the critical section. This pattern is very common in practice, but it requires discipline: the lock must always be released, the same acquisition order must be used when there are several locks, and excessively long critical sections that degrade parallelism must be avoided.

Monitors

Monitors take mutual exclusion to a higher level. A monitor is an abstraction that groups shared data together with the operations that can be performed on it, ensuring that those operations execute one at a time, never in parallel inside the monitor.

When a thread calls one of a monitor's methods, it automatically enters a protected region; if another thread is already inside, it must wait its turn. In addition, monitors typically offer condition variables (wait/notify, wait/signal) to suspend and wake threads based on internal state, simplifying the solution of problems such as producer-consumer.

Because monitors offer a high level of abstraction, this approach is safer and less error-prone than scattering semaphores or locks directly throughout the code. Languages such as Java, C#, and various modern runtimes provide native support for monitors and conditions.

Communication through message passing

Another family of techniques dispenses with shared state and is based on exchanging messages between processes or threads. Each message usually has a header (sender and receiver identifiers, message type, size) and a body with the necessary data.

There are several addressing modes. In direct addressing, the sender explicitly indicates who the receiver is, and both know each other. In implicit addressing, the sender specifies the receiver, but the receiver does not necessarily know who is sending the message. Finally, in indirect addressing, messages are deposited into mailboxes managed by specific processes or by the operating system; interested processes read from the mailbox to learn whether they may enter a certain critical section or must process a certain event.

Synchronization can be blocking (the sender waits until the receiver receives, or the receiver waits for a message) or non-blocking. Choosing which combinations to use is crucial: a bad choice can create new race conditions or deadlocks instead of resolving them.

Process planning and context changes

Process scheduling is the strategy by which the operating system decides which process or thread runs at any given moment, sharing the CPU among all those in memory. It is based on managing queues of ready processes and applying scheduling algorithms with goals such as maximizing CPU utilization and minimizing delays.

There are usually three levels of scheduling:

  • Short term: decides which ready process gets the CPU next. This scheduler, also called the CPU scheduler or dispatcher, directly controls the immediate allocation of processor time; the dispatch latency is the time the system takes to stop one process and start another.
  • Medium term: decides which processes remain in main memory and which can be swapped out, in order to optimize overall performance.
  • Long term: regulates which processes are admitted into the system and which are removed from memory, affecting the global degree of multiprogramming (how many processes are active at once).

Classical scheduling algorithms include FCFS (First-Come, First-Served), SJF (Shortest-Job-First), priority scheduling, Round Robin, and multilevel queues. Each offers different trade-offs in response time, throughput, fairness, and complexity.
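Round Robin's behavior can be illustrated with a toy simulation (hypothetical process names and burst times; context-switch overhead is ignored for simplicity):

```python
from collections import deque

def round_robin(bursts, quantum):
    """Simulate Round Robin: each process runs for at most `quantum`
    time units, then rejoins the back of the ready queue."""
    ready = deque(bursts)           # (name, remaining_time) pairs
    finish_time, clock = {}, 0
    while ready:
        name, remaining = ready.popleft()
        run = min(quantum, remaining)
        clock += run                # context-switch cost ignored here
        if remaining - run > 0:
            ready.append((name, remaining - run))  # back of the queue
        else:
            finish_time[name] = clock
    return finish_time

# P1 needs 5 time units, P2 needs 3, quantum = 2:
# P1 runs 0-2, P2 runs 2-4, P1 runs 4-6, P2 finishes at 7, P1 finishes at 8.
print(round_robin([("P1", 5), ("P2", 3)], 2))  # {'P2': 7, 'P1': 8}
```

Shrinking the quantum improves responsiveness (every process gets the CPU sooner) at the cost of more context switches, which is exactly the trade-off the text mentions.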

Switching between processes requires context switches: the system saves the contents of the registers, the stack pointer, and the program counter of the outgoing process into its PCB, and restores those of the incoming process. This mechanism allows a process to resume exactly where it left off, as if nothing had happened in between.

Concurrent systems verification tools

Since the behavior of a concurrent system can vary greatly depending on the order of events, it is common to resort to formal models and verification tools that help detect deadlocks, race conditions, and other violations of desired properties before deploying the actual system.

Petri nets

Petri nets are a graphical model for describing concurrent systems. They are represented as a directed bipartite graph with two types of vertices: places (conditions) and transitions (events). The places hold tokens that indicate which conditions are satisfied in a given state.

A transition is enabled if all its input places hold at least one token. When a transition fires, it consumes one token from each input place and adds tokens to its output places. Starting from an initial marking (token distribution), one can study properties such as whether the net is live (every transition can eventually fire) or bounded/safe (tokens do not accumulate without limit in any place).
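The enabling and firing rules translate into a few lines of code (a toy sketch with a hypothetical two-place net; transitions are modeled as (input places, output places) pairs):

```python
def enabled(marking, transition):
    """A transition is enabled if every input place holds at least one token."""
    inputs, _ = transition
    return all(marking[p] >= 1 for p in inputs)

def fire(marking, transition):
    """Consume one token from each input place, add one to each output place."""
    inputs, outputs = transition
    m = dict(marking)  # firing produces a new marking
    for p in inputs:
        m[p] -= 1
    for p in outputs:
        m[p] += 1
    return m

# A net with two places and one transition moving a token from 'wait' to 'done'.
t = (["wait"], ["done"])
m0 = {"wait": 1, "done": 0}
print(enabled(m0, t))  # True: 'wait' holds a token
m1 = fire(m0, t)
print(m1)              # {'wait': 0, 'done': 1}
print(enabled(m1, t))  # False: no token left in 'wait'
```

Exploring all markings reachable from m0 via `fire` is the basic idea behind checking liveness and boundedness mechanically.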

CSP (Communicating Sequential Processes)

CSP is a mathematical theory for specifying and reasoning about systems composed of sequential processes that communicate through channels. Its formal semantics allows interaction patterns to be described, deadlocks and livelocks to be detected, and designs to be verified against safety and liveness properties.

In CSP, processes are connected through one-way channels and synchronize using a rendezvous mechanism: writing and reading a message on a channel are considered a joint action that occurs simultaneously, avoiding certain forms of race conditions associated with traditional shared state.

FDR and JCSP

FDR (Failures-Divergences Refinement) is a CSP-based tool that automatically checks whether a system model satisfies certain properties, themselves expressed as CSP processes. The idea is to verify whether the real system is a refinement of the desired abstract model (i.e., it introduces no new failures or divergences), detecting, for example, safety violations or possible deadlocks.

For its part, JCSP is a Java library that directly implements CSP concepts: processes, channels, synchronization, and so on. It simplifies writing concurrent Java programs without resorting to locks and direct memory sharing, relying instead on safe, well-defined communication primitives such as channels, barriers, timers, and other high-level components.

Taken together, this ecosystem of models, classic problems, synchronization primitives, and verification tools provides a robust framework for designing and analyzing reliable, efficient, and scalable concurrent programs, capable of taking advantage of both task interleaving and the parallelism available in modern hardware.
