

Designing Concurrent Systems on the BEAM

Principles and Strategies for Robust System Design

Posted: 2024-03-09


'Simplicity is the ultimate sophistication.' — Leonardo da Vinci

Introduction

Concurrency is a necessity when developing larger software systems. It reflects our world's inherent complexity, where multiple events occur simultaneously, demanding systems that can multitask efficiently.

The BEAM's concurrency model, grounded in the actor model, prioritizes lightweight processes. These processes operate in isolation, without shared memory, communicating solely through message passing. This architecture minimizes the risk of system-wide failures due to process errors. The BEAM's preemptive scheduling also ensures fair process execution time, preventing any single process from dominating the system. This model makes it much easier to build efficient, reliable, scalable, and maintainable systems.

Understanding BEAM's approach to concurrency allows developers to take advantage of these features in the best possible way. In this post, I will not go into the details of how the BEAM works; I have written a whole book on the subject: The Beam Book. Instead, I will focus on how to think about processes and concurrency when designing concurrent systems. I will recap the most important aspects of the BEAM’s concurrency model so that we understand what we are building on.

BEAM’s Concurrency Model

BEAM's concurrency model uses the actor model. This model has the following defining characteristics: lightweight processes, process isolation, signals (including message passing), and scheduling.

BEAM handles concurrency through lightweight processes managed by the BEAM VM rather than the underlying operating system. Because these processes are so lightweight, a system can run thousands or even millions of them concurrently without significant overhead. Each process in the BEAM runs in complete isolation from others, with no shared memory. Isolation ensures that a failure in one process does not directly impact another, enhancing fault tolerance and system reliability.

Communication between processes in the BEAM is achieved exclusively through signals. This mechanism ensures the decoupling of processes, as they do not share state directly. The most used signal is the message signal, which allows one process to send a message to another asynchronously. The BEAM uses preemptive scheduling that allocates execution time to processes. Preemptive scheduling prevents any single process from monopolizing the system and ensures that all processes are serviced appropriately.
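
To make this concrete, here is a minimal sketch of two processes communicating only through asynchronous messages; the module name, message terms, and timeout are made up for illustration:

    %% One process spawns another and they exchange messages.
    %% Nothing is shared; data only moves inside the messages.
    -module(ping).
    -export([start/0, loop/0]).

    start() ->
        Pid = spawn(?MODULE, loop, []),   %% a new, isolated process
        Pid ! {self(), hello},            %% asynchronous send; we do not block here
        receive
            {Pid, world} -> ok            %% selectively wait for the reply
        after 1000 -> timeout
        end.

    loop() ->
        receive
            {From, hello} ->
                From ! {self(), world},   %% reply to whoever sent hello
                loop();
            stop ->
                ok
        end.

The sender continues immediately after the send; only the explicit receive makes it wait, which is exactly the decoupling described above.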

These features and the built-in constructs for error handling make it easy to build fault-tolerant and resilient systems, for example, through supervisors and supervision trees.

The lack of true process isolation in most programming languages and environments is the root cause of the rising popularity of microservices.

Processes ≠ Code

A common misconception in designing systems with BEAM is equating processes with modules, gen_servers, or the specific code that is spawned. This view muddles the understanding of the architecture and misses the depth of BEAM's concurrency model. Let's dissect this notion with a clear, logical approach to appreciate the distinction between processes and the code they execute.

A process in the BEAM environment is an independent entity capable of executing code. However, it's crucial to understand that the process is not the code. The process is the executor that brings code to life. A process can run any code assigned to it, whether it's a simple function or a complex gen_server. This versatility means that processes are not inherently tied to the nature or purpose of the code they execute. Understanding that processes are separate from the code they execute has significant implications for system design. Since processes are independent executors, they isolate code execution, preventing failures in one process from directly affecting others. It becomes easier to manage concurrency as each process is a distinct execution unit. Developers can spawn, monitor, and control processes dynamically, adapting to the system's concurrent demands without being bogged down by the specifics of the code within each process.
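
A small sketch makes the distinction concrete; the counter module and its functions are invented for illustration. The process calling start/0 executes start/0 and increment/1, while the spawned process executes only loop/1:

    -module(counter).
    -export([start/0, increment/1, loop/1]).

    start() ->
        %% Runs in whatever process calls counter:start/0.
        spawn(?MODULE, loop, [0]).

    increment(Server) ->
        %% Also runs in the calling process; it merely sends a message.
        Server ! increment,
        ok.

    loop(Count) ->
        %% Only this function (and whatever it calls) runs in the server process.
        receive
            increment -> loop(Count + 1)
        end.

The same module thus contains code executed by at least two different processes, and the spawned process is free to call into any other module as well.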

A developer needs to understand that the code in a module can be executed by any process. Even if a module has a start function that spawns a process to run, say, a server loop defined in the same module, that does not mean the module is that server. The start function itself is executed by some other process, the one doing the spawning, so the server process never executes it. On the other hand, the server loop will probably call functions in other modules, so the server process is not limited to executing the code in this module either. Now that we have clearly distinguished between code and processes and realized that the code is not the process, how should we think about processes?

Visualizing Processes as Gnomes

When designing concurrent systems on the BEAM, it can be helpful to personify processes to understand their roles and interactions better. One such visualization is to think of processes as gnomes or workers in a complex, bustling workshop. Each gnome has its tasks, knowledge (state), and means of communication, working independently yet contributing to the workshop's overall goals. This metaphor offers a tangible way to grasp BEAM's abstract concepts of process management, state, and communication.

Imagine each gnome as an independent worker with a specific job. In the context of BEAM, these jobs are described in the code processes execute. Like gnomes, processes operate independently, holding their state in their "heads." This state is private and unique to each process, mirroring how each gnome knows only what is in its head.

Communication among gnomes is akin to asynchronous message passing in BEAM. They "talk" by sending messages to each other without expecting an immediate response, allowing them to continue their tasks without waiting. This method of communication increases the level of concurrency and efficiency in the system, ensuring that no single gnome or process becomes a bottleneck.

Each gnome reads the instructions (code) and performs the described task. This step-by-step execution highlights the process's role as an executor of code, adhering to the earlier principle that processes are not the code itself but rather the entities that bring the code to life. Reading and executing instructions emphasizes the dynamic nature of process-based execution, where each process can handle diverse tasks depending on the code. Just as gnomes might sometimes encounter difficulties or confusion in their tasks, processes in BEAM can also run into issues that hinder their execution. Here, the concept of supervision comes into play. Supervisors are akin to wise, overseeing gnomes who ensure that if any worker encounters a problem, the issue is addressed promptly—restarting the task, reassigning it, or taking corrective measures. This supervision mechanism is essential for building resilient systems that can recover from failures and continue operating smoothly.

Visualizing processes as gnomes or workers enriches our understanding of concurrent systems on BEAM, making abstract concepts more relatable and easier to grasp. This metaphor helps developers and system architects envision architecture as a lively ecosystem of independent yet interconnected entities, each with its role, state, and means of communication. It underscores the importance of designing systems that are efficient, scalable, and robust enough to handle failures gracefully, ensuring uninterrupted operations.

By embracing this visualization, we can approach system design with a clearer, more tangible perspective, fostering creativity and innovation in building and managing concurrent systems on the BEAM. It's a reminder that at the heart of every complex system, there are simple, fundamental principles guiding its operation—principles that, when understood and applied effectively, can lead to the creation of truly exceptional software.

Think About Tasks when Dividing Responsibility

A fundamental aspect of designing concurrent systems on BEAM involves defining tasks and assigning responsibilities to processes. A clear, logical approach to task allocation enhances the system's efficiency and reliability. This section outlines a methodology for deciding process responsibility by focusing on task completion from start to finish, illustrating with examples for clarity.

In BEAM, each process should be responsible for a specific task, ensuring it can be executed from inception to completion. This principle of dedicated responsibility simplifies system design, making it easier to debug, scale, and manage. It aligns with encapsulation, where each process, like a microservice, independently manages its state and behavior.

Let's consider a real-world example to illustrate task allocation: a web application that handles user registration, data processing, and notifications. In this scenario:

  • User Registration Process: This process handles everything related to registering a new user – from receiving the registration request to validating the data and storing the user's information in the database.
  • Data Processing Process: Once a user is registered, a separate process might handle data processing tasks, such as analyzing user data for insights or preparing the data for other parts of the system.
  • Notification Process: A distinct process could manage sending notifications to users, whether a welcome email post-registration or alerts based on user activity.

For larger, more complex tasks, dividing the task into subtasks executed by other processes ensures manageability and scalability. Consider a process responsible for handling a high-volume data analysis task. This process can delegate specific analytical tasks to subprocesses:

  • Main Data Analysis Process: Coordinates the overall task, receiving the initial request and returning the final report.
  • Subprocesses for Analysis: Perform detailed analysis of parts of the data.
  • Subprocess for Report generation: Gathers results from all analyses and prepares a comprehensive report.

Each subprocess is responsible for its task from start to finish, ensuring clear boundaries and simplifying the development and troubleshooting processes.

The key is to ensure that one process is ultimately responsible for a task from start to finish. This does not preclude it from delegating parts of the task to other processes. It also implies that it coordinates the overall task, including initiating subprocesses and compiling their outcomes into a final result. This approach maximizes the benefits of BEAM's concurrency model, allowing for efficient parallel processing while maintaining order and accountability.
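
As a rough sketch of this pattern, with hypothetical module and function names, a coordinating process can spawn one subprocess per chunk of work, stay responsible for the task as a whole, and compile the results into the final report:

    -module(analysis).
    -export([run/1]).

    run(Chunks) ->
        Parent = self(),
        %% Spawn one short-lived subprocess per chunk, each tagged with a
        %% unique reference so the replies can be matched up again.
        Refs = [begin
                    Ref = make_ref(),
                    spawn(fun() -> Parent ! {Ref, analyze_chunk(Chunk)} end),
                    Ref
                end || Chunk <- Chunks],
        %% The coordinator remains accountable: it collects every result
        %% and assembles the final report.
        Results = [receive {Ref, Result} -> Result end || Ref <- Refs],
        {report, Results}.

    analyze_chunk(Chunk) ->
        %% Placeholder for the real analysis of one part of the data.
        {analyzed, Chunk}.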

Thinking about task allocation and process responsibility in this way has several advantages. The system architecture becomes easier to understand and maintain when responsibilities are clearly assigned. Isolating tasks to specific processes improves the system's ability to recover from errors, as failures are contained within individual processes. It also becomes easier to scale the system, either by adding more processes to handle increased load or by optimizing individual processes for performance.

Structuring Systems by Thinking About Flow

In concurrent system design, especially within the BEAM environment, an alternative approach to defining process responsibilities is conceptualizing the system in terms of flow. This method emphasizes the movement and transformation of data or tasks through the system, offering a dynamic perspective that complements the task-centric view. By focusing on flow, designers can identify natural divisions within the system, leading to a more intuitive distribution of processes that align with the system's operational logic.

Flow refers to the sequence and interaction of operations within a system to achieve a specific outcome. It encompasses the path data takes from input to output, including all intermediate processing steps. Visualizing a system’s flow helps identify key points where processes can be introduced to handle specific workflow segments efficiently.

Mapping out the flow shows how data or tasks move through the system, highlighting dependencies and potential bottlenecks. By examining the flow, developers can pinpoint logical points to introduce processes. These points often correspond to changes in data state, decision branches, or integration with external systems. Structuring systems around flow makes it easier to scale or modify parts of the system independently, as the impact on the overall workflow is clearer. Understanding the flow aids in tracing issues within the system, as it maps the path of data or tasks through various processes.

To effectively implement a flow-based approach in BEAM systems, consider the following strategies:

  • Event-Driven Processes: Design processes that are triggered by specific events in the flow, ensuring that they are reactive and aligned with the system’s operational dynamics.
  • Pipeline Architecture: Construct a pipeline where each process is a stage in the flow, receiving input, performing its operation, and passing the output to the next stage. This model is particularly effective for data processing and transformation tasks.
  • State Management: For flows that require complex state management, consider using processes to encapsulate stateful operations, ensuring that state changes are localized and manageable.
  • Flow Control Processes: Implement processes dedicated to controlling the flow, such as routing, load balancing, and error handling, to maintain smooth operation across the system.

Imagine a web application that processes user requests. The flow begins with receiving the request, validating it, processing it (e.g., querying a database), and finally responding to the user.

  • Request Receiver Process: Handles incoming requests, acting as the entry and exit point of the flow.
  • Validator Process: Checks the validity of the request before further processing.
  • Data Processor Process: Interacts with the database or performs the core logic based on the validated request.
  • Response Process: Compiles the response and returns it to the receiver process.

Each process represents a distinct stage in the flow, with clear responsibilities and interactions defined by the sequence of operations handling a web request.
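
A minimal sketch of such a flow is shown below; the stage names and the trivial validation and processing logic are illustrative only, and the calling process plays the role of the request receiver:

    -module(pipeline).
    -export([start/0, handle/2]).

    start() ->
        %% Build the pipeline back to front so each stage knows its successor.
        Responder = spawn(fun() -> respond_loop() end),
        Processor = spawn(fun() -> process_loop(Responder) end),
        Validator = spawn(fun() -> validate_loop(Processor) end),
        Validator.

    handle(Validator, Request) ->
        Validator ! {request, self(), Request},
        receive
            {response, Reply} -> Reply
        after 5000 -> {error, timeout}
        end.

    validate_loop(Next) ->
        receive
            {request, From, Req} when is_map(Req) ->
                Next ! {request, From, Req},
                validate_loop(Next);
            {request, From, _Bad} ->
                From ! {response, {error, invalid_request}},
                validate_loop(Next)
        end.

    process_loop(Next) ->
        receive
            {request, From, Req} ->
                %% The core logic (e.g. a database query) would go here.
                Next ! {request, From, {processed, Req}},
                process_loop(Next)
        end.

    respond_loop() ->
        receive
            {request, From, Result} ->
                From ! {response, {ok, Result}},
                respond_loop()
        end.

Each stage can be scaled, replaced, or instrumented on its own, since its only contract with the rest of the flow is the messages it accepts and emits.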

Thinking about flow offers a complementary perspective to task-based process division in BEAM systems, focusing on how data and tasks move and transform. This approach facilitates a clear, logical structure for concurrent systems, enhancing understandability, scalability, and maintainability. By aligning processes with the natural flow of operations, designers can create efficient, responsive systems that effectively leverage the concurrent capabilities of the BEAM environment.

Process Archetypes in BEAM Systems

When constructing a BEAM system, the design of processes can benefit from categorizing them into specific archetypes. These archetypes assist in organizing the processes according to their purpose and behavior and facilitate a more robust and maintainable system structure. Here are examples of process archetypes within BEAM systems:

Workers

  • Pool Workers: These are processes designed to handle a queue of tasks, typically managed by a pool supervisor. They are ideal for tasks that can be executed in parallel, maximizing resource utilization.
  • Short-lived Workers: These workers are used for one-off, transient tasks that do not require maintaining state after completing their job. They are often used for infrequent or minor tasks that do not justify the overhead of a pool of long-lived processes. To avoid garbage collection, spawn a worker process with a large enough heap for the task and let it die when it is done, immediately reclaiming all the memory (see the sketch after this list).
  • Servers: Long-lived workers like gen_servers are designed to handle ongoing tasks and maintain state over time. They're the backbone of many systems, managing consistent state and providing services to other parts of the system.
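
Here is a hedged sketch of the short-lived worker idea: the process is spawned with a heap assumed to be large enough for the whole task (the size below is an arbitrary guess, not a measured value), so it can run to completion without garbage collecting and hand all its memory back at once when it exits:

    -module(oneshot).
    -export([run/1]).

    run(Data) ->
        Parent = self(),
        Ref = make_ref(),
        %% min_heap_size is given in words; pick it large enough that the
        %% worker never needs to garbage collect before it dies.
        erlang:spawn_opt(fun() -> Parent ! {Ref, heavy_work(Data)} end,
                         [{min_heap_size, 100000}]),
        receive
            {Ref, Result} -> Result
        end.

    heavy_work(Data) ->
        %% Placeholder for the transient, memory-hungry task.
        lists:sort(Data).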

Flow Control

  • Synchronize / Lock: Processes that ensure only one worker can access a resource at a time, preventing race conditions (a minimal lock sketch follows this list).
  • Serialize / Keep (priority) order: These are designed to order processes or tasks, which is critical in systems where the sequence of operations matters.
  • Rate Limiter / Circuit Breaker: Processes that control the flow of tasks to prevent system overloads. They act as safeguards, limiting the traffic rate or shutting down parts of the system if they become unresponsive.
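
As one illustration of the Synchronize / Lock archetype, here is a bare-bones lock process; the names are invented, and a production version would also monitor the holder so that a crash cannot leave the lock permanently taken:

    -module(lock).
    -export([start/0, with_lock/2]).

    start() ->
        spawn(fun() -> unlocked() end).

    with_lock(Lock, Fun) ->
        Lock ! {acquire, self()},
        receive granted -> ok end,          %% block until the lock is ours
        try Fun() after Lock ! release end. %% always give the lock back

    unlocked() ->
        receive
            {acquire, Pid} ->
                Pid ! granted,
                locked()
        end.

    locked() ->
        receive
            release -> unlocked()
        end.

While the lock is held, further acquire requests simply queue up in the lock process's mailbox and are granted one at a time, in arrival order.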

Data Flow

  • Keepers of State: Processes that maintain and manage state information, critical for both short-term and long-term state management.
  • Resource Owner: Processes that exclusively own and manage specific resources, such as files or network connections.
  • Connections, Listeners, and Monitors: Processes that handle incoming traffic, listen for requests, or monitor resources for changes.
  • Forwarders, Routers, and Broadcasters: Processes that move data through the system, directing traffic to the appropriate destinations or distributing messages to multiple recipients.

Error Handling

  • Supervisors: These processes are crucial for system resilience. They monitor worker processes and apply pre-defined strategies to handle failures, such as restarting the failed processes (a minimal example follows this list).
  • Insulators: Processes that contain faults within a certain part of the system to prevent cascading failures. They can also act similarly to circuit breakers, isolating parts of the system that may cause broader system issues.
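
As a point of reference, a minimal OTP supervisor for the archetype above might look like the sketch below; my_worker is a hypothetical worker module assumed to export start_link/0:

    -module(my_sup).
    -behaviour(supervisor).
    -export([start_link/0, init/1]).

    start_link() ->
        supervisor:start_link({local, ?MODULE}, ?MODULE, []).

    init([]) ->
        %% Restart a crashed child on its own; give up if there are more
        %% than 5 restarts within 10 seconds.
        SupFlags = #{strategy => one_for_one,
                     intensity => 5,
                     period => 10},
        ChildSpecs = [#{id => my_worker,
                        start => {my_worker, start_link, []},
                        restart => permanent}],
        {ok, {SupFlags, ChildSpecs}}.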

Designing BEAM processes according to these archetypes ensures that each process has a clear role and responsibility, essential for the system's maintainability and scalability. It also allows for a modular approach, where each process can be independently developed, tested, and optimized.

Structure Code by Domains

In the final section of our exploration into BEAM system design, we turn our attention to code structure. A vital strategy for effectively organizing code is to align it with business domains. This approach breaks down the system into distinct areas of functionality that correspond with different aspects of the business operations they represent.

Each domain should represent a core business function, providing a focus for the development efforts. Developers can create systems that mirror the business's real-world organization by aligning code structure with these domains. Domains establish clear boundaries within the codebase. This separation ensures that changes in one domain do not have unintended effects on others, facilitating easier maintenance and scalability.

Encapsulating domain-specific logic within its bounded context allows for a cleaner codebase. It also makes the system more adaptable to changes within that domain without affecting the core functionalities of other domains.

Now that we have divided the system into domains, the next step is to construct it by integrating functions, modules, and applications. Starting with a broad perspective and narrowing down to specifics, we can think of system design in three primary layers: applications, modules, and functions. This top-down approach helps outline the system's architecture from the macro to the micro level, ensuring that each component fits into the larger purpose and design.

Applications are the most expansive layer, each representing a substantial domain within the system. They are composed of several modules and define the system's macro functionality. Applications should have a clearly identified role and an API that exposes the necessary functionalities. They're the frontline of the domain, providing the services and interactions that users and other systems interface with.

Within applications, modules act as subdomains. They encapsulate a related set of functionalities and abstract the specifics away from the application layer. Modules are crucial for breaking down the application's complexity into manageable segments. A well-defined module has a consistent API, the only interface through which the rest of the application interacts with the module's internal functions.
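
As a small illustration (the billing domain and its functions are invented), a module can expose a deliberately narrow API while keeping its helpers internal:

    -module(billing).
    -export([charge/2, refund/2]).   %% the module's entire public API

    charge(CustomerId, Amount) ->
        ok = validate_amount(Amount),
        %% ... perform the charge ...
        {ok, {charged, CustomerId, Amount}}.

    refund(CustomerId, Amount) ->
        ok = validate_amount(Amount),
        %% ... perform the refund ...
        {ok, {refunded, CustomerId, Amount}}.

    %% Internal helper: not exported, so invisible outside this module.
    validate_amount(Amount) when is_integer(Amount), Amount > 0 -> ok.

The rest of the application interacts with the billing subdomain only through charge/2 and refund/2, so the internals can change freely as long as that contract holds.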

At the granular level are functions, the fundamental units of execution that perform specific, well-defined tasks. They are the building blocks within modules designed to accomplish a particular operation effectively and efficiently. The system's logic resides in functions, carrying out computations and data manipulations that drive the module's capabilities.

Suppose some of your functions perform general work that is not specific to an application or domain. If they are used in several places, consider breaking them out into a library application. There is no need to do this prematurely; don’t start writing frameworks and libraries before you have seen the need for the functionality in several places. And don’t take the DRY (don’t repeat yourself) principle too far.

In structuring the system, we start by defining the applications, which sets the stage for the system’s capabilities and boundaries. Each application is then broken down into modules, organizing the system's complexity into focused areas that manage specific aspects of the application's responsibilities. Within each module, functions are defined to perform the operations and tasks necessary to achieve the module’s objectives.

By architecting a system this way, we ensure each layer serves its purpose within the context. The application layer sets the scope and provides the necessary interfaces, and the module layer organizes and delineates the domain's internal logic. The function layer carries out the precise operations required.

APIs serve as contracts between different parts of the codebase. They should be designed with clarity, ensuring that they are both self-explanatory and robust against changes in implementation. APIs should follow consistent design principles throughout the system. This consistency aids in predictability and ease of use for developers interfacing with different system parts. APIs need comprehensive documentation that explains their purpose, usage, and the domain logic they encapsulate. This documentation is vital for maintaining domain integrity and understanding throughout the system.

When aligning code with business domains, developers should have a solid understanding of the business context to create domains that accurately reflect business needs. Domains are not static. As business needs evolve, so should the corresponding domains and their implementations. Domain structuring promotes collaboration between technical teams and business stakeholders, ensuring the system evolves in line with business objectives. Structuring code by business domains provides a logical and maintainable organization within the codebase and aligns technical solutions with business strategy. This synergy between business and technology is crucial for creating systems that support and drive business objectives.

Conclusion

To design concurrent systems on BEAM, it is crucial to understand and use its concurrency model. The BEAM runs processes in complete isolation, and these processes communicate via message passing. This approach makes systems highly resilient to failure and makes concurrency easy to manage. It's important to note that a process is not the code it executes. Instead, think of a process as a worker and of the code as its instructions. When designing the process architecture, it helps to think about tasks, flows, and process archetypes. When structuring your code, it's beneficial to think in terms of domains. These practices will help you build more robust and maintainable systems.

- Happi

