
In the gnome village, a Worker is the simplest citizen you can have.
You give it a task.
- It does the task.
- It reports back.
- It disappears.
That is the whole contract.
On the BEAM, this maps directly to one of the most useful patterns you can have: spawn a process per unit of work. You get isolation, concurrency, and fault containment almost for free. When that is not enough, you add a pool.
This post is about Workers as a process archetype: how they behave, when to use them, when to pool them, and when you’re just building your own problem factory.
Short-Lived Workers: the default
The simplest Worker is a short-lived process:
do(Input) ->
    Caller = self(),
    Ref = make_ref(),
    Worker = spawn(fun() ->
        Result = do_one_job(Input),
        Caller ! {self(), Ref, Result}
    end),
    receive
        {Worker, Ref, Result} -> Result
    after 5000 -> exit({timeout, Worker, Input})
    end.
It has no history. No future. No shared state. It exists purely to handle one piece of work and then die.
This works extremely well for:
- Sending emails.
- Writing audit logs.
- Calling external APIs.
- Performing background calculations.
You could create a worker template:
worker_do(Fun, Input, Timeout) ->
    Caller = self(),
    Ref = make_ref(),
    Worker = spawn(fun() ->
        Result = Fun(Input),
        Caller ! {self(), Ref, Result}
    end),
    receive
        {Worker, Ref, Result} -> Result
    after Timeout -> log({timeout, Worker, Input})
    end.
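For example, with a hypothetical send_email/1 and a five-second timeout:
Result = worker_do(fun send_email/1, Email, 5000).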
If the Worker crashes, only that one job is affected. The caller can in turn time out, retry, or (as in the example) log the failure. No other state is corrupted. The village shrugs and hires another gnome.
In this version you don’t see the crash reason unless you log it somewhere else.
Now a linked version, where the default behaviour is:
“If the worker dies, I die too.”
This is the pure YBYOYR flavour: if the worker blows up, the caller is probably in a bad state as well.
worker_do_link_or_crash(Fun, Input, Timeout) ->
    Caller = self(),
    Ref = make_ref(),
    Worker = spawn_link(fun() ->
        Result = Fun(Input),
        Caller ! {self(), Ref, Result}
    end),
    receive
        {Worker, Ref, Result} ->
            Result
    after Timeout ->
        exit({timeout, Worker, Input})
    end.
Here:
- If Fun(Input) raises an exception, the Worker exits abnormally.
- Because of spawn_link/1, the caller gets an 'EXIT' signal and dies too.
- You either get a Result or your process crashes and lets a supervisor deal with it.
This is the simplest linked worker: either success or crash. No middle ground.
Handling worker crashes without dying
Sometimes you want the link semantics (ties lifetimes together) but you don’t want this process to die. Then you must trap exits:
worker_do_link_trap(Fun, Input, Timeout) ->
    TrapExit = process_flag(trap_exit, true),
    Caller = self(),
    Ref = make_ref(),
    Worker = spawn_link(fun() ->
        Result = Fun(Input),
        Caller ! {self(), Ref, {ok, Result}}
    end),
    StatusAndResult =
        receive
            {Worker, Ref, {ok, Result}} ->
                {ok, Result};
            {'EXIT', Worker, Reason} ->
                {error, {worker_crashed, Reason}}
        after Timeout ->
            {error, {timeout, Worker, Input}}
        end,
    process_flag(trap_exit, TrapExit),
    StatusAndResult.
Notes:
- We set trap_exit so 'EXIT' becomes a normal message instead of killing us.
- On success we get {ok, Result}.
- On crash we see {error, {worker_crashed, Reason}}.
- On timeout we get {error, {timeout, Worker, Input}}.
This is a good library pattern for a synchronous, linked worker with explicit error handling.
In production code you might want to set trap_exit only in a dedicated supervisor/manager process, not everywhere.
Asynchronous workers: fire-and-forget + async/await
Next step: separate starting the worker from waiting for the result.
Fire and forget
The most basic async worker:
fire_and_forget(Fun, Input) ->
    _Pid = spawn(fun() -> Fun(Input) end),
    ok.
You don’t know if it succeeds. You don’t know when. Sometimes that is fine (e.g. “best-effort” metrics, logging to a third-party system). Often it’s not.
Async handle: start now, await later
Better: return a handle {Pid, Ref} and let the caller decide when (or if)
to wait.
worker_async_start(Fun, Input) ->
    Caller = self(),
    Ref = make_ref(),
    Pid = spawn(fun() ->
        Result = Fun(Input),
        Caller ! {self(), Ref, {ok, Result}}
    end),
    {Pid, Ref}.
To wait for it:
worker_async_await({Pid, Ref}, Timeout) ->
    receive
        {Pid, Ref, {ok, Result}} ->
            {ok, Result}
    after Timeout ->
        {error, {timeout, Pid}}
    end.
This gives you:
- A non-blocking call to start work.
- A separate blocking call to await, with its own timeout.
- Freedom to stash the handle in state, pass it to another process, or ignore it.
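For example, with a made-up fetch_price/1:
Handle = worker_async_start(fun fetch_price/1, "AAPL"),
%% ... do something else while the worker runs ...
{ok, Price} = worker_async_await(Handle, 5000).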
You can obviously extend this to handle crashes by using spawn_monitor/1
instead of spawn/1 and handling 'DOWN' messages, but that’s the same
pattern with one extra branch.
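A sketch of that monitored variant (the *_mon names are just for this example):
worker_async_start_mon(Fun, Input) ->
    Caller = self(),
    Ref = make_ref(),
    %% spawn_monitor/1 gives us a monitor reference so we can see crashes.
    {Pid, MRef} = spawn_monitor(fun() ->
        Caller ! {self(), Ref, {ok, Fun(Input)}}
    end),
    {Pid, Ref, MRef}.

worker_async_await_mon({Pid, Ref, MRef}, Timeout) ->
    receive
        {Pid, Ref, {ok, Result}} ->
            erlang:demonitor(MRef, [flush]),
            {ok, Result};
        {'DOWN', MRef, process, Pid, Reason} ->
            {error, {worker_crashed, Reason}}
    after Timeout ->
        erlang:demonitor(MRef, [flush]),
        {error, {timeout, Pid}}
    end.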
Parallelism: workers as fan-out/fan-in
Now the fun part: use Workers to process parts of a problem in parallel.
Imagine a simple pmap/2 (parallel map):
pmap(Fun, List) ->
    Caller = self(),
    % Start one worker per item
    Pids = [spawn(fun() ->
                Result = Fun(X),
                Caller ! {self(), Result}
            end) || X <- List],
    collect_results(Pids, []).
collect_results/2 can be:
collect_results([], Acc) ->
    lists:reverse(Acc);
collect_results([Pid | Rest], Acc) ->
    receive
        {Pid, Result} ->
            collect_results(Rest, [Result | Acc])
    after 5000 ->
        exit({timeout_waiting_for, Pid})
    end.
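A quick usage example:
[1, 4, 9, 16] = pmap(fun(X) -> X * X end, [1, 2, 3, 4]).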
Properties:
- One worker per element.
- Each worker replies with {Pid, Result}.
- Results are collected in the same order as the original list, because we walk the Pids list in order.
- If the worker for a given Pid is slow or stuck, we block there, and earlier results from other workers will sit in the mailbox until we get to their PID. That's intentional: we are enforcing ordered results.
You’ve just implemented:
- Fan-out: spawn a Worker per “chunk”.
- Fan-in: collect results by Pid.
- Clear fault model: each Worker is semi-independent; one crash doesn't poison others, but we lose the result.
You can refine this:
- Ignore timed-out results, and later drain the mailbox to get rid of late replies.
- Ignore the result entirely if the only thing you care about is the side effect (for example, sending an email).
- Chunk the list (e.g. 100 items per Worker) if the list is huge; see the sketch after this list.
- Use a fixed-size pool instead of unbounded spawns if external resources are involved. We will look at pooled workers soon.
- Use spawn_link or spawn_monitor for stricter crash semantics and better visibility into failures.
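As a sketch of the chunking idea from the list above, reusing pmap/2 with 100 items per Worker (chunk/2 is a helper written here for illustration):
pmap_chunked(Fun, List) ->
    %% One Worker per chunk instead of one per element.
    Chunks = chunk(List, 100),
    lists:append(pmap(fun(Chunk) -> lists:map(Fun, Chunk) end, Chunks)).

chunk([], _N) ->
    [];
chunk(List, N) when length(List) =< N ->
    [List];
chunk(List, N) ->
    {Chunk, Rest} = lists:split(N, List),
    [Chunk | chunk(Rest, N)].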
Short-lived Workers align perfectly with the BEAM’s strengths. Processes are cheap to spawn and cheap to terminate. The scheduler is designed for this. Concurrency is the default setting, not something you have to fight for.
The main way to misuse short-lived Workers is to spawn them without any thought of rate. If every incoming HTTP request spawns ten internal Workers that talk to five different external services, you now have a fan-out explosion and a new surprise: your outbound traffic graph.
Workers are cheap. External systems are not.
When a Worker Becomes a Pool Member
Sometimes “spawn as many as you like” is not the right choice. You may have:
- A database that only handles 50 concurrent connections without melting.
- An external API with strict rate limits.
- A crypto context or GPU handle that is too expensive to recreate for every job.
In these cases you want many callers, but only a controlled number of Workers running at the same time.
That is all a Worker pool is.
The BEAM ecosystem has several reliable implementations:
- poolboy: the classic Erlang pool, battle-tested for a decade.
- pooler and poolgirl: Erlang alternatives with different trade-offs.
- Poolex, worker_pool, and others on the Elixir side.
All of them follow the same idea:
You have a fixed number of Workers, and callers borrow one Worker at a time to perform a job.
A poolboy-style call looks like this:
poolboy:transaction(MyPool, fun(WorkerPid) ->
    gen_server:call(WorkerPid, {do, Job})
end).
Callers never talk to the Worker directly. They talk to the pool, and the pool manages who gets to run work next.
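To make that concrete, here is a minimal sketch of what such a pooled Worker could look like as a gen_server. poolboy only requires the worker module to export start_link/1; the {do, Job} shape matches the transaction above, and run_job/1 is a placeholder:
-module(my_pool_worker).
-behaviour(gen_server).

-export([start_link/1]).
-export([init/1, handle_call/3, handle_cast/2]).

start_link(Args) ->
    gen_server:start_link(?MODULE, Args, []).

init(_Args) ->
    {ok, #{}}.

%% One job at a time: the gen_server mailbox serialises calls for us.
handle_call({do, Job}, _From, State) ->
    {reply, run_job(Job), State};
handle_call(_Other, _From, State) ->
    {reply, {error, unknown_request}, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

run_job(Job) ->
    %% Placeholder for the actual work (DB query, API call, ...).
    {ok, Job}.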
The reason for this setup is simple: limit concurrency around a scarce resource.
The Worker archetype stays the same. It still does one job at a time. The only difference is that some Workers are now part of a coordinated “team” rather than being spawned freely.
If the Worker crashes while doing a job, supervision restarts it and the Worker returns to the pool. The pool continues to behave exactly the same, which is why people like it.
A Note About Hybrids (But We’re Not Going There Yet)
Sometimes a pooled Worker also holds a long-lived resource (like a DB connection). That means it is also a kind of Resource Owner.
This is one of the few valid cases where an archetype can be “mixed”, and we’ll talk about it later when we get to resource owners.
For now, keep Worker pools conceptually simple:
They exist to limit concurrency. All the Worker does is: one job at a time.
We’ll revisit the “Workers that hold state” pattern in the upcoming Resource Owner post.
When not to pool Workers
People see Worker pools and get excited. They then start pooling everything.
Do not pool CPU-bound Workers that use only local state. The BEAM is already a giant dynamic pool with preemptive scheduling and excellent fairness. Adding a pool on top usually adds serialization and queueing where you don’t need it.
A few simple rules:
- If the Worker only touches local memory and pure CPU, you probably do not need a pool. Just spawn as needed.
- If the Worker wraps a scarce external resource (DB connection, API client, file descriptor), a pool is probably a good idea.
- If you have a performance problem and your first instinct is “add a pool”, check your external dependencies and message rates first.
A Worker pool is not a performance tool. It is a safety tool to avoid exhausting external resources. If you turn it into a global throttle for all work, you will get exactly that: a global throttle.
Routing versus checkout pools
There are two broad ways to structure pooled Workers.
Checkout pools are what poolboy does. Callers ask the pool for a Worker, use it, and return it when done. This makes sense for blocking flows with exclusive use of a resource, like a DB connection.
Routing pools are different. Callers never see individual Workers. They send messages to a Router, which distributes work across Workers. Andrea Leopardi's post "Process pools with Elixir's Registry" is a good example of this style.
For the Worker archetype, both styles are still Workers:
- In a checkout pool, the Worker is checked out, does one job, returns to idle.
- In a routing pool, the Worker receives jobs via messages, does one at a time, and stays alive.
The choice is about how callers coordinate, not about what the Worker is.
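If you want a feel for the routing style without a library, a toy round-robin Router might look like this (purely illustrative; a Registry-based pool as in the post above is the more practical route in Elixir):
%% Workers stay alive and receive jobs as messages; the Router round-robins.
start_routing_pool(N, Fun) ->
    Workers = [spawn_link(fun() -> routed_worker(Fun) end)
               || _ <- lists:seq(1, N)],
    spawn_link(fun() -> router_loop(Workers) end).

router_loop([Next | Rest]) ->
    receive
        {job, Job} ->
            Next ! {job, Job},
            router_loop(Rest ++ [Next])
    end.

routed_worker(Fun) ->
    receive
        {job, Job} ->
            _ = Fun(Job),
            routed_worker(Fun)
    end.
Callers just send Router ! {job, Job} and never learn which Worker ran it.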
Failure: cheap and local
A short-lived Worker that crashes is easy to reason about. You get:
- One failed job.
- A stacktrace.
- No lingering state.
A pooled Worker that crashes is also manageable, as long as supervision is correct. A Supervisor can restart the Worker with a fresh connection. The pool continues to function.
The problems start when you accidentally promote the Worker into a God process:
- It now keeps global state.
- It routes messages.
- It logs.
- It supervises others.
At that point, when it crashes, your system has an existential crisis instead of a minor incident.
A Worker should never supervise, route, or own global state. It should work.
How this maps to Java’s new toys
Java has finally discovered that threads don’t need to weigh as much as small cars. With Project Loom and structured concurrency you now get virtual threads and task scopes: lighter, cheaper, and with a lifespan you can reason about. On a good day they even feel a little like BEAM processes.
It’s tempting to say: “Ah, Java threads have become Erlang processes.”
They haven’t. They’ve just stopped being quite as heavy.
A few reminders:
- Virtual threads still share mutable state unless you fight very hard not to.
- Failure propagation and restart strategies are still something you build yourself. (In Erlang the supervisor is a library too, but the primitives it relies on, links, monitors, exit signals, are baked into the VM.)
- There is no mailbox. If you want asynchronous message passing, you assemble it from queues and hope for the best.
So yes, you can implement the Worker archetype in Java now without hurting yourself. But you must decide to structure it that way.
On the BEAM, you must work equally hard to avoid doing it that way.
About naming Workers
OTP 27 added something useful: process labels via
proc_lib:set_label/1. This lets you attach a descriptive term to any
process that does not have a registered name. Tools like c:i/0,
observer, and crash reports can show this label.
So you can think of it this way:
- Registered name: a real name used for lookup and messaging (register(Name, Pid)).
- Label: a descriptive tag for humans and tools; not used for routing or lookup.
For Workers, this means:
- You still normally don’t give them global names.
- When you do want visibility in tools, add a label, not a global name.
- Pooled Workers are typically anonymous processes with labels and are discovered through the pool, not directly.
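A small sketch (proc_lib:set_label/1 needs OTP 27 or later; the {worker, Input} tag is just an example):
worker_with_label(Fun, Input) ->
    spawn(fun() ->
        %% Shows up in observer, c:i/0 and crash reports; not a registered name.
        proc_lib:set_label({worker, Input}),
        Fun(Input)
    end).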
Takeaways
A Worker is the simplest and most honest process archetype:
- Short-lived Workers: one job, then exit.
- Pooled Workers: one job at a time, reuse scarce resources.
- No routing. No supervision. No global state.
The BEAM makes this style of concurrency both natural and cheap. External systems do not.
In following posts, we will look at the other villagers: Resource owners, Routers, Gatekeepers, and Observers. Together they give you enough vocabulary to design systems with many processes that still behave like adults.
One archetype per role. One role per process. Sleep improves dramatically after that.
