Python Multiprocessing: Start Methods, Pools, and Communication
Processes vs Threads
Memory model and isolation
- Threads live inside a single process and share the same address space. Any mutable object (lists, dicts, classes) is visible to every thread unless protected with synchronization primitives. Threads provide easy data sharing, but it is easy to corrupt shared state (race conditions).
- Processes have separate address spaces. A child process cannot directly see the parent's Python objects. Data must be transferred via inter-process communication (IPC), which involves serialization (pickle) unless specialized shared memory is used. This gives safer isolation and robustness (a crash in one process usually does not corrupt others), but passing data has overhead.
Example (updating a global in threads vs. processes):
from threading import Thread
from multiprocessing import Process

counter_thread = 0
counter_process = 0

def bump_thread(n):
    global counter_thread
    # Unsynchronized += on a shared global is a race; exact values may vary
    for _ in range(n):
        counter_thread += 1
    print("Thread's counter value:", counter_thread)

def bump_process(n):
    global counter_process
    for _ in range(n):
        counter_process += 1
    print("Process's counter value:", counter_process)

if __name__ == "__main__":
    t1 = Thread(target=bump_thread, args=(100_000,))
    t2 = Thread(target=bump_thread, args=(100_000,))
    t1.start(); t2.start()
    t1.join(); t2.join()

    p1 = Process(target=bump_process, args=(100_000,))
    p2 = Process(target=bump_process, args=(100_000,))
    p1.start(); p2.start()
    p1.join(); p2.join()

    # Each child incremented its own copy; the parent's global is untouched
    print("Process does not see counter in same memory:", counter_process)
    print("Threads see counter in same memory:", counter_thread)
Output:
Thread's counter value: 100000
Thread's counter value: 200000
Process's counter value: 100000
Process's counter value: 100000
Process does not see counter in same memory: 0
Threads see counter in same memory: 200000
Overhead and scheduling
- Threads are lightweight to create and context-switch, but in CPython only one thread executes Python bytecode at a time (the GIL). This makes threads ideal for I/O-bound work, not CPU-bound parallel work.
- Processes are heavier in terms of start-up time, separate memory, and IPC costs, but each process has its own interpreter and GIL, so CPU-bound work can run truly in parallel across cores, as the sketch below illustrates.
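For illustration, here is a minimal sketch that times the same CPU-bound function under threads and under processes (it uses the concurrent.futures executors as a compact stand-in; the worker count and loop size are arbitrary). On a multi-core machine the process version typically finishes several times faster:
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
import time

def cpu_task(n):
    # Pure-Python busy loop: the GIL serializes this across threads
    total = 0
    for i in range(n):
        total += i * i
    return total

def timed(executor_cls, label):
    start = time.perf_counter()
    with executor_cls(max_workers=4) as ex:
        list(ex.map(cpu_task, [2_000_000] * 4))
    print(f"{label}: {time.perf_counter() - start:.2f}s")

if __name__ == "__main__":
    timed(ThreadPoolExecutor, "Threads")     # roughly serial under the GIL
    timed(ProcessPoolExecutor, "Processes")  # can use multiple cores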
Communication
Threads
- All threads within a process share the same memory space (global variables, heap, etc.). Communication is implicit: if one thread modifies a variable, other threads can immediately see the change.
- Pros:
  - Very fast; no serialization needed.
  - Easy to share complex Python objects.
- Cons:
  - Risk of race conditions if multiple threads read/write the same variable at the same time.
  - Explicit synchronization (Lock, RLock, Semaphore, Condition, etc.) is needed to avoid data corruption, as the sketch below shows.
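For example, a minimal sketch of guarding a shared counter with threading.Lock (the counter and thread count are arbitrary):
from threading import Thread, Lock

counter = 0
lock = Lock()

def bump(n):
    global counter
    for _ in range(n):
        # The with-block ensures only one thread mutates counter at a time
        with lock:
            counter += 1

if __name__ == "__main__":
    threads = [Thread(target=bump, args=(100_000,)) for _ in range(4)]
    for t in threads: t.start()
    for t in threads: t.join()
    print(counter)  # Always 400000 with the lock in place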
Processes
- Isolated memory model: each process has its own memory space. Communication requires inter-process communication (IPC), usually via:
  - multiprocessing.Queue (a process-safe FIFO channel).
  - multiprocessing.Pipe (a two-way communication channel).
  - multiprocessing.Manager (shared dict/list across processes).
  - multiprocessing.shared_memory (fast sharing of large data arrays).
- Pros:
  - No accidental race conditions (since memory is isolated).
  - Fault isolation (a crash in one process does not corrupt others).
- Cons:
  - Communication is slower due to serialization (pickling/unpickling).
  - Extra overhead for large or frequent data transfers. (A Queue example is sketched below.)
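For illustration, a minimal sketch of collecting results from workers through a multiprocessing.Queue (the worker function and values are made up):
from multiprocessing import Process, Queue

def worker(q, n):
    # The result is pickled and pushed through the queue to the parent
    q.put(n * n)

if __name__ == "__main__":
    q = Queue()
    procs = [Process(target=worker, args=(q, i)) for i in range(4)]
    for p in procs: p.start()
    results = [q.get() for _ in procs]  # one item per worker
    for p in procs: p.join()
    print(sorted(results))  # [0, 1, 4, 9]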
Creating Processes and Managing Their Lifecycle
Note: Always protect the entry point with if __name__ == "__main__": on Windows/macOS (spawn/forkserver). It prevents recursive child creation when the module is imported by the new interpreter.
multiprocessing.Process
Example:
from multiprocessing import Process, current_process
import os, time

def work(n):
    print(f"[{current_process().name}] PID={os.getpid()} start with n={n}")
    time.sleep(1)
    print(f"[{current_process().name}] done")

if __name__ == "__main__":
    p1 = Process(target=work, name="Worker-1", args=(10,))
    p2 = Process(target=work, name="Worker-2", args=(20,))
    p1.start(); p2.start()
    print("Alive?", p1.is_alive(), p2.is_alive())
    p1.join(); p2.join()
    print("Exit codes:", p1.exitcode, p2.exitcode)
Output:
Alive? True True
[Worker-2] PID=56571 start with n=20
[Worker-1] PID=56570 start with n=10
[Worker-2] done
[Worker-1] done
Exit codes: 0 0
Important functions
- p.start(): Starts the child process.
- p.join(timeout=None): Waits for the process to finish (optional timeout).
- p.terminate(): Forcefully stops the process (unsafe, no cleanup).
- p.is_alive(): Returns True if the process is still running, False otherwise.
- p.pid: The OS process ID of the child process.
- p.name: Name of the process (default: Process-N, customizable).
- p.exitcode: Exit status (0 means success, nonzero means error, None means still running).
- p.daemon: A boolean attribute that marks a process as a daemon; daemon processes are terminated automatically when the parent exits. Several of these are combined in the sketch below.
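A short sketch of daemon, terminate(), and exitcode together (the sleep durations are arbitrary):
from multiprocessing import Process
import time

def loop():
    while True:
        time.sleep(0.1)  # pretend to do endless work

if __name__ == "__main__":
    p = Process(target=loop, daemon=True)  # daemon: killed when the parent exits
    p.start()
    time.sleep(0.5)
    print("Alive?", p.is_alive())    # True
    p.terminate()                    # forceful stop, no cleanup runs
    p.join()
    print("Exit code:", p.exitcode)  # negative on Unix (e.g. -15 = SIGTERM)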
Start Methods – spawn, fork, forkserver
Python provides three start methods for processes: spawn, fork, and forkserver.
spawn
It is the default on Windows and macOS. It starts a fresh Python interpreter process. It is safer and more predictable because nothing is shared accidentally, but it is slower to start than fork (since everything is loaded fresh).
- Use case: Cross-platform code, or when safety is more important than startup speed.
Note: With spawn, only picklable objects can be passed to child processes.
Example:
from multiprocessing import Process, set_start_method

def worker(x):
    print(f"Worker received {x}")

if __name__ == "__main__":
    set_start_method("spawn")
    p = Process(target=worker, args=(42,))
    p.start()
    p.join()
Output:
Worker received 42
fork
It is the default on Linux/Unix. The child process is created by forking the parent (copy-on-write memory), which gives very fast startup because the child inherits the parent's memory. It is unsafe with threads and some C extensions (like NumPy, TensorFlow, or database connectors) because they may not expect to be copied.
- Use case: High-performance parallel tasks that do not rely on threads or complex C extensions.
Example:
from multiprocessing import Process, set_start_method

def worker(x):
    print(f"Worker received {x}")

if __name__ == "__main__":
    set_start_method("fork")
    p = Process(target=worker, args=(99,))
    p.start()
    p.join()
Output:
Worker received 99
forkserver
It starts a dedicated fork server process. Every new worker is forked from that clean server, not from the potentially complex parent. This avoids some of the unsafe state-inheritance problems of fork, so it is safer than fork but still faster than spawn.
- Use case: When the speed of fork is wanted with more safety in complex apps (web servers, ML frameworks).
Example:
from multiprocessing import Process, set_start_method

def worker(x):
    print(f"Worker received {x}")

if __name__ == "__main__":
    set_start_method("forkserver")
    p = Process(target=worker, args=(123,))
    p.start()
    p.join()
Output:
Worker received 123
Summary table
Method | Default OS | Speed | Safety | Restrictions
---|---|---|---|---
spawn | Windows/macOS | Slow | Very safe | Only picklable objects
fork | Linux/Unix | Fast | Risky | Unsafe with threads/C extensions
forkserver | None (must set) | Medium | Safer than fork | Requires starting a fork server
A good rule of thumb
- Use spawn for portable, safe code.
- Use fork on Linux when fast startup matters and the environment is thread-safe.
- Use forkserver when a balance between performance and safety is needed.
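Note that set_start_method() may only be called once per program. As an alternative, here is a sketch using multiprocessing.get_context(), which binds a single start method to a context object without changing the global default:
import multiprocessing as mp

def worker(x):
    print(f"Worker received {x}")

if __name__ == "__main__":
    ctx = mp.get_context("spawn")  # context with its own Process/Pool/Queue
    p = ctx.Process(target=worker, args=(7,))
    p.start()
    p.join()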
Process Pools (Pool, map, imap, imap_unordered, apply, and apply_async)
Pool
When there are many tasks that can run in parallel, a process pool can be used instead of manually creating dozens of Process objects.
A Pool manages a fixed number of worker processes:
- You submit tasks (functions + arguments).
- The pool distributes them to workers.
- You get results back (synchronously or asynchronously).
Note: This is perfect for CPU-bound work like prime testing, image processing, or simulations.
Example (Basic Pool Setup):
from multiprocessing import Pool, cpu_count

def square(x):
    return x * x

if __name__ == "__main__":
    # create pool with one worker per CPU core
    with Pool(processes=cpu_count()) as pool:
        numbers = [1, 2, 3, 4, 5]
        results = pool.map(square, numbers)
        print(results)
Output:
[1, 4, 9, 16, 25]
In the above example:
- cpu_count() returns the number of CPU cores.
- The line with Pool(processes=cpu_count()) as pool: creates a process pool with as many worker processes as there are CPU cores on the machine. For example, if the computer has 8 cores, 8 worker processes are started.
- Each worker is a separate Python process that can run tasks in parallel (true parallelism, not blocked by the GIL).
Pool.map (Blocking Batch Processing)
It is the parallel version of the built-in map(). It takes a function and a list of inputs, and waits until all results are ready.
It is ideal for:
- A small list of tasks.
- Cases where all results are required together.
Downsides:
- It blocks until the whole batch is done.
- High memory use for large iterables, since all inputs and results are held at once.
Example:
from multiprocessing import Pool, cpu_count
import math, time

def is_prime(n: int) -> bool:
    # Check if n is prime (naive trial division).
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    r = math.isqrt(n)
    for f in range(3, r + 1, 2):
        if n % f == 0:
            return False
    return True

if __name__ == "__main__":
    numbers = [10_000_019, 10_000_079, 10_000_091, 10_000_099]
    with Pool(cpu_count()) as pool:
        start = time.perf_counter()
        # map waits for all results
        results = pool.map(is_prime, numbers)
        finish = time.perf_counter()
    total = finish - start
    print(list(zip(numbers, results)))
    print(f"Time elapsed: {total:.2f}s with {cpu_count()} workers")
Output:
[(10000019, True), (10000079, True), (10000091, False), (10000099, False)]
Time elapsed: 0.06s with 8 workers
Pool.imap and Pool.imap_unordered (Streaming Results)
Unlike map, results are streamed back one at a time. This is useful when tasks are long-running or numerous.
.imap
imap returns results in the same order as the input. It returns an iterator, so results are streamed one by one rather than all at once (like map). It is useful when order matters (for example, processing a sequence of tasks where later logic depends on that order).
.imap_unordered
imap_unordered, as the name suggests, returns results as soon as they are ready (order is not guaranteed). It also returns an iterator that streams results as they complete. It is useful when order does not matter and faster throughput is needed.
Example:
from multiprocessing import Pool
import time, random

def slow_square(n):
    # Simulate a task with a random delay.
    delay = random.uniform(0.5, 2.0)  # 0.5–2 seconds
    time.sleep(delay)
    result = n * n
    print(f"Task {n} finished in {delay:.2f}s → {result}")
    return result

def demo_imap():
    print("--- Using imap (ordered results) ---")
    items = [1, 2, 3, 4, 5]
    with Pool(3) as pool:  # 3 workers
        for res in pool.imap(slow_square, items, chunksize=1):
            print("Got result (ordered):", res)

def demo_imap_unordered():
    print("--- Using imap_unordered (unordered results) ---")
    items = [1, 2, 3, 4, 5]
    with Pool(3) as pool:
        for res in pool.imap_unordered(slow_square, items, chunksize=1):
            print("Got result (unordered):", res)

if __name__ == "__main__":
    demo_imap()
    demo_imap_unordered()
Output:
--- Using imap (ordered results) ---
Task 1 finished in 1.53s → 1
Got result (ordered): 1
Task 2 finished in 1.61s → 4
Got result (ordered): 4
Task 3 finished in 1.89s → 9
Got result (ordered): 9
Task 4 finished in 1.41s → 16
Got result (ordered): 16
Task 5 finished in 1.57s → 25
Got result (ordered): 25
--- Using imap_unordered (unordered results) ---
Task 2 finished in 1.29s → 4
Got result (unordered): 4
Task 3 finished in 1.35s → 9
Got result (unordered): 9
Task 1 finished in 1.49s → 1
Got result (unordered): 1
Task 5 finished in 0.60s → 25
Got result (unordered): 25
Task 4 finished in 1.07s → 16
Got result (unordered): 16
Note: chunksize in multiprocessing.Pool controls how many tasks each worker process gets at once.
- chunksize=1 sends tasks one by one; more responsive, less efficient.
- A larger chunksize sends tasks in batches; less communication overhead, faster for uniform tasks. (A timing comparison is sketched below.)
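As a rough illustration, here is a sketch that times the same cheap task with two chunksize values (the task, item count, and worker count are arbitrary; the gap widens as tasks get smaller and more numerous):
from multiprocessing import Pool
import time

def tiny_task(x):
    return x + 1  # so cheap that IPC overhead dominates

if __name__ == "__main__":
    items = range(10_000)
    with Pool(4) as pool:
        for cs in (1, 500):
            start = time.perf_counter()
            list(pool.imap(tiny_task, items, chunksize=cs))  # consume the iterator
            print(f"chunksize={cs}: {time.perf_counter() - start:.2f}s")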
Pool.apply (Blocking, one task at a time)
It blocks the main process until the task finishes and returns the result directly, just like calling the function normally.
Execution flow:
Main process → give task to worker → wait until done → print result → continue
- Useful when:
  - You want to test multiprocessing with one task.
  - You need the result immediately before moving on.
Example:
from multiprocessing import Pool

def cube(x):
    return x**3

if __name__ == "__main__":
    with Pool(3) as pool:
        print("Sending task...")
        result = pool.apply(cube, (5,))  # BLOCKS until done
        print("Got result:", result)     # Prints 125
    print("Main continues AFTER task is done.")
Output:
Sending task...
Got result: 125
Main continues AFTER task is done.
Pool.apply_async (Non-blocking, many tasks at once)
It does not block the main process, which can keep doing other work while the workers run. Instead of the result directly, it returns an AsyncResult object (a "future").
You can call:
- .get(): Fetches the result (blocking). If the worker has finished, it returns the value; if it is still running, the program waits until it is done.
Example:
from multiprocessing import Pool
import time

def square(x):
    time.sleep(2)
    return x * x

if __name__ == "__main__":
    with Pool(2) as pool:
        async_res = pool.apply_async(square, (5,))
        print("Main keeps running...")
        result = async_res.get()  # waits here until the worker finishes
        print("Result:", result)

# OUTPUT:
# Main keeps running...
# Result: 25
- .ready(): Returns True if the task has finished, False otherwise. It does not block, which makes it useful for polling.
Example:
from multiprocessing import Pool
import time

def square(x):
    time.sleep(2)
    return x * x

if __name__ == "__main__":
    with Pool(2) as pool:
        async_res = pool.apply_async(square, (5,))
        while not async_res.ready():
            print("Still working...")
            time.sleep(0.5)
        print("Done:", async_res.get())

# OUTPUT:
# Still working...
# (repeats while the task runs)
# Done: 25
- .wait(): Blocks until the task finishes but returns no result. It just pauses until the worker is done; call .get() afterwards for the value.
Example:
from multiprocessing import Pool
import time

def square(x):
    time.sleep(2)
    return x * x

if __name__ == "__main__":
    with Pool(2) as pool:
        async_res = pool.apply_async(square, (5,))
        async_res.wait()  # just wait until it's finished
        print("Now safe to get:", async_res.get())

# OUTPUT:
# Now safe to get: 25
It also supports callbacks:
- callback: runs when a task succeeds.
- error_callback: runs if a task fails.
Execution flow:
Main process → fire tasks → workers crunch numbers → results/errors handled via callbacks → main waits at the end (if needed)
Example (with callbacks):
from multiprocessing import Pool
import time

def work(x):
    time.sleep(1)  # pretend it's slow
    if x == 5:
        raise ValueError("boom")
    return x * x

def on_ok(result):
    print("Result ready:", result)

def on_err(err):
    print("Error:", err)

if __name__ == "__main__":
    with Pool(3) as pool:
        futures = [
            pool.apply_async(work, (i,), callback=on_ok, error_callback=on_err)
            for i in range(8)
        ]
        print("Main can do other stuff while workers run...")
        # Wait for all tasks
        for f in futures:
            f.wait()
    print("All tasks completed")
Output:
Main can do other stuff while workers run...
Result ready: 1
Result ready: 0
Result ready: 4
Result ready: 9
Result ready: 16
Error: boom
Result ready: 49
Result ready: 36
All tasks completed
Quick comparison between apply and apply_async

Feature | apply (Blocking) | apply_async (Non-blocking)
---|---|---
Blocking? | Yes (main waits) | No (main continues)
Return value | Direct result | AsyncResult object
Suitable for | One task, need result now | Many tasks, async workflow
Callbacks | Not supported | Supported (success & error)
Typical use | Simple testing/debugging | Real-world workloads, batch jobs
Comparison
Method | Blocking? | Input Size Suitability | Order Preserved? | Best for
---|---|---|---|---
map | Yes | Small–medium lists | Yes | Simple batch jobs
imap | No | Large streams | Yes | Stream results in order
imap_unordered | No | Large streams | No | Faster feedback
apply | Yes | Single task | Yes | One-off process run
apply_async | No | Many tasks | Optional via callback | Async workflows
Rule of thumb
- map: easiest for small datasets.
- imap / imap_unordered: best for big datasets.
- apply / apply_async: for one-off or async job scheduling.