With the release of Python 3.8 came many improvements, both in additional features and in safety. One of those additions, intended to make Python a safer language, altered how processes are created when running on macOS. This change deeply affects how the existing concurrency model works in Python, especially when using some popular third-party libraries.
Multiprocessing used to default to using fork() when creating a new process. This is important because using fork() ensured that the new child process had access to everything the parent process had access to:
>>> import multiprocessing
>>> some_data = {1: "one", 2: "three"}
>>> func = lambda: print(some_data.get(2, None))
>>> process = multiprocessing.Process(target=func)
>>> process.start()
>>> process.join()
three
Note that we could not write to the parent's memory, only read from and modify the copy the child received:
>>> some_data = {1: "one", 2: "three"}
>>> func = lambda: some_data.update({2: "two"})
>>> process = multiprocessing.Process(target=func)
>>> process.start()
>>> process.join()
>>> some_data
{1: 'one', 2: 'three'}
This works because fork() copies the entire process, stack included: the child receives its own copy of every object the parent could access, so nothing needs to be pickled and sent across. We can see what the child inherits when we disassemble the lambda:
import dis
import multiprocessing

def main():
    some_data = {1: "one", 2: "three"}
    func = lambda: some_data.get(2)
    process = multiprocessing.Process(target=func)
    process.start()
    process.join()
    print(some_data)

if __name__ == "__main__":
    dis.dis(main)
This disassembles to:
7 0 LOAD_CONST 1 ('one')
2 LOAD_CONST 2 ('three')
4 LOAD_CONST 3 ((1, 2))
6 BUILD_CONST_KEY_MAP 2
8 STORE_DEREF 0 (some_data)
8 10 LOAD_CLOSURE 0 (some_data)
12 BUILD_TUPLE 1
14 LOAD_CONST 4 (<code object <lambda> at 0x1052059c0, file "test.py", line 8>)
16 LOAD_CONST 5 ('main.<locals>.<lambda>')
18 MAKE_FUNCTION 8
20 STORE_FAST 0 (func)
10 22 LOAD_GLOBAL 0 (multiprocessing)
24 LOAD_ATTR 1 (Process)
26 LOAD_FAST 0 (func)
28 LOAD_CONST 6 (('target',))
30 CALL_FUNCTION_KW 1
32 STORE_FAST 1 (process)
11 34 LOAD_FAST 1 (process)
36 LOAD_METHOD 2 (start)
38 CALL_METHOD 0
40 POP_TOP
12 42 LOAD_FAST 1 (process)
44 LOAD_METHOD 3 (join)
46 CALL_METHOD 0
48 POP_TOP
50 LOAD_CONST 0 (None)
52 RETURN_VALUE
Disassembly of <code object <lambda> at 0x1052059c0, file "test.py", line 8>:
8 0 LOAD_DEREF 0 (some_data)
2 LOAD_METHOD 0 (get)
4 LOAD_CONST 1 (2)
6 CALL_METHOD 1
8 RETURN_VALUE
The interesting operations happen inside the lambda object at the end: we first dereference some_data, then get the key 2. This all makes sense, because main() earlier ran STORE_DEREF. Put simply, STORE_DEREF stores the data at the top of the stack into a cell[1]; when we later LOAD_DEREF it, Python loads the cell and pushes a reference to the object the cell contains onto the top of the stack.
What CPython does is promote some_data from a local variable to an independent cell object. Because the cell object exists independently of main()'s stack frame, it can be dereferenced by the lambda.
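A quick way to see this cell directly is to return the lambda from main() and inspect its __closure__ attribute (a small sketch, not from the original example):

```python
def main():
    some_data = {1: "one", 2: "three"}
    func = lambda: some_data.get(2)
    return func

func = main()

# main()'s frame is gone, but the lambda still holds the cell that
# CPython created for some_data.
print(func.__code__.co_freevars)          # ('some_data',)
print(func.__closure__[0].cell_contents)  # {1: 'one', 2: 'three'}
print(func())                             # three
```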
Python 3.7.4 (default, Aug 15 2019, 12:39:43)
[Clang 10.0.1 (clang-1001.0.46.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import multiprocessing
>>> some_data = {1: "one", 2: "three"}
>>> func = lambda: some_data.get(2)
>>> process = multiprocessing.Process(target=func)
>>> process.start()
>>> process.join()
>>>
To create a child process prior to 3.8, CPython uses the Popen class from popen_fork.py. This makes a C system call to fork(), whose behavior is standardized[2]:
The fork() function shall create a new process. The new process (child process) shall be an exact copy of the calling process (parent process)…
On macOS (darwin), this implementation comes from __fork.s:
LEAF(___fork, 0)
subq $24, %rsp // Align the stack, plus room for local storage
movl $ SYSCALL_CONSTRUCT_UNIX(SYS_fork),%eax; // code for fork -> rax
UNIX_SYSCALL_TRAP // do the system call
jnc L1 // jump if CF==0
movq %rax, %rdi
CALL_EXTERN(_cerror)
movq $-1, %rax
addq $24, %rsp // restore the stack
ret
L1:
orl %edx,%edx // CF=OF=0, ZF set if zero result
jz L2 // parent, since r1 == 0 in parent, 1 in child
//child here...
xorq %rax, %rax
PICIFY(__current_pid)
movl %eax,(%r11)
L2:
// parent ends up here skipping child portion
addq $24, %rsp // restore the stack
ret
This assembly code calls into the kernel to create the fork, so the flow looks like: fork() -> Darwin wrapper -> raw syscall invocation -> transition to kernel mode -> syscall lookup -> sys_fork() -> do_fork(). At the end of all this, we have created a new process whose own stack exists as a distinct copy of the original.
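On POSIX systems we can watch this inheritance directly with os.fork() from the standard library. In this minimal sketch (POSIX-only, not part of the original article) the child reads the parent's dictionary without any serialization taking place:

```python
import os

some_data = {1: "one", 2: "three"}
read_fd, write_fd = os.pipe()

pid = os.fork()
if pid == 0:
    # Child: the parent's objects are simply here -- nothing was pickled.
    os.close(read_fd)
    os.write(write_fd, some_data.get(2).encode())
    os._exit(0)

# Parent: wait for the child, then read what it saw.
os.close(write_fd)
os.waitpid(pid, 0)
result = os.read(read_fd, 16).decode()
print(result)  # three
```

This is essentially what multiprocessing's fork start method does for us, minus the bookkeeping.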
In Python 3.8, however, the default method used to create a new process changed to spawn, which has entirely different behavior. Creating the process now fails, because the target, along with the data it dereferences, must be pickled and sent to the child, and our lambda cannot be:
Python 3.8.1 (default, Jan 24 2020, 16:43:46)
[Clang 10.0.1 (clang-1001.0.46.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import multiprocessing
>>> some_data = {1: "one", 2: "three"}
>>> func = lambda: some_data.get(2)
>>> process = multiprocessing.Process(target=func)
>>> process.start()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/chris/.pyenv/versions/3.8.1/lib/python3.8/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/Users/chris/.pyenv/versions/3.8.1/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "/Users/chris/.pyenv/versions/3.8.1/lib/python3.8/multiprocessing/context.py", line 283, in _Popen
return Popen(process_obj)
File "/Users/chris/.pyenv/versions/3.8.1/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
super().__init__(process_obj)
File "/Users/chris/.pyenv/versions/3.8.1/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "/Users/chris/.pyenv/versions/3.8.1/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
reduction.dump(process_obj, fp)
File "/Users/chris/.pyenv/versions/3.8.1/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <function <lambda> at 0x10510bc10>: attribute lookup <lambda> on __main__ failed
The disassembly of the lambda expression is the same:
Disassembly of <code object <lambda> at 0x1052059c0, file "test.py", line 8>:
8 0 LOAD_DEREF 0 (some_data)
2 LOAD_METHOD 0 (get)
4 LOAD_CONST 1 (2)
6 CALL_METHOD 1
8 RETURN_VALUE
However, the child never gets the chance to run that first LOAD_DEREF. With spawn, the parent did not copy its stack into the child; instead it must serialize the target, together with the cell it dereferences, and send it to a brand-new interpreter. The lambda cannot be pickled this way, which is the _pickle.PicklingError the interpreter raises.
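The same failure can be reproduced with pickle alone, with no processes involved. The sketch below assumes nothing beyond the standard library: pickle serializes functions by qualified name rather than by value, so a lambda fails the round trip while a plain module-level function survives it:

```python
import pickle

# A lambda has the qualified name '<lambda>', which cannot be looked up
# again on the module, so pickling it fails.
func = lambda: None
lambda_failed = False
try:
    pickle.dumps(func)
except pickle.PicklingError:
    lambda_failed = True

# A module-level function pickles fine: only its name travels, and the
# unpickler re-imports it.
def named():
    return 42

roundtripped = pickle.loads(pickle.dumps(named))
print(lambda_failed, roundtripped is named)  # True True
```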
We can verify this by inspecting the global namespace from inside the child process. If we rewrite the lambda to instead print the global[3] variables the process has access to, Python tells us that it can see 'some_data': {1: 'one', 2: 'three'}, as we would expect when forking the entire process:
>>> process = multiprocessing.Process(target=lambda: print(globals()))
>>> process.start()
>>> {'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': <class '_frozen_importlib.BuiltinImporter'>, '__spec__': None, '__annotations__': {}, '__builtins__': <module 'builtins' (built-in)>, 'some_data': {1: 'one', 2: 'three'}, 'func': <function <lambda> at 0x1040230e0>, 'multiprocessing': <module 'multiprocessing' from '/Users/chris/.pyenv/versions/3.7.6/lib/python3.7/multiprocessing/__init__.py'>, 'process': <Process(Process-12, started)>}
As expected, aside from the objects copied from our session, this looks just like a fresh interpreter instance:
Python 3.7.6 (default, Jan 24 2020, 20:01:36)
[Clang 11.0.0 (clang-1100.0.33.8)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> globals()
{'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': <class '_frozen_importlib.BuiltinImporter'>, '__spec__': None, '__annotations__': {}, '__builtins__': <module 'builtins' (built-in)>}
However, when we run this same code in Python 3.8, the global namespace of the child process looks totally different:
>>> proc = multiprocessing.Process(target=globals)
>>> proc.start()
>>> {'log_to_stderr': False, 'authkey': b'\x11@nPJ\xa3\xfeY\xbc%\xf8J\xc6`\xc1\xfd\xce\xca\x98EB\xb2\x8a\xefg\x17,\xf0\x93\xd3t\xb7', 'name': 'Process-11', 'sys_path': ['/Users/chris', '/Users/chris/.pyenv/versions/3.8.1/lib/python38.zip', '/Users/chris/.pyenv/versions/3.8.1/lib/python3.8', '/Users/chris/.pyenv/versions/3.8.1/lib/python3.8/lib-dynload', '/Users/chris/.pyenv/versions/3.8.1/lib/python3.8/site-packages'], 'sys_argv': [''], 'orig_dir': '/Users/chris', 'dir': '/Users/chris', 'start_method': 'spawn'}
This data comes from get_preparation_data, is passed to get_command_line, and is used to create a new instance of the Python interpreter to pipe commands to:
def get_command_line(**kwds):
    '''
    Returns prefix of command line used for spawning a child process
    '''
    if getattr(sys, 'frozen', False):
        return ([sys.executable, '--multiprocessing-fork'] +
                ['%s=%r' % item for item in kwds.items()])
    else:
        prog = 'from multiprocessing.spawn import spawn_main; spawn_main(%s)'
        prog %= ', '.join('%s=%r' % item for item in kwds.items())
        opts = util._args_from_interpreter_flags()
        return [_python_exe] + opts + ['-c', prog, '--multiprocessing-fork']
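We can ask get_command_line directly what that command looks like. Note that this is internal multiprocessing API, so the exact output depends on your interpreter path and version:

```python
from multiprocessing.spawn import get_command_line

# Build the command line the parent would use to boot a child interpreter;
# pipe_handle=7 is an arbitrary illustrative file descriptor.
cmd = get_command_line(pipe_handle=7)
print(cmd)
# e.g. ['/usr/bin/python3', '-c',
#       'from multiprocessing.spawn import spawn_main; spawn_main(pipe_handle=7)',
#       '--multiprocessing-fork']
```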
Once Python has the start command, it runs it via spawnv_passfds:
def spawnv_passfds(path, args, passfds):
    import _posixsubprocess
    passfds = tuple(sorted(map(int, passfds)))
    errpipe_read, errpipe_write = os.pipe()
    try:
        return _posixsubprocess.fork_exec(
            args, [os.fsencode(path)], True, passfds, None, None,
            -1, -1, -1, -1, -1, -1, errpipe_read, errpipe_write,
            False, False, None)
    finally:
        os.close(errpipe_read)
        os.close(errpipe_write)
The reason serialization fails is simple. Before creating the child process, Python opens a pipe in the parent and pickles the process object, along with everything its target references, into it. The child later reads its end of that pipe (duplicating the handle as a sentinel on the parent) and tries to rebuild those objects in a fresh interpreter:
def spawn_main(pipe_handle, parent_pid=None, tracker_fd=None):
    '''
    Run code specified by data received over pipe
    '''
    assert is_forking(sys.argv), "Not forking"
    if sys.platform == 'win32':
        …
    else:
        from . import resource_tracker
        resource_tracker._resource_tracker._fd = tracker_fd
        fd = pipe_handle
        parent_sentinel = os.dup(pipe_handle)
    exitcode = _main(fd, parent_sentinel)
    sys.exit(exitcode)
In the call to _main, the child attempts to deserialize the data read from the pipe. It cannot, because that data refers to objects which do not exist in the new interpreter's global scope: the child only has access to what a fresh interpreter would have, rather than the entire parent's stack[4]:
__name__ multiprocessing.spawn
__doc__ None
__package__ multiprocessing
__loader__ <_frozen_importlib_external.SourceFileLoader object at 0x1037a9580>
__spec__ ModuleSpec(name='multiprocessing.spawn', loader=<_frozen_importlib_external.SourceFileLoader object at 0x1037a9580>, origin='/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py')
__file__ /Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py
__cached__ /Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/__pycache__/spawn.cpython-38.pyc
__builtins__ {'__name__': 'builtins', '__doc__': "truncated", ..}
os <module 'os' from '/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/os.py'>
sys <module 'sys' (built-in)>
runpy <module 'runpy' from '/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/runpy.py'>
types <module 'types' from '/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/types.py'>
get_start_method <bound method DefaultContext.get_start_method of <multiprocessing.context.DefaultContext object at 0x10385bc10>>
set_start_method <bound method DefaultContext.set_start_method of <multiprocessing.context.DefaultContext object at 0x10385bc10>>
process <module 'multiprocessing.process' from '/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/process.py'>
reduction <module 'multiprocessing.reduction' from '/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/reduction.py'>
util <module 'multiprocessing.util' from '/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/util.py'>
__all__ ['_main', 'freeze_support', 'set_executable', 'get_executable', 'get_preparation_data', 'get_command_line', 'import_main_path']
WINEXE False
WINSERVICE False
_python_exe /Library/Frameworks/Python.framework/Versions/3.8/bin/python3.8
set_executable <function set_executable at 0x103954820>
get_executable <function get_executable at 0x1039c5ee0>
is_forking <function is_forking at 0x1039c5f70>
freeze_support <function freeze_support at 0x1039c1040>
get_command_line <function get_command_line at 0x1039c10d0>
spawn_main <function spawn_main at 0x1039c1160>
_main <function _main at 0x1039c11f0>
_check_not_importing_main <function _check_not_importing_main at 0x1039c1280>
get_preparation_data <function get_preparation_data at 0x1039c1310>
old_main_modules []
prepare <function prepare at 0x1039c13a0>
_fixup_main_from_name <function _fixup_main_from_name at 0x1039c1430>
_fixup_main_from_path <function _fixup_main_from_path at 0x1039c14c0>
import_main_path <function import_main_path at 0x1039c1550>