Christopher Sardegna's Blog

Thoughts on technology, design, data analysis, and data visualization.


Multiprocessing Changes in Python 3.8

Python 3.8 Safety on macOS

With the release of Python 3.8 came many improvements, both in additional features and in safety. One of those additions to make Python a safer language altered how processes are created when running on macOS. These changes deeply affect how the existing concurrency model works in Python, especially when using some popular third-party libraries.

3.7 and prior

The multiprocessing module used to default to fork() when creating a new process. This is important because fork() ensured that the new child process had access to everything the parent process had access to:

>>> some_data = {1: "one", 2: "three"}
>>> func = lambda: print(some_data.get(2, None))
>>> process = multiprocessing.Process(target=func)
>>> process.start()
>>> process.join()
three

Note that we could not write to the parent's memory; we could only read it and modify the copy in the child's address space:

>>> some_data = {1: "one", 2: "three"}
>>> func = lambda: some_data.update({2: "two"})
>>> process = multiprocessing.Process(target=func)
>>> process.start()
>>> process.join()
>>> some_data
{1: "one", 2: "three"}
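When a child genuinely needs to write data back to the parent, the usual approach is an explicitly shared object rather than an ordinary closure variable. A minimal sketch using multiprocessing.Manager (the helper name update_shared is my own, not from the examples above):

```python
import multiprocessing


def update_shared(d):
    # Runs in the child; writes go through the manager's proxy object,
    # not into a private copy on the child's side.
    d[2] = "two"


if __name__ == "__main__":
    with multiprocessing.Manager() as manager:
        shared = manager.dict({1: "one", 2: "three"})
        process = multiprocessing.Process(target=update_shared, args=(shared,))
        process.start()
        process.join()
        print(dict(shared))  # the parent now observes the child's write
```

A Manager hosts the dict in a separate server process, so both parent and child talk to the same data through proxies instead of relying on copied memory.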

Disassembly

This works because fork() copies the parent's entire address space, so the child inherits the variables we reference without any serialization. We can verify what happens when we disassemble the lambda:

import dis
import multiprocessing


def main():
    some_data = {1: "one", 2: "three"}
    func = lambda: some_data.get(2)

    process = multiprocessing.Process(target=func)
    process.start()
    process.join()
    print(some_data)


if __name__ == "__main__":
    dis.dis(main)

This disassembles to:

  7           0 LOAD_CONST               1 ('one')
              2 LOAD_CONST               2 ('three')
              4 LOAD_CONST               3 ((1, 2))
              6 BUILD_CONST_KEY_MAP      2
              8 STORE_DEREF              0 (some_data)

  8          10 LOAD_CLOSURE             0 (some_data)
             12 BUILD_TUPLE              1
             14 LOAD_CONST               4 (<code object <lambda> at 0x1052059c0, file "test.py", line 8>)
             16 LOAD_CONST               5 ('main.<locals>.<lambda>')
             18 MAKE_FUNCTION            8
             20 STORE_FAST               0 (func)

 10          22 LOAD_GLOBAL              0 (multiprocessing)
             24 LOAD_ATTR                1 (Process)
             26 LOAD_FAST                0 (func)
             28 LOAD_CONST               6 (('target',))
             30 CALL_FUNCTION_KW         1
             32 STORE_FAST               1 (process)

 11          34 LOAD_FAST                1 (process)
             36 LOAD_METHOD              2 (start)
             38 CALL_METHOD              0
             40 POP_TOP

 12          42 LOAD_FAST                1 (process)
             44 LOAD_METHOD              3 (join)
             46 CALL_METHOD              0
             48 POP_TOP
             50 LOAD_CONST               0 (None)
             52 RETURN_VALUE

Disassembly of <code object <lambda> at 0x1052059c0, file "test.py", line 8>:
  8           0 LOAD_DEREF               0 (some_data)
              2 LOAD_METHOD              0 (get)
              4 LOAD_CONST               1 (2)
              6 CALL_METHOD              1
              8 RETURN_VALUE

The interesting operations happen inside the lambda's code object at the end: we first dereference some_data, then call get with the key 2. This all makes sense because of the earlier STORE_DEREF. Put simply, we store the object at the top of the stack into a cell¹. When we later LOAD_DEREF it, Python loads the cell and pushes a reference to the object the cell contains onto the top of the stack.

Python Cell Objects

What CPython does is promote some_data from a local variable to an independent cell object. Because the cell object exists independently of main()'s stack frame, it can be dereferenced by the lambda.
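We can observe this cell directly: a closure's free variables are exposed on its __closure__ attribute, and each entry is a cell whose cell_contents is the captured object:

```python
def main():
    some_data = {1: "one", 2: "three"}
    return lambda: some_data.get(2)


func = main()
cell = func.__closure__[0]
print(type(cell).__name__)  # cell
print(cell.cell_contents)   # {1: 'one', 2: 'three'}
print(func())               # three
```

Even though main() has returned and its frame is gone, the cell keeps the dictionary alive for the lambda.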

Python 3.7.4 (default, Aug 15 2019, 12:39:43)
[Clang 10.0.1 (clang-1001.0.46.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import multiprocessing
>>> some_data = {1: "one", 2: "three"}
>>> func = lambda: some_data.get(2)
>>> process = multiprocessing.Process(target=func)
>>> process.start()
>>> process.join()
>>>

CPython Source

To create a child process prior to 3.8, CPython uses the Popen class from popen_fork.py. This makes the C system call fork(), whose behavior the POSIX standard specifies²:

The fork() function shall create a new process. The new process (child process) shall be an exact copy of the calling process (parent process)…

On MacOS (darwin) this implementation comes from __fork.s:

LEAF(___fork, 0)
    subq  $24, %rsp   // Align the stack, plus room for local storage

    movl     $ SYSCALL_CONSTRUCT_UNIX(SYS_fork),%eax; // code for fork -> rax
    UNIX_SYSCALL_TRAP        // do the system call
    jnc    L1            // jump if CF==0

    movq    %rax, %rdi
    CALL_EXTERN(_cerror)
    movq    $-1, %rax
    addq    $24, %rsp   // restore the stack
    ret
    
L1:
    orl    %edx,%edx    // CF=OF=0,  ZF set if zero result    
    jz    L2        // parent, since r1 == 0 in parent, 1 in child
    
    //child here...
    xorq    %rax, %rax
    PICIFY(__current_pid)
    movl    %eax,(%r11)
L2:
    // parent ends up here skipping child portion
    addq    $24, %rsp   // restore the stack
    ret

This assembly code calls into the kernel to create the fork, so the flow looks like fork() -> Darwin wrapper -> raw syscall invocation -> transition to kernel mode -> syscall lookup -> sys_fork() -> do_fork(). At the end of all this, we have created a new process whose stack is a distinct copy of the original stack.
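The "exact copy" semantics are easy to demonstrate with the raw os.fork() call (POSIX-only) that multiprocessing wraps: the child starts with a copy of the parent's memory, and writes on either side stay private:

```python
import os

data = {1: "one", 2: "three"}

pid = os.fork()
if pid == 0:
    # Child: inherits a copy of the parent's address space,
    # so it can read data without any serialization.
    data[2] = "two"  # mutates only the child's copy
    os._exit(0)

os.waitpid(pid, 0)
print(data)  # parent's copy is untouched: {1: 'one', 2: 'three'}
```

This is exactly why the earlier example could read some_data in the child but could not change the parent's copy.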

3.8 and Beyond

In Python 3.8, however, the default method for creating a new process on macOS changed to spawn, which has entirely different behavior. Starting the process now crashes because Python cannot serialize the lambda, so the child never receives the data we are asking it to dereference:

Python 3.8.1 (default, Jan 24 2020, 16:43:46)
[Clang 10.0.1 (clang-1001.0.46.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import multiprocessing
>>> some_data = {1: "one", 2: "three"}
>>> func = lambda: some_data.get(2)
>>> process = multiprocessing.Process(target=func)
>>> process.start()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/chris/.pyenv/versions/3.8.1/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/Users/chris/.pyenv/versions/3.8.1/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/Users/chris/.pyenv/versions/3.8.1/lib/python3.8/multiprocessing/context.py", line 283, in _Popen
    return Popen(process_obj)
  File "/Users/chris/.pyenv/versions/3.8.1/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/Users/chris/.pyenv/versions/3.8.1/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/Users/chris/.pyenv/versions/3.8.1/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/Users/chris/.pyenv/versions/3.8.1/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <function <lambda> at 0x10510bc10>: attribute lookup <lambda> on __main__ failed

Disassembly

The disassembly of the lambda expression is the same:

Disassembly of <code object <lambda> at 0x1052059c0, file "test.py", line 8>:
  8           0 LOAD_DEREF               0 (some_data)
              2 LOAD_METHOD              0 (get)
              4 LOAD_CONST               1 (2)
              6 CALL_METHOD              1
              8 RETURN_VALUE

However, the interpreter never gets that far: the parent process does not copy its stack into a cell the child can access, and serializing the lambda for the new interpreter fails with the _pickle.PicklingError the interpreter raises.
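The practical fix is to give the pickler something it can serialize: move the target to a module-level function, which pickles as a reference the spawned interpreter can import, and pass data through args. A sketch under the spawn start method (the name get_two is my own):

```python
import multiprocessing


def get_two(data):
    # A module-level function pickles as a reference to "__main__.get_two",
    # which the spawned interpreter can import and call.
    print(data.get(2))


if __name__ == "__main__":
    # On POSIX you can also opt back into the pre-3.8 behavior with:
    # multiprocessing.set_start_method("fork")
    some_data = {1: "one", 2: "three"}
    process = multiprocessing.Process(target=get_two, args=(some_data,))
    process.start()
    process.join()
```

The `if __name__ == "__main__"` guard matters under spawn: the child re-imports the main module, and the guard prevents it from recursively starting new processes.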

Stack Variables

We can verify this by inspecting the global variables visible inside the child process.

Old Process Stack

If we rewrite the target to instead report the global³ variables the process has access to, Python tells us that it can see 'some_data': {1: 'one', 2: 'three'}, as we would expect when forking the entire process:

>>> process = multiprocessing.Process(target=globals)
>>> process.start()
>>> {'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': <class '_frozen_importlib.BuiltinImporter'>, '__spec__': None, '__annotations__': {}, '__builtins__': <module 'builtins' (built-in)>, 'some_data': {1: 'one', 2: 'three'}, 'func': <function <lambda> at 0x1040230e0>, 'multiprocessing': <module 'multiprocessing' from '/Users/chris/.pyenv/versions/3.7.6/lib/python3.7/multiprocessing/__init__.py'>, 'process': <Process(Process-12, started)>}

As expected, this looks almost identical to a fresh interpreter instance:

Python 3.7.6 (default, Jan 24 2020, 20:01:36)
[Clang 11.0.0 (clang-1100.0.33.8)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> globals()
{'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': <class '_frozen_importlib.BuiltinImporter'>, '__spec__': None, '__annotations__': {}, '__builtins__': <module 'builtins' (built-in)>}

New Process Stack

However, when we run this same code in Python 3.8, the global namespace of the child process looks totally different:

>>> proc = multiprocessing.Process(target=globals)
>>> proc.start()
>>> {'log_to_stderr': False, 'authkey': b'\x11@nPJ\xa3\xfeY\xbc%\xf8J\xc6`\xc1\xfd\xce\xca\x98EB\xb2\x8a\xefg\x17,\xf0\x93\xd3t\xb7', 'name': 'Process-11', 'sys_path': ['/Users/chris', '/Users/chris/.pyenv/versions/3.8.1/lib/python38.zip', '/Users/chris/.pyenv/versions/3.8.1/lib/python3.8', '/Users/chris/.pyenv/versions/3.8.1/lib/python3.8/lib-dynload', '/Users/chris/.pyenv/versions/3.8.1/lib/python3.8/site-packages'], 'sys_argv': [''], 'orig_dir': '/Users/chris', 'dir': '/Users/chris', 'start_method': 'spawn'}

These data come from get_preparation_data; they are sent to get_command_line and used to create a new instance of the Python interpreter to pipe commands to:

def get_command_line(**kwds):
    '''
    Returns prefix of command line used for spawning a child process
    '''
    if getattr(sys, 'frozen', False):
        return ([sys.executable, '--multiprocessing-fork'] +
                ['%s=%r' % item for item in kwds.items()])
    else:
        prog = 'from multiprocessing.spawn import spawn_main; spawn_main(%s)'
        prog %= ', '.join('%s=%r' % item for item in kwds.items())
        opts = util._args_from_interpreter_flags()
        return [_python_exe] + opts + ['-c', prog, '--multiprocessing-fork']
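Calling this helper directly (it is internal, so subject to change) shows the command line the parent will execute; the interpreter path varies by machine, and pipe_handle=5 is an arbitrary example file descriptor:

```python
from multiprocessing import spawn

# pipe_handle=5 is an arbitrary example value for illustration.
cmd = spawn.get_command_line(pipe_handle=5)
print(cmd)
# e.g. ['/usr/local/bin/python3.8', '-c',
#       'from multiprocessing.spawn import spawn_main; spawn_main(pipe_handle=5)',
#       '--multiprocessing-fork']
```

In other words, the "child" under spawn is a brand-new interpreter launched with a -c program, not a copy of the parent.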

Once Python has the start command, it runs it via spawnv_passfds:

def spawnv_passfds(path, args, passfds):
    import _posixsubprocess
    passfds = tuple(sorted(map(int, passfds)))
    errpipe_read, errpipe_write = os.pipe()
    try:
        return _posixsubprocess.fork_exec(
            args, [os.fsencode(path)], True, passfds, None, None,
            -1, -1, -1, -1, -1, -1, errpipe_read, errpipe_write,
            False, False, None)
    finally:
        os.close(errpipe_read)
        os.close(errpipe_write)

PicklingError Crash

The reason the pickling step crashes is simple. Before creating the child process, Python opens a pipe in the parent process. When we later grab the write end of this pipe, we duplicate it, so it still holds all of the references to data in the parent's scope:

def spawn_main(pipe_handle, parent_pid=None, tracker_fd=None):
    '''
    Run code specified by data received over pipe
    '''
    assert is_forking(sys.argv), "Not forking"
    if sys.platform == 'win32':
        …
    else:
        from . import resource_tracker
        resource_tracker._resource_tracker._fd = tracker_fd
        fd = pipe_handle
        parent_sentinel = os.dup(pipe_handle)
    exitcode = _main(fd, parent_sentinel)
    sys.exit(exitcode)

In the call to _main, we attempt to deserialize the data sent over the pipe, but we cannot, because it references objects that are not in the global scope of the new interpreter. Python only has access to the data a fresh interpreter would have, as opposed to the entire parent's stack⁴:


__name__ multiprocessing.spawn
__doc__ None
__package__ multiprocessing
__loader__ <_frozen_importlib_external.SourceFileLoader object at 0x1037a9580>
__spec__ ModuleSpec(name='multiprocessing.spawn', loader=<_frozen_importlib_external.SourceFileLoader object at 0x1037a9580>, origin='/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py')
__file__ /Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py
__cached__ /Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/__pycache__/spawn.cpython-38.pyc
__builtins__ {'__name__': 'builtins', '__doc__': "truncated", ..}
os <module 'os' from '/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/os.py'>
sys <module 'sys' (built-in)>
runpy <module 'runpy' from '/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/runpy.py'>
types <module 'types' from '/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/types.py'>
get_start_method <bound method DefaultContext.get_start_method of <multiprocessing.context.DefaultContext object at 0x10385bc10>>
set_start_method <bound method DefaultContext.set_start_method of <multiprocessing.context.DefaultContext object at 0x10385bc10>>
process <module 'multiprocessing.process' from '/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/process.py'>
reduction <module 'multiprocessing.reduction' from '/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/reduction.py'>
util <module 'multiprocessing.util' from '/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/util.py'>
__all__ ['_main', 'freeze_support', 'set_executable', 'get_executable', 'get_preparation_data', 'get_command_line', 'import_main_path']
WINEXE False
WINSERVICE False
_python_exe /Library/Frameworks/Python.framework/Versions/3.8/bin/python3.8
set_executable <function set_executable at 0x103954820>
get_executable <function get_executable at 0x1039c5ee0>
is_forking <function is_forking at 0x1039c5f70>
freeze_support <function freeze_support at 0x1039c1040>
get_command_line <function get_command_line at 0x1039c10d0>
spawn_main <function spawn_main at 0x1039c1160>
_main <function _main at 0x1039c11f0>
_check_not_importing_main <function _check_not_importing_main at 0x1039c1280>
get_preparation_data <function get_preparation_data at 0x1039c1310>
old_main_modules []
prepare <function prepare at 0x1039c13a0>
_fixup_main_from_name <function _fixup_main_from_name at 0x1039c1430>
_fixup_main_from_path <function _fixup_main_from_path at 0x1039c14c0>
import_main_path <function import_main_path at 0x1039c1550>

  1. A cell object essentially holds a reference to another object ↩︎

  2. This definition is also available in the Linux man pages. ↩︎

  3. We can inspect these with globals(). ↩︎

  4. We can view this information by adding a line like (print(g, v[g]) for g in v) before we de-serialize. ↩︎