Bug report
Bug description
I've been testing the free-threaded build for thread safety issues in unaudited modules (ref #116738 — _pickle.c is unchecked). Found a segfault when pickle.dumps() runs concurrently with dict mutations on the same dict.
The crash happens in batch_dict_exact at Modules/_pickle.c:3496 — it iterates dict items via _PyDict_Next which gives borrowed references, but there's no critical section on the dict. If another thread pops or replaces a key mid-iteration, the borrowed reference goes stale and Py_INCREF hits a freed/NULL pointer.
Sometimes it raises RuntimeError: dictionary changed size during iteration cleanly, but other times the reference gets invalidated before the size check and it just segfaults.
Reproducer
import pickle
import threading
shared = {str(i): list(range(50)) for i in range(100)}
for trial in range(2000):
barrier = threading.Barrier(3)
def dumper():
try:
barrier.wait()
pickle.dumps(shared)
except Exception:
pass
def mutator():
try:
barrier.wait()
for j in range(200):
key = str(j % 100)
shared[key] = list(range(j % 50))
if j % 10 == 0:
shared.pop(str(j % 100), None)
shared[str(j % 100)] = [j]
except Exception:
pass
threads = [
threading.Thread(target=dumper),
threading.Thread(target=dumper),
threading.Thread(target=mutator),
]
for t in threads: t.start()
for t in threads: t.join()
Crashes within a few hundred iterations usually. I ran it 10 times, got 9 segfaults.
Tested on
3.14.0a4 free-threading (stock install, no sanitizers):
$ python3.14t reproducer.py
Segmentation fault (core dumped)
3.15.0a7+ free-threading (built with --with-address-sanitizer):
AddressSanitizer:DEADLYSIGNAL
==1334802==ERROR: AddressSanitizer: SEGV on unknown address 0x00000000000c
#0 _Py_atomic_load_uint32_relaxed Include/cpython/pyatomic_gcc.h:367
#1 Py_INCREF Include/refcount.h:267
#2 batch_dict_exact Modules/_pickle.c:3496
#3 save_dict Modules/_pickle.c:3563
#4 save Modules/_pickle.c:4568
#5 dump Modules/_pickle.c:4768
#6 _pickle_dumps_impl Modules/_pickle.c:7992
Root cause
batch_dict_exact calls PyDict_Next() which returns borrowed references to key and value
- Between
PyDict_Next() returning and Py_INCREF(key) / Py_INCREF(value), another thread can mutate the dict
- The borrowed reference points to a freed object,
Py_INCREF dereferences it and segfaults
- The size check at line 3512 (
PyDict_GET_SIZE(obj) != dict_size) only catches mutations that change dict size, and only after the batch — too late to prevent the stale reference
- Same issue exists in both the single-item path (line 3474) and the batch loop (line 3494)
Possible fix
- Wrap
PyDict_Next() + Py_INCREF(key) + Py_INCREF(value) in Py_BEGIN_CRITICAL_SECTION(obj) / Py_END_CRITICAL_SECTION() to prevent dict mutation while converting borrowed refs to owned refs
- Keep
save() calls outside the critical section since they can reenter Python
- Same approach as the set iteration path at line 3644 which already uses
Py_BEGIN_CRITICAL_SECTION(obj) with _PySet_NextEntryRef
- Both
PyDict_Next call sites in batch_dict_exact need this (line 3474 and line 3494)
CPython versions tested on
3.14, CPython main branch
Operating systems tested on
Linux
Linked PRs
Bug report
Bug description
I've been testing the free-threaded build for thread safety issues in unaudited modules (ref #116738 —
_pickle.cis unchecked). Found a segfault whenpickle.dumps()runs concurrently with dict mutations on the same dict.The crash happens in
batch_dict_exactatModules/_pickle.c:3496— it iterates dict items via_PyDict_Nextwhich gives borrowed references, but there's no critical section on the dict. If another thread pops or replaces a key mid-iteration, the borrowed reference goes stale andPy_INCREFhits a freed/NULL pointer.Sometimes it raises
RuntimeError: dictionary changed size during iterationcleanly, but other times the reference gets invalidated before the size check and it just segfaults.Reproducer
Crashes within a few hundred iterations usually. I ran it 10 times, got 9 segfaults.
Tested on
3.14.0a4 free-threading (stock install, no sanitizers):
3.15.0a7+ free-threading (built with
--with-address-sanitizer):Root cause
batch_dict_exactcallsPyDict_Next()which returns borrowed references to key and valuePyDict_Next()returning andPy_INCREF(key)/Py_INCREF(value), another thread can mutate the dictPy_INCREFdereferences it and segfaultsPyDict_GET_SIZE(obj) != dict_size) only catches mutations that change dict size, and only after the batch — too late to prevent the stale referencePossible fix
PyDict_Next()+Py_INCREF(key)+Py_INCREF(value)inPy_BEGIN_CRITICAL_SECTION(obj)/Py_END_CRITICAL_SECTION()to prevent dict mutation while converting borrowed refs to owned refssave()calls outside the critical section since they can reenter PythonPy_BEGIN_CRITICAL_SECTION(obj)with_PySet_NextEntryRefPyDict_Nextcall sites inbatch_dict_exactneed this (line 3474 and line 3494)CPython versions tested on
3.14, CPython main branch
Operating systems tested on
Linux
Linked PRs