Commit 3bf238d

Optimize bytes.translate() by deferring change detection
Move the equality check out of the hot loop to allow better compiler optimization. Instead of comparing each byte during translation, perform a single memcmp at the end to determine whether the input can be returned unchanged. This lets compilers unroll and pipeline the loop, giving roughly 2x throughput on medium-to-large inputs (tested on an AMD Zen 2). No change was observed on small inputs. It will also be faster for bytes subclasses, as those do not need change detection.
1 parent 7ca9e7a commit 3bf238d

1 file changed

Lines changed: 9 additions & 5 deletions

Objects/bytesobject.c

@@ -2237,11 +2237,15 @@ bytes_translate_impl(PyBytesObject *self, PyObject *table,
         /* If no deletions are required, use faster code */
         for (i = inlen; --i >= 0; ) {
             c = Py_CHARMASK(*input++);
-            if (Py_CHARMASK((*output++ = table_chars[c])) != c)
-                changed = 1;
-        }
-        if (!changed && PyBytes_CheckExact(input_obj)) {
-            Py_SETREF(result, Py_NewRef(input_obj));
+            *output++ = table_chars[c];
+        }
+        /* Check if anything changed (for returning original object) */
+        /* We save this check until the end so that the compiler will */
+        /* unroll the loop above leading to MUCH faster code. */
+        if (PyBytes_CheckExact(input_obj)) {
+            if (memcmp(PyBytes_AS_STRING(input_obj), output_start, inlen) == 0) {
+                Py_SETREF(result, Py_NewRef(input_obj));
+            }
         }
         PyBuffer_Release(&del_table_view);
         PyBuffer_Release(&table_view);
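
The two loop shapes in this change can be compared outside CPython. The following is a minimal standalone sketch, not the code from Objects/bytesobject.c: the function names `translate_check_each` and `translate_check_deferred` are hypothetical, plain `unsigned char` buffers stand in for the bytes objects, and the caller is assumed to supply a 256-entry table and an output buffer of at least `n` bytes. Both functions write the same output and report whether anything changed; they differ only in where the comparison happens.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Old shape (illustrative): compare every byte inside the loop.
 * Returns 1 if any output byte differs from its input byte, else 0. */
int
translate_check_each(const unsigned char *in, unsigned char *out,
                     size_t n, const unsigned char table[256])
{
    int changed = 0;
    assert(in != NULL && out != NULL && table != NULL);
    for (size_t i = 0; i < n; i++) {
        unsigned char c = in[i];
        if ((out[i] = table[c]) != c)
            changed = 1;
    }
    return changed;
}

/* New shape (illustrative): the loop body is only a table lookup and
 * a store; change detection is a single memcmp over the finished
 * output. Returns 1 if the output differs from the input, else 0. */
int
translate_check_deferred(const unsigned char *in, unsigned char *out,
                         size_t n, const unsigned char table[256])
{
    assert(in != NULL && out != NULL && table != NULL);
    for (size_t i = 0; i < n; i++)
        out[i] = table[in[i]];
    return memcmp(in, out, n) != 0;
}
```

With an identity table both functions return 0, which corresponds to the case where the patched code hands back the original object for an exact bytes input. Whether the second form is actually faster depends on the compiler and hardware; the commit message reports the gain only for medium-to-large inputs on one machine.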
