
Commit b6feaba

gpshead and claude committed
pystrhex: Enable SIMD on 32-bit ARM with NEON
Extend portable SIMD support to ARM32 when NEON is available. The __builtin_shufflevector interleave compiles to vzip instructions on ARMv7 NEON, similar to zip1/zip2 on ARM64. NEON is optional on 32-bit ARM (unlike ARM64, where it's mandatory), so we check for __ARM_NEON in addition to __arm__.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
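A minimal sketch of the interleave described above, written with GCC/Clang vector extensions. It is not CPython's pystrhex.c code: the names hexlify, hexlify_simd, hexlify_scalar, splat_u8, nibble_to_ascii and SKETCH_HAVE_SHUFFLEVECTOR are hypothetical, and the sketch guards only on compiler support for __builtin_shufflevector, not on the target architecture. Each 16-byte chunk is turned into a vector of high-nibble digits and a vector of low-nibble digits, and two shuffles interleave them; per the commit message, shuffles of this shape lower to vzip on ARMv7 NEON, zip1/zip2 on ARM64 and punpcklbw/punpckhbw on x86-64.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical scalar path: two lowercase hex digits per input byte. */
static void hexlify_scalar(const unsigned char *src, char *dst, size_t len)
{
    static const char digits[] = "0123456789abcdef";
    for (size_t i = 0; i < len; i++) {
        dst[2 * i] = digits[src[i] >> 4];
        dst[2 * i + 1] = digits[src[i] & 0x0f];
    }
}

#if defined(__clang__) || (defined(__GNUC__) && __GNUC__ >= 12)
#  define SKETCH_HAVE_SHUFFLEVECTOR 1
typedef unsigned char v16u8 __attribute__((vector_size(16)));

/* Broadcast one byte to all 16 lanes. */
static inline v16u8 splat_u8(unsigned char x)
{
    v16u8 v = {0};
    for (int i = 0; i < 16; i++) {
        v[i] = x;
    }
    return v;
}

/* Map nibbles 0..15 to ASCII '0'..'9', 'a'..'f' without branches. */
static inline v16u8 nibble_to_ascii(v16u8 d)
{
    v16u8 above9 = (v16u8)(d > splat_u8(9));  /* 0xff where d > 9, else 0 */
    return d + splat_u8('0') + (above9 & splat_u8('a' - '0' - 10));
}

static void hexlify_simd(const unsigned char *src, char *dst, size_t len)
{
    size_t i = 0;
    for (; i + 16 <= len; i += 16) {
        v16u8 in;
        memcpy(&in, src + i, 16);  /* load without alignment assumptions */
        v16u8 hi = nibble_to_ascii(in >> splat_u8(4));
        v16u8 lo = nibble_to_ascii(in & splat_u8(0x0f));
        /* Indices 0..15 select from hi, 16..31 from lo.
           out0 = h0 l0 h1 l1 ... h7 l7, out1 = h8 l8 ... h15 l15. */
        v16u8 out0 = (v16u8)__builtin_shufflevector(hi, lo,
            0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23);
        v16u8 out1 = (v16u8)__builtin_shufflevector(hi, lo,
            8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31);
        memcpy(dst + 2 * i, &out0, 16);
        memcpy(dst + 2 * i + 16, &out1, 16);
    }
    hexlify_scalar(src + i, dst + 2 * i, len - i);  /* remaining < 16 bytes */
}
#endif

/* Hypothetical entry point: dst must hold 2 * len bytes (no NUL added). */
void hexlify(const unsigned char *src, char *dst, size_t len)
{
#ifdef SKETCH_HAVE_SHUFFLEVECTOR
    hexlify_simd(src, dst, len);
#else
    hexlify_scalar(src, dst, len);
#endif
}
```

For example, hexlify((const unsigned char *)"\xab\xcd", out, 2) writes the four characters abcd into out; adding a NUL terminator, if one is needed, is left to the caller. Keeping the scalar routine for the tail means inputs shorter than 16 bytes never touch the vector path.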
1 parent b2dd34e commit b6feaba

1 file changed

Lines changed: 4 additions & 3 deletions


Python/pystrhex.c

@@ -19,12 +19,13 @@ _Py_hexlify_scalar(const unsigned char *src, Py_UCS1 *dst, Py_ssize_t len)
 /* Portable SIMD optimization for hexlify using GCC/Clang vector extensions.
    Uses __builtin_shufflevector for portable interleave that compiles to
    native SIMD instructions (SSE2 punpcklbw/punpckhbw on x86-64,
-   NEON zip1/zip2 on ARM64).
+   NEON zip1/zip2 on ARM64, vzip on ARM32).
 
    Requirements:
    - GCC 12+ or Clang 3.0+ (for __builtin_shufflevector)
-   - x86-64 or ARM64 architecture */
-#if (defined(__x86_64__) || defined(__aarch64__)) && \
+   - x86-64, ARM64, or ARM32 with NEON */
+#if (defined(__x86_64__) || defined(__aarch64__) || \
+     (defined(__arm__) && defined(__ARM_NEON))) && \
     (defined(__clang__) || (defined(__GNUC__) && __GNUC__ >= 12))
 # define PY_HEXLIFY_CAN_COMPILE_SIMD 1
 #else
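To see which path a given toolchain takes, the guard from the diff can be evaluated on its own. The sketch below copies its condition under a hypothetical macro name, SKETCH_CAN_COMPILE_SIMD, and maps each target to the instructions the commit message names; the returned strings are descriptive labels only, not anything queried from the compiler. A 32-bit ARM build without __ARM_NEON falls through to the scalar label, which is exactly the case the added __ARM_NEON check exists to catch.

```c
/* Same condition as the guard in the diff; the macro name is a
   hypothetical stand-in for PY_HEXLIFY_CAN_COMPILE_SIMD. */
#if (defined(__x86_64__) || defined(__aarch64__) || \
     (defined(__arm__) && defined(__ARM_NEON))) && \
    (defined(__clang__) || (defined(__GNUC__) && __GNUC__ >= 12))
#  define SKETCH_CAN_COMPILE_SIMD 1
#else
#  define SKETCH_CAN_COMPILE_SIMD 0
#endif

/* Hypothetical helper: label the interleave the commit expects here. */
const char *simd_interleave_target(void)
{
#if !SKETCH_CAN_COMPILE_SIMD
    return "scalar fallback";  /* unsupported arch/compiler, e.g. ARM32 without NEON */
#elif defined(__x86_64__)
    return "SSE2 punpcklbw/punpckhbw";
#elif defined(__aarch64__)
    return "NEON zip1/zip2";
#else
    return "NEON vzip (ARM32)";
#endif
}
```

Calling printf("%s\n", simd_interleave_target()) from a small test program prints the label for the current build, which is a cheap way to confirm that a cross-compiled ARM32 target really has NEON enabled.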
