gh-68451: Fix unittest discovery to support Unicode module names #144853
RoryGlenn wants to merge 6 commits into python:main
Replace the ASCII-only VALID_MODULE_NAME regex with str.isidentifier() to support test modules whose names start with non-ASCII Unicode letters (e.g., café.py, 測試.py). Also add a directory name validation check so that directories with invalid identifier names (e.g., containing hyphens) are skipped during package discovery.
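A minimal sketch of the difference between the two checks, assuming the regex is applied with `match()` to a bare filename. The pattern is the one quoted in the PR summary (any flags the original used are not shown there and are omitted here), and `valid_module_name_sketch` is a hypothetical stand-in, not the helper added to `Lib/unittest/loader.py`.

```python
import os
import re

# Pattern as quoted in the PR summary; flags, if any, are omitted (assumption).
OLD_PATTERN = re.compile(r'[_a-z]\w*\.py$')


def valid_module_name_sketch(path):
    """Hypothetical stand-in: accept '<identifier>.py' filenames only."""
    root, ext = os.path.splitext(os.path.basename(path))
    return ext == '.py' and root.isidentifier()


# Illustrative filenames: one ASCII, one starting with a non-ASCII letter,
# and three that should be rejected by both checks.
samples = ['test_ok.py', '測試.py', '123bad.py',
           'test-not-a-module.py', 'test.foo']

for name in samples:
    old = OLD_PATTERN.match(name) is not None
    new = valid_module_name_sketch(name)
    print(f'{name!r}: old regex={old}, isidentifier-based={new}')
```

The two checks agree on every sample except `測試.py`, whose first character falls outside `[_a-z]` but is a valid identifier start.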
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3ac73104fa
Co-authored-by: Victor Stinner <vstinner@python.org>
Pull request overview
Updates CPython’s unittest test discovery to properly accept and discover test modules with Unicode identifier names (per PEP 3131), fixing an ASCII-only validation that rejected valid filenames.
Changes:
- Replace ASCII-only module filename regex with str.isidentifier()-based validation.
- Skip recursing into directories whose names aren’t valid Python identifiers (prevents futile/erroneous package imports).
- Add unit + integration tests covering Unicode module discovery and invalid directory rejection.
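The directory-skipping behavior in the list above can be sketched roughly as follows. This is an illustrative walk, not the code in `_find_test_path()`: the function name `iter_candidates`, the `test*.py` pattern default, and the `__init__.py` package check are assumptions made for the example.

```python
import fnmatch
import os
import tempfile


def iter_candidates(start_dir, pattern='test*.py'):
    """Yield test-file paths a discovery pass might consider (sketch only)."""
    for name in sorted(os.listdir(start_dir)):
        full = os.path.join(start_dir, name)
        if os.path.isdir(full):
            # Assumed rule: a directory whose name is not an identifier
            # cannot be imported as a package, so do not recurse into it.
            if not name.isidentifier():
                continue
            if os.path.isfile(os.path.join(full, '__init__.py')):
                yield from iter_candidates(full, pattern)
        elif fnmatch.fnmatch(name, pattern):
            root, ext = os.path.splitext(name)
            if ext == '.py' and root.isidentifier():
                yield full


# Build a throwaway tree with one importable and one non-importable package.
with tempfile.TemporaryDirectory() as top:
    for pkg in ('test_dir', 'my-package'):
        os.mkdir(os.path.join(top, pkg))
        for fname in ('__init__.py', 'test_inner.py'):
            open(os.path.join(top, pkg, fname), 'w').close()
    found = [os.path.relpath(p, top) for p in iter_candidates(top)]

print(found)
```

Only the file under `test_dir` is reported; `my-package` is skipped before any import would be attempted because its name contains a hyphen.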
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| Misc/NEWS.d/next/Library/2026-02-15-22-23-15.gh-issue-68451.2pPUuV.rst | Documents the change in unittest discovery behavior for Unicode-named modules. |
| Lib/unittest/loader.py | Implements identifier-based validation for module filenames and package directory names during discovery. |
| Lib/test/test_unittest/test_discovery.py | Adds tests validating _valid_module_name(), Unicode discovery, and invalid directory name skipping. |
    # test-not-a-module.py and 123bad.py should be excluded;
    # test.foo should be excluded (wrong extension).
    # Sorted by Unicode code points: test_dir (and its children) come
    # before tëst_três since '_' (U+005F) < 'ë' (U+00EB).
The ordering rationale in this comment is incorrect: test_dir sorts before tëst_três due to 'e' (U+0065) in test_dir comparing less than 'ë' (U+00EB), not because of '_' vs 'ë'. Please adjust the comment so it matches Python’s actual string comparison behavior used by sorted() here.
Suggested change:
    - # before tëst_três since '_' (U+005F) < 'ë' (U+00EB).
    + # before tëst_três since 'e' (U+0065) < 'ë' (U+00EB).
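The ordering point in this review comment can be checked directly. The helper `first_difference` below is a hypothetical name introduced for the illustration; it reports the first position at which two strings differ, which is the position that decides their order under Python's code-point comparison.

```python
def first_difference(a, b):
    """Return (index, char_a, char_b) at the first mismatch, or None."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i, x, y
    return None


ascii_name = 'test_dir'
unicode_name = 't\u00ebst_tr\u00eas'  # tëst_três, written with escapes

index, left, right = first_difference(ascii_name, unicode_name)
print(index, f'U+{ord(left):04X}', f'U+{ord(right):04X}')
print(sorted([unicode_name, ascii_name]))
```

The strings first differ at index 1, where 'e' (U+0065) is compared with 'ë' (U+00EB); the underscore at index 4 is never reached.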
gh-68451: Fix unittest discovery to support Unicode module names
Summary
unittest test discovery previously used an ASCII-only regex ([_a-z]\w*\.py$) to validate module names, which rejected test files starting with non-ASCII Unicode letters (e.g., café.py, 測試.py). This PR replaces the regex with str.isidentifier(), which correctly handles all valid Python identifiers per PEP 3131.
Changes
Lib/unittest/loader.py
- Removed the VALID_MODULE_NAME regex and the unused import re
- Added a _valid_module_name() function that uses os.path.splitext() + str.isidentifier() to validate module filenames
- Added directory name validation in _find_test_path(): directories with invalid identifier names (e.g., containing hyphens) are now properly skipped during package discovery
Lib/test/test_unittest/test_discovery.py
- test_valid_module_name: Tests the new _valid_module_name() function with ASCII, Unicode, and invalid names
- test_find_tests_with_unicode_modules: Integration test verifying Unicode-named modules are discovered alongside ASCII ones
- test_find_test_path_rejects_invalid_dir_name: Tests that directories with invalid identifier names are skipped
Prior work
This issue has two stale PRs (#1338 from 2017, #13149 from 2019) that were never merged. This PR incorporates reviewer feedback from those PRs:
- Use str.isidentifier() instead of regex (per @vstinner, @ezio-melotti)
- Use the self.addCleanup(setattr, ...) pattern (per @ezio-melotti)
Testing
All 1,095 unittest tests pass.