gh-68451: Fix unittest discovery to support Unicode module names #144853

Open
RoryGlenn wants to merge 6 commits into python:main from RoryGlenn:gh-68451-unittest-unicode-discovery

@RoryGlenn

gh-68451: Fix unittest discovery to support Unicode module names

Summary

unittest test discovery previously validated module names with an ASCII-only regex ([_a-z]\w*\.py$), which rejected test files whose names start with a non-ASCII Unicode letter (e.g., 測試.py). A name such as café.py, which starts with an ASCII letter, already matched because \w is Unicode-aware. This PR replaces the regex with str.isidentifier(), which accepts every valid Python identifier per PEP 3131.
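
The difference can be checked directly. This is a minimal sketch, assuming the old pattern was compiled with re.IGNORECASE; the helper name accepted_by_old_regex is illustrative and not part of the PR.

```python
import re

# The pattern quoted in the summary (re.IGNORECASE is an assumption here).
OLD_PATTERN = re.compile(r'[_a-z]\w*\.py$', re.IGNORECASE)


def accepted_by_old_regex(filename):
    """Return True if the old ASCII-first-character pattern matches."""
    return OLD_PATTERN.match(filename) is not None


print(accepted_by_old_regex('test_ascii.py'))  # True
print(accepted_by_old_regex('測試.py'))          # False: first character is not in [_a-z]

# str.isidentifier() follows the identifier rules from PEP 3131.
print('測試'.isidentifier())    # True
print('123bad'.isidentifier())  # False: identifiers cannot start with a digit
```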

Changes

Lib/unittest/loader.py

  • Removed VALID_MODULE_NAME regex and the unused import re
  • Added _valid_module_name() function that uses os.path.splitext() + str.isidentifier() to validate module filenames
  • Added directory name validation in _find_test_path() — directories with invalid identifier names (e.g., containing hyphens) are now properly skipped during package discovery

Lib/test/test_unittest/test_discovery.py

  • test_valid_module_name: Tests the new _valid_module_name() function with ASCII, Unicode, and invalid names
  • test_find_tests_with_unicode_modules: Integration test verifying Unicode-named modules are discovered alongside ASCII ones
  • test_find_test_path_rejects_invalid_dir_name: Tests that directories with invalid identifier names are skipped
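
A test in the spirit of test_valid_module_name could be sketched as follows. The class name, the stand-in function valid_module_name(), and the exact set of sample names are assumptions for illustration; the PR's real test may differ.

```python
import os
import unittest


def valid_module_name(filename):
    # Stand-in for the PR's _valid_module_name(); the body is an assumption.
    root, ext = os.path.splitext(filename)
    return ext == '.py' and root.isidentifier()


class ValidModuleNameTests(unittest.TestCase):
    def test_accepts_ascii_and_unicode(self):
        names = ('test_foo.py', '_private.py', 'caf\u00e9.py', '測試.py')
        for name in names:
            with self.subTest(name=name):
                self.assertTrue(valid_module_name(name))

    def test_rejects_invalid(self):
        names = ('123bad.py', 'test-not-a-module.py', 'test.foo', '.py')
        for name in names:
            with self.subTest(name=name):
                self.assertFalse(valid_module_name(name))
```

Saved to a file, the sketch can be run with python -m unittest.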

Prior work

This issue has two stale PRs (#1338 from 2017, #13149 from 2019) that were never merged. This PR incorporates reviewer feedback from those PRs:

Testing

All 1,095 unittest tests pass.

Replace the ASCII-only VALID_MODULE_NAME regex with str.isidentifier()
to support test modules whose names start with non-ASCII Unicode letters
(e.g., 測試.py).

Also add a directory name validation check so that directories with
invalid identifier names (e.g., containing hyphens) are skipped during
package discovery.

python-cla-bot Bot commented Feb 15, 2026

All commit authors signed the Contributor License Agreement.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3ac73104fa


Comment thread Lib/unittest/loader.py Outdated
Comment thread Lib/test/test_unittest/test_discovery.py
Copilot AI review requested due to automatic review settings March 8, 2026 17:46
Co-authored-by: Victor Stinner <vstinner@python.org>

Copilot AI left a comment

Pull request overview

Updates CPython’s unittest test discovery to properly accept and discover test modules with Unicode identifier names (per PEP 3131), fixing an ASCII-only validation that rejected valid filenames.

Changes:

  • Replace ASCII-only module filename regex with str.isidentifier()-based validation.
  • Skip recursing into directories whose names aren’t valid Python identifiers (prevents futile/erroneous package imports).
  • Add unit + integration tests covering Unicode module discovery and invalid directory rejection.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| Misc/NEWS.d/next/Library/2026-02-15-22-23-15.gh-issue-68451.2pPUuV.rst | Documents the change in unittest discovery behavior for Unicode-named modules. |
| Lib/unittest/loader.py | Implements identifier-based validation for module filenames and package directory names during discovery. |
| Lib/test/test_unittest/test_discovery.py | Adds tests validating _valid_module_name(), Unicode discovery, and invalid directory name skipping. |

# test-not-a-module.py and 123bad.py should be excluded;
# test.foo should be excluded (wrong extension).
# Sorted by Unicode code points: test_dir (and its children) come
# before tëst_três since '_' (U+005F) < 'ë' (U+00EB).
Copilot AI Mar 8, 2026

The ordering rationale in this comment is incorrect: test_dir sorts before tëst_três due to 'e' (U+0065) in test_dir comparing less than 'ë' (U+00EB), not because of '_' vs 'ë'. Please adjust the comment so it matches Python’s actual string comparison behavior used by sorted() here.

Suggested change
- # before tëst_três since '_' (U+005F) < 'ë' (U+00EB).
+ # before tëst_três since 'e' (U+0065) < 'ë' (U+00EB).
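
The ordering can be confirmed with a short check. This is a minimal sketch that assumes the names use the precomposed characters U+00EB and U+00EA (written as escapes below); the variable names are illustrative.

```python
a = 'test_dir'
b = 't\u00ebst_tr\u00eas'  # 'tëst_três' with precomposed ë and ê (assumed)

# sorted() compares strings code point by code point.
print(sorted([b, a]) == [a, b])  # True

# Find the first position where the two names differ.
i = next(k for k, (x, y) in enumerate(zip(a, b)) if x != y)
print(i, hex(ord(a[i])), hex(ord(b[i])))  # 1 0x65 0xeb
```

The comparison is decided at index 1, 'e' against 'ë', before the '_' in test_dir is ever reached.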
