For most ideographs, the Name property value is derived by concatenating a script-specific prefix string to the code point, expressed in uppercase hexadecimal, with the usual 4- to 6-digit convention (see rule NR2 in chapter 4.8.1 of Unicode 17.0.0 spec).
Thus, names for Hangul syllables and for most Han and Tangut ideographs are not listed explicitly in UnicodeData.txt; unicodedata generates them algorithmically (see #80667). However, ideographic characters of scripts other than Han and Tangut, as well as Egyptian hieroglyphs, have their names listed explicitly in UnicodeData.txt, even when those names are derivable by rule NR2. We can shrink the name table by excluding the names derivable by rule NR2 and generating them with the existing code.
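To make the idea concrete, here is a minimal sketch of an NR2-style derivation in Python. It is not the unicodedata implementation: the helper names (`NR2_RANGES`, `nr2_name`, `is_derivable`) are hypothetical, and the range table is illustrative and deliberately incomplete, so the exact boundaries should be checked against the Unicode version in use.

```python
import unicodedata

# Illustrative, incomplete table of (first, last, prefix) entries.
# Assumption: these boundaries are examples only; the real ranges depend
# on the Unicode version and cover more blocks than shown here.
NR2_RANGES = [
    (0x4E00, 0x9FFF, "CJK UNIFIED IDEOGRAPH-"),
    (0x17000, 0x187F7, "TANGUT IDEOGRAPH-"),
    (0x1B170, 0x1B2FB, "NUSHU CHARACTER-"),
]


def nr2_name(cp):
    """Return the NR2-style name for code point cp, or None if no range matches."""
    for first, last, prefix in NR2_RANGES:
        if first <= cp <= last:
            # Uppercase hexadecimal, at least 4 digits (so 4 to 6 in practice).
            return f"{prefix}{cp:04X}"
    return None


def is_derivable(cp, listed_name):
    """True if the name listed in the data file equals the derived one,
    meaning the entry would not need to be stored in the name table."""
    return nr2_name(cp) == listed_name


# The derived name agrees with what unicodedata already reports for Han.
print(nr2_name(0x4E00) == unicodedata.name(chr(0x4E00)))
```

A table generator could call something like `is_derivable` on each UnicodeData.txt entry and skip the ones that match, leaving the lookup code to regenerate those names on demand.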
Linked PRs