Rebecca G. Bettencourt edited this page Jan 17, 2022 · 3 revisions

General table information

The Private Use Area attribute table (tag name: 'PUAA') provides a method for specifying Unicode character properties for characters in the Unicode Private Use Area (code points E000-F8FF, F0000-FFFFD, and 100000-10FFFD).
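As a quick illustration of the three ranges listed above, the following sketch tests whether a code point falls in the Private Use Area. The names `PUA_RANGES` and `is_private_use` are hypothetical helpers chosen for this example and are not part of the table format.

```python
# Private Use Area ranges as listed above (inclusive bounds).
PUA_RANGES = (
    (0xE000, 0xF8FF),      # plane 0
    (0xF0000, 0xFFFFD),    # plane 15
    (0x100000, 0x10FFFD),  # plane 16
)


def is_private_use(code_point: int) -> bool:
    """Return True if code_point lies in one of the three PUA ranges."""
    return any(first <= code_point <= last for first, last in PUA_RANGES)


print(is_private_use(0xE000))    # True
print(is_private_use(0x10FFFE))  # False
```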

This table is not part of the Unicode Standard and is not endorsed by the Unicode Consortium. The information included in this table should be considered informative only and is not expected to be used by any text rendering engine.

PUA Attribute Table Format

The table header gives the table version number and the number of properties defined, followed by the property records array. The offsets in bytes to each property name and subtable are stored in the property records. The format of the table header is as follows:

| Type | Name | Description |
| --- | --- | --- |
| UInt16 | version | The table version number. Set to 1. |
| UInt16 | propertyCount | The number of properties defined. |
| PropertyRecord | propertyRecord[propertyCount] | The property records array. |

The property records array follows the table header. Each record consists of an offset to the property name and an offset to the subtable header. Here is the format of a PropertyRecord:

| Type | Name | Description |
| --- | --- | --- |
| UInt32 | propertyNameOffset | Offset in bytes from the start of the table to the property name. |
| UInt32 | subtableHeaderOffset | Offset in bytes from the start of the table to the subtable header. |

Property records should be sorted in ascending order by property name. The property name consists of a length byte followed by the name in UTF-8.

| Type | Name | Description |
| --- | --- | --- |
| UInt8 | length | Length of string in bytes. |
| UInt8 | name[length] | UTF-8 encoded string. |
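The header, property records, and property names described above could be read roughly as follows. This is a minimal sketch, not a reference implementation: the text does not state a byte order, so big-endian is assumed here (as is conventional for font tables), and the function names `read_pstring` and `read_property_records` are hypothetical.

```python
import struct


def read_pstring(table: bytes, offset: int) -> str:
    """Read a length byte followed by that many bytes of UTF-8."""
    length = table[offset]
    return table[offset + 1 : offset + 1 + length].decode("utf-8")


def read_property_records(table: bytes) -> list:
    """Return (property name, subtable header offset) pairs.

    Assumes big-endian fields; the byte order is an assumption.
    """
    version, property_count = struct.unpack_from(">HH", table, 0)
    if version != 1:
        raise ValueError(f"unsupported PUAA version {version}")
    records = []
    for i in range(property_count):
        # Each PropertyRecord is two UInt32 offsets (8 bytes) after the 4-byte header.
        name_offset, subtable_offset = struct.unpack_from(">II", table, 4 + 8 * i)
        records.append((read_pstring(table, name_offset), subtable_offset))
    return records


# Hand-built one-property table: header, one record, the name, an empty subtable.
name = b"\x06Script"
header = struct.pack(">HH", 1, 1)
record = struct.pack(">II", 12, 12 + len(name))
sample = header + record + name + struct.pack(">H", 0)
print(read_property_records(sample))  # [('Script', 19)]
```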

Property Subtable Format

Each property record points to a property subtable header consisting of the number of entries and an array of entry records. The format of the subtable header is as follows:

| Type | Name | Description |
| --- | --- | --- |
| UInt16 | entryCount | The number of entries in this subtable. |
| EntryRecord | entryRecord[entryCount] | The entry records array. |

The entry records array follows the subtable header. Each record consists of a type, a range of code points, and a value or offset.

| Type | Name | Description |
| --- | --- | --- |
| UInt8 | entryType | The type of the entry data. |
| UInt8 | plane | The Unicode plane of the range of code points covered by this entry. One of 0, 15, or 16. |
| UInt16 | firstCodePoint | The least significant 16 bits of the first code point covered by this entry. |
| UInt16 | lastCodePoint | The least significant 16 bits of the last code point covered by this entry. |
| UInt32 | entryData | Interpretation varies according to entryType. |
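A subtable could be read along these lines, combining the plane with the 16-bit halves to recover full code points. This sketch assumes big-endian fields and a packed 10-byte record with no padding; neither is stated in the text. `EntryRecord`, `ENTRY_FORMAT`, and `read_entries` are hypothetical names.

```python
import struct
from typing import NamedTuple


class EntryRecord(NamedTuple):
    entry_type: int
    first: int  # full code point, plane included
    last: int   # full code point, plane included
    data: int   # raw entryData, interpreted according to entry_type


# UInt8, UInt8, UInt16, UInt16, UInt32: 10 bytes, assuming big-endian, no padding.
ENTRY_FORMAT = ">BBHHI"


def read_entries(table: bytes, subtable_offset: int) -> list:
    """Read the subtable header and its entry records."""
    (entry_count,) = struct.unpack_from(">H", table, subtable_offset)
    entries = []
    pos = subtable_offset + 2
    for _ in range(entry_count):
        etype, plane, first16, last16, data = struct.unpack_from(ENTRY_FORMAT, table, pos)
        entries.append(
            EntryRecord(etype, (plane << 16) | first16, (plane << 16) | last16, data)
        )
        pos += struct.calcsize(ENTRY_FORMAT)
    return entries


# One Boolean entry (type 3) covering all of plane 15's private use range.
sample = struct.pack(">H", 1) + struct.pack(ENTRY_FORMAT, 3, 15, 0x0000, 0xFFFD, 1)
entries = read_entries(sample, 0)
print(hex(entries[0].first), hex(entries[0].last))  # 0xf0000 0xffffd
```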

The entryType field determines the interpretation of the entryData field:

| Entry Type ID | Entry Type | Meaning |
| --- | --- | --- |
| 1 | Single | The entryData field contains either an offset to a UTF-8 character string or a 0-4 byte ASCII character string. |
| 2 | Multiple | The entryData field contains an offset to an array of Single values, one for each code point. |
| 3 | Boolean | The entryData field contains zero for a false property value or nonzero for a true property value. |
| 4 | Decimal | The entryData field contains a plain integer value. |
| 5 | Hexadecimal | The entryData field contains a Unicode code point value. |
| 6 | HexMultiple | The entryData field contains an offset to an array of Hexadecimal values, one for each code point. |
| 7 | HexSequence | The entryData field contains an offset to an array of Unicode code point values. |
| 8 | CaseMapping | The entryData field contains an offset to an array of Unicode code point values, plus a Single value. |
| 9 | NameAlias | The entryData field contains an offset to an array of two Single values, one for a name and one for a name type. |

Entry Type 1 (Single)

If the most significant bit of the entryData field is clear, the entryData field contains an offset in bytes from the start of the table to a property value string. The property value string consists of a length byte followed by the value in UTF-8.

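If the most significant bit of the entryData field is set, the entryData field itself contains up to four ASCII characters, padded with null bytes if there are fewer than four. The first character is stored with its most significant bit set, so that bit must be cleared to recover the character. For example, CC61746E is the string "Latn" (CC is 4C, the letter "L", with the most significant bit set), CC750000 is the string "Lu", and 80000000 is the empty string "".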

The value of the entry applies to the entire range of code points. If multiple entries of type 1 or 2 apply to a code point, the value of the property for that code point shall be the concatenation of all values of matching entries, in the order in which they are encountered. It is possible for the resulting property value to exceed 255 bytes.
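The two encodings of a Single value, and the concatenation rule for overlapping entries, can be sketched as below. The helper names `decode_single` and `single_property_value` are hypothetical, and the entries are passed as plain `(first, last, entryData)` tuples with full code points purely for illustration.

```python
def decode_single(table: bytes, value: int) -> str:
    """Decode a Single value: inline ASCII if the top bit is set, else an offset."""
    if value & 0x80000000:
        # Clear the flag bit, then drop the null padding.
        raw = (value & 0x7FFFFFFF).to_bytes(4, "big")
        return raw.rstrip(b"\x00").decode("ascii")
    # Otherwise the value is an offset to a length byte plus UTF-8 data.
    length = table[value]
    return table[value + 1 : value + 1 + length].decode("utf-8")


def single_property_value(table: bytes, entries, code_point: int) -> str:
    """Concatenate the values of all matching entries, in encounter order."""
    return "".join(
        decode_single(table, data)
        for first, last, data in entries
        if first <= code_point <= last
    )


# Sample buffer: four filler bytes, then the string "abc" at offset 4.
table = bytes(4) + b"\x03abc"
entries = [(0xE000, 0xE0FF, 0xCC61746E), (0xE000, 0xE000, 4)]

print(decode_single(table, 0xCC61746E))                # Latn
print(decode_single(table, 0xCC750000))                # Lu
print(single_property_value(table, entries, 0xE000))   # Latnabc
```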

Entry Type 2 (Multiple)

The entryData field contains an offset in bytes from the start of the table to an array of values. The array starts with the number of values, which should equal the number of code points covered by the entry:

| Type | Name | Description |
| --- | --- | --- |
| UInt16 | valueCount | The number of values. Should equal lastCodePoint - firstCodePoint + 1. |
| UInt32 | valueData[valueCount] | The array of values, each interpreted as in entry type 1. |

The first value applies to firstCodePoint, the second to firstCodePoint + 1, and so on. Each valueData field contains either an offset to a string, or up to four ASCII characters with the most significant bit set, as in entry type 1. If multiple entries of type 1 or 2 apply to a code point, the value of the property for that code point shall be the concatenation of all values of matching entries, in the order in which they are encountered. It is possible for the resulting property value to exceed 255 bytes.
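Looking up the value for one code point in a Multiple array might look like this. It is a sketch under the same assumptions as before (big-endian fields; hypothetical helper names `decode_single` and `multiple_value`); the sample array sits at offset 0 of a stand-alone buffer only to keep the example short.

```python
import struct


def decode_single(table: bytes, value: int) -> str:
    """Decode a value as in entry type 1 (inline ASCII or offset to a string)."""
    if value & 0x80000000:
        return (value & 0x7FFFFFFF).to_bytes(4, "big").rstrip(b"\x00").decode("ascii")
    length = table[value]
    return table[value + 1 : value + 1 + length].decode("utf-8")


def multiple_value(table: bytes, first_code_point: int, data_offset: int,
                   code_point: int) -> str:
    """Return the value for code_point from the array at data_offset."""
    (value_count,) = struct.unpack_from(">H", table, data_offset)
    index = code_point - first_code_point
    if not 0 <= index < value_count:
        raise IndexError("code point outside the range covered by the array")
    (value,) = struct.unpack_from(">I", table, data_offset + 2 + 4 * index)
    return decode_single(table, value)


# Two inline values: "Lu" for the first code point, "Ll" for the second.
sample = struct.pack(">HII", 2, 0xCC750000, 0xCC6C0000)
print(multiple_value(sample, 0xE000, 0, 0xE000))  # Lu
print(multiple_value(sample, 0xE000, 0, 0xE001))  # Ll
```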

Entry Type 3 (Boolean)

The entryData field contains a boolean value. If zero, the property value is false. If nonzero, the property value is true. The value of the entry applies to the entire range of code points.

This entry type is used for boolean properties such as Bidi_Mirrored from UnicodeData.txt, White_Space from PropList.txt, or Emoji_Component from emoji-data.txt.

Entry Type 4 (Decimal)

The entryData field contains a decimal integer value. The value of the entry applies to the entire range of code points.

This entry type is used for Canonical_Combining_Class from UnicodeData.txt as well as some Unihan properties such as kTotalStrokes.

Entry Type 5 (Hexadecimal)

The entryData field contains a single code point value. The value of the entry applies to the entire range of code points.

This entry type is used for properties such as Bidi_Mirroring_Glyph and Simple_Uppercase_Mapping.
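For the three scalar entry types (3, 4, and 5), the entryData field can be interpreted directly, with no offsets to follow. The sketch below uses a hypothetical `scalar_value` helper; treating the Decimal value as unsigned and rendering the Hexadecimal value as an uppercase hex string are assumptions made for this example, not requirements of the format.

```python
def scalar_value(entry_type: int, entry_data: int):
    """Interpret entryData for Boolean (3), Decimal (4), and Hexadecimal (5) entries."""
    if entry_type == 3:
        return entry_data != 0          # zero is false, anything else is true
    if entry_type == 4:
        return entry_data               # assumed unsigned integer
    if entry_type == 5:
        return f"{entry_data:04X}"      # code point as hex digits (presentation choice)
    raise ValueError(f"entry type {entry_type} is not a scalar type")


print(scalar_value(3, 1))       # True
print(scalar_value(4, 230))     # 230
print(scalar_value(5, 0xE001))  # E001
```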

Entry Type 6 (HexMultiple)

The entryData field contains an offset in bytes from the start of the table to an array of code point values. The array starts with the number of code point values, which should equal the number of code points covered by the entry:

| Type | Name | Description |
| --- | --- | --- |
| UInt16 | valueCount | The number of values. Should equal lastCodePoint - firstCodePoint + 1. |
| UInt32 | valueData[valueCount] | The array of values, each interpreted as in entry type 5. |

The first code point value applies to firstCodePoint, the second to firstCodePoint + 1, and so on. Each valueData field contains a single code point value, as in entry type 5.

This entry type is used for properties such as Bidi_Mirroring_Glyph and Simple_Uppercase_Mapping.
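A per-code-point lookup in a HexMultiple array could be sketched as follows, assuming big-endian fields. The function name `hex_multiple_value` and the sample mapping (a made-up mirrored pair of private use characters) are hypothetical.

```python
import struct


def hex_multiple_value(table: bytes, first_code_point: int, data_offset: int,
                       code_point: int) -> int:
    """Return the code point value stored for code_point in the array."""
    (value_count,) = struct.unpack_from(">H", table, data_offset)
    index = code_point - first_code_point
    if not 0 <= index < value_count:
        raise IndexError("code point outside the range covered by the array")
    (mapped,) = struct.unpack_from(">I", table, data_offset + 2 + 4 * index)
    return mapped


# Hypothetical data: U+E000 and U+E001 map to each other.
sample = struct.pack(">HII", 2, 0xE001, 0xE000)
print(hex(hex_multiple_value(sample, 0xE000, 0, 0xE000)))  # 0xe001
print(hex(hex_multiple_value(sample, 0xE000, 0, 0xE001)))  # 0xe000
```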

Entry Type 7 (HexSequence)

The entryData field contains an offset in bytes from the start of the table to an array. The array contains any number of code point values, interpreted as in entry type 5.

| Type | Name | Description |
| --- | --- | --- |
| UInt16 | valueCount | The number of code point values. |
| UInt32 | valueData[valueCount] | The array of code point values. |

The array itself is the value of the entry, which applies to the entire range of code points.

This entry type is used for the Decomposition_Mapping property from UnicodeData.txt.
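Unlike entry type 6, the whole HexSequence array is a single value. A minimal reader, assuming big-endian fields, might look like this; `read_code_point_sequence` is a hypothetical name and the sample decomposition is invented for illustration.

```python
import struct


def read_code_point_sequence(table: bytes, data_offset: int) -> list:
    """Read valueCount followed by that many UInt32 code point values."""
    (value_count,) = struct.unpack_from(">H", table, data_offset)
    return list(struct.unpack_from(f">{value_count}I", table, data_offset + 2))


# Hypothetical decomposition: a private use base character plus U+0301.
sample = struct.pack(">HII", 2, 0xE010, 0x0301)
sequence = read_code_point_sequence(sample, 0)
print([f"{cp:04X}" for cp in sequence])  # ['E010', '0301']
```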

Entry Type 8 (CaseMapping)

The entryData field contains an offset in bytes from the start of the table to an array. The array contains any number of code point values, interpreted as in entry type 5, followed by one string value, interpreted as in entry type 1.

| Type | Name | Description |
| --- | --- | --- |
| UInt16 | valueCount | The number of values: the number of code points, plus 1. |
| UInt32 | mappingValueData[valueCount-1] | The array of code point values. |
| UInt32 | conditionValueData | The condition value, interpreted as in entry type 1. |

The array itself is the value of the entry, which applies to the entire range of code points.

This entry type is used for the special casing properties from SpecialCasing.txt.
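The CaseMapping layout can be split into its mapping and condition parts as sketched below, assuming big-endian fields. The names `decode_single` and `read_case_mapping` are hypothetical, and the sample uses an empty inline condition string.

```python
import struct


def decode_single(table: bytes, value: int) -> str:
    """Decode a value as in entry type 1 (inline ASCII or offset to a string)."""
    if value & 0x80000000:
        return (value & 0x7FFFFFFF).to_bytes(4, "big").rstrip(b"\x00").decode("ascii")
    length = table[value]
    return table[value + 1 : value + 1 + length].decode("utf-8")


def read_case_mapping(table: bytes, data_offset: int):
    """Return (list of mapped code points, condition string)."""
    (value_count,) = struct.unpack_from(">H", table, data_offset)
    if value_count < 1:
        raise ValueError("valueCount must include the condition value")
    values = struct.unpack_from(f">{value_count}I", table, data_offset + 2)
    # All but the last value are code points; the last is the condition.
    return list(values[:-1]), decode_single(table, values[-1])


# Hypothetical mapping to two private use code points with no condition.
sample = struct.pack(">HIII", 3, 0xE100, 0xE101, 0x80000000)
mapping, condition = read_case_mapping(sample, 0)
print([f"{cp:04X}" for cp in mapping], repr(condition))  # ['E100', 'E101'] ''
```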

Entry Type 9 (NameAlias)

The entryData field contains an offset in bytes from the start of the table to an array. The array contains two string values, interpreted as in entry type 1.

| Type | Name | Description |
| --- | --- | --- |
| UInt16 | valueCount | The number of values. Must equal 2. |
| UInt32 | aliasValueData | The alias value, interpreted as in entry type 1. |
| UInt32 | typeValueData | The type value, interpreted as in entry type 1. |

The array itself is the value of the entry, which applies to the entire range of code points.

This entry type is used for the Name_Alias property from NameAliases.txt.
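Reading a NameAlias array amounts to decoding two Single values. The sketch below assumes big-endian fields; `pstring`, `decode_single`, and `read_name_alias` are hypothetical helpers, and the alias text is made up. Offsets in the sample are relative to the small stand-alone buffer rather than to a complete table.

```python
import struct


def pstring(text: str) -> bytes:
    """Encode a string as a length byte followed by UTF-8 data."""
    data = text.encode("utf-8")
    return bytes([len(data)]) + data


def decode_single(table: bytes, value: int) -> str:
    """Decode a value as in entry type 1 (inline ASCII or offset to a string)."""
    if value & 0x80000000:
        return (value & 0x7FFFFFFF).to_bytes(4, "big").rstrip(b"\x00").decode("ascii")
    length = table[value]
    return table[value + 1 : value + 1 + length].decode("utf-8")


def read_name_alias(table: bytes, data_offset: int):
    """Return (alias, alias type) from the array at data_offset."""
    value_count, alias_value, type_value = struct.unpack_from(">HII", table, data_offset)
    if value_count != 2:
        raise ValueError("a NameAlias array must contain exactly 2 values")
    return decode_single(table, alias_value), decode_single(table, type_value)


# The 10-byte array, then the alias string at offset 10 and the type at offset 16.
array = struct.pack(">HII", 2, 10, 16)
sample = array + pstring("ALIAS") + pstring("abbreviation")
print(read_name_alias(sample, 0))  # ('ALIAS', 'abbreviation')
```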
