Module: Text
See:
- PDF Reference 1.7: §10.7, p. 883
- ISO 32000: 14.8 Tagged PDF, p. 569
- Content shall be tagged in logical reading order. The most semantically appropriate tag shall be used for document content.
- Character codes shall map to Unicode as described in “Unicode Mapping in Tagged PDF” in PDF Reference 1.7, Section 10.7.1, p. 892 (ISO 32000: 14.8.2.4.2, “Unicode Mapping in Tagged PDF,” p. 575).
- Stretchable characters such as parentheses or brackets (often drawn by combining several individual glyphs to form the appearance of a single glyph) shall be tagged using Actual Text, as specified in PDF Reference 1.7, §10.8.3 (ISO 32000: 14.9.4, “Replacement Text,” p. 611).
- Characters not included in any published Unicode specification may use the Unicode private use area or declare another published character encoding.
- Font characters shall be available for each character code, including Braille, as described in “Font Characteristics” in PDF Reference 1.7, Section 10.7.1, p. 892 (ISO 32000: 14.8.2.4.3, “Font Characteristics,” p. 575).
- Natural language shall be declared as discussed in PDF Reference 1.7, Section 10.8.1, p. 936 (ISO 32000: 14.9.2, “Natural Language Specification,” p. 607) and/or as described in PDF Reference 1.7, Section 3.8.1, p. 157 (ISO 32000: 7.9.2, “String Types,” p. 83). Language codes shall be derived solely from IETF BCP 47, “Tags for Identifying Languages.” In particular:
  - Documents not expressed in a natural language shall declare the root language as zxx.
  - Documents expressed in a language unknown to the author or creator shall declare the root language as und.
  - Documents with equal proportions of multiple languages shall declare the root language as mul and use structure elements to group and tag each content block with the correct code for the language of the content.
- Changes in natural language shall be declared.
- Changes in natural language inside attribute values (e.g., inside Alternate Text and Bookmarks) shall be declared using tag characters as described in §16.9 of Unicode 5.1, “Special Areas and Format Characters.”
- Text direction shall be declared.
- Changes in text direction shall be declared.
- When the meaning is ambiguous to the intended readership, abbreviations, acronyms, initialisms, and short forms shall be tagged with Abbr and their expansion shall be given per §14.9.5 in ISO 32000.
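
The language rules above can be illustrated with a small Python sketch. It is not part of the draft. The function names, the word-count input, and the reading of "equal proportions" as a tie for the largest count are all assumptions made for illustration. The tag-character encoding follows the Unicode scheme of U+E0001 LANGUAGE TAG followed by one tag character (ASCII code point plus 0xE0000) per character of the language code.

```python
# Illustrative helpers only: names and input shapes are assumptions, not draft text.

LANGUAGE_TAG = 0xE0001  # U+E0001 LANGUAGE TAG
TAG_OFFSET = 0xE0000    # tag character = ASCII code point + 0xE0000


def root_language(word_counts):
    """Choose a root language code from a hypothetical {BCP 47 tag: word count} map.

    None means the language is unknown to the author; an empty map means the
    document has no natural-language content.
    """
    if word_counts is None:
        return "und"  # language unknown to the author or creator
    if not word_counts:
        return "zxx"  # not expressed in a natural language
    top = max(word_counts.values())
    leaders = [tag for tag, count in word_counts.items() if count == top]
    # Assumption: a tie for the largest share counts as "equal proportions".
    return leaders[0] if len(leaders) == 1 else "mul"


def language_tag_prefix(lang):
    """Encode a language code as U+E0001 followed by Unicode tag characters."""
    if not lang or not all(ch.isascii() and (ch.isalnum() or ch == "-") for ch in lang):
        raise ValueError("expected ASCII letters, digits, and hyphens")
    return chr(LANGUAGE_TAG) + "".join(chr(TAG_OFFSET + ord(ch)) for ch in lang)


def tag_text(text, lang):
    """Prefix a run of attribute text (e.g. alternate text) with its language tag."""
    return language_tag_prefix(lang) + text
```

For example, `root_language({"en": 10, "fr": 10})` yields `mul`, and `tag_text("Bonjour", "fr")` returns the string preceded by three invisible characters: U+E0001, U+E0066, U+E0072.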