Text Direction (Advisory)
Text direction
WritingModeInline
The attribute WritingModeInline specifies the inline direction of text, that is, the direction of text within a block (typically the direction of characters within lines).
WritingModeBlock
The attribute WritingModeBlock specifies the block direction of text (typically the direction of lines over a page).
Requirements for text direction
- Text direction shall be declared. The declaration should be on the root tag of the document.
- WritingMode may be locally overridden where warranted, e.g., for mixed-language text where the writing direction changes. Any such change in writing direction shall be declared. Only the axis that changes (inline or block) shall be declared; if the other axis has not changed, there is no requirement to declare it.
- A tag shall use only one method of declaring text direction. That is, an application may use PDF 1.7 WritingMode or a combination of PDF/UA WritingModeInline and PDF/UA WritingModeBlock but shall not use both methods on the same tag.
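The requirements above can be sketched as two small checks. This is only an illustration: it assumes a hypothetical in-memory model in which each tag is a plain dictionary of its declared attributes, and it assumes that an undeclared axis is inherited from the nearest enclosing tag that declares it. The function names are invented for the example.

```python
from typing import Dict, List, Optional, Tuple

def uses_single_method(attrs: Dict[str, str]) -> bool:
    """True if the tag does not mix PDF 1.7 WritingMode with the PDF/UA attributes."""
    uses_legacy = "WritingMode" in attrs
    uses_ua = "WritingModeInline" in attrs or "WritingModeBlock" in attrs
    return not (uses_legacy and uses_ua)

def effective_direction(chain: List[Dict[str, str]]) -> Tuple[Optional[str], Optional[str]]:
    """Resolve (inline, block) for the last tag in a root-to-tag chain.

    Assumption: a tag overrides only the axes it declares; the other
    axis carries over from its ancestors.
    """
    inline: Optional[str] = None
    block: Optional[str] = None
    for attrs in chain:
        inline = attrs.get("WritingModeInline", inline)
        block = attrs.get("WritingModeBlock", block)
    return inline, block

# Root declares both axes; a nested Hebrew span overrides only the inline axis.
root = {"WritingModeInline": "LR", "WritingModeBlock": "TB"}
hebrew_span = {"WritingModeInline": "RL"}
print(effective_direction([root, hebrew_span]))  # ('RL', 'TB')
```

A result containing None on either axis would indicate that no tag in the chain declared that axis.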
Direction values
WritingModeInline and WritingModeBlock share the same set of possible values:
Rectilinear
- LR
- left to right
- RL
- right to left
- TB
- top to bottom
- BT
- bottom to top
Paths around rectangles (including squares and rhombi) are modelled as sequences of straight lines. To define the direction of text along a rectangle, an author MUST use sequences of rectilinear values.
Curved
- Clockwise
- in a curve corresponding in direction to the movement of the hands of a clock
- Counterclockwise
- in a curve opposite in direction to the movement of the hands of a clock
Note that PDF/UA uses Counterclockwise as a value, not Anticlockwise.
Paths along ellipses and arbitrary curves are modelled as circles of equivalent radius. To define the direction of text along an arbitrary curve, an author shall use a sequence of Clockwise and Counterclockwise value(s).
Diagonal
- LLUR
- lower left to upper right
- URLL
- upper right to lower left
- LRUL
- lower right to upper left
- ULLR
- upper left to lower right
No text direction
- None
- no declared text direction
- Unknown
- no known text direction
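As a rough illustration, the value groups listed above could be held as sets and used to validate a declared value. The set and function names are invented for this sketch, and treating the comparison as case-sensitive is an assumption.

```python
# Value groups as listed above; the names of these sets are illustrative only.
RECTILINEAR = {"LR", "RL", "TB", "BT"}
CURVED = {"Clockwise", "Counterclockwise"}
DIAGONAL = {"LLUR", "URLL", "LRUL", "ULLR"}
NO_DIRECTION = {"None", "Unknown"}

ALL_DIRECTION_VALUES = RECTILINEAR | CURVED | DIAGONAL | NO_DIRECTION

def is_valid_direction(value: str) -> bool:
    """True if value is one of the listed direction values (case-sensitive by assumption)."""
    return value in ALL_DIRECTION_VALUES

print(is_valid_direction("Counterclockwise"))  # True
print(is_valid_direction("Anticlockwise"))     # False
```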
Text at corners, vertices, and inflection points
Text located exactly at a corner or vertex of a rectangle or at an inflection point of a curved path MUST declare a WritingModeInline value of None.
Backward compatibility
A PDF/UA-compliant application MUST interpret the WritingMode values found in PDF 1.7 as equivalent to PDF/UA text-direction values shown in the following list.
- WritingMode = LrTb
- Equivalent to WritingModeInline = LR and WritingModeBlock = TB
- WritingMode = RlTb
- Equivalent to WritingModeInline = RL and WritingModeBlock = TB
- WritingMode = TbRl
- Equivalent to WritingModeInline = TB and WritingModeBlock = RL
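The equivalences in this list amount to a small lookup table. A minimal sketch follows, assuming attributes are handled as plain strings; the keys use the PDF 1.7 spellings LrTb, RlTb, and TbRl, and the function name is hypothetical.

```python
from typing import Dict, Tuple

# PDF 1.7 WritingMode value -> (WritingModeInline, WritingModeBlock)
LEGACY_WRITING_MODE: Dict[str, Tuple[str, str]] = {
    "LrTb": ("LR", "TB"),
    "RlTb": ("RL", "TB"),
    "TbRl": ("TB", "RL"),
}

def from_legacy(writing_mode: str) -> Dict[str, str]:
    """Translate a PDF 1.7 WritingMode value into the two PDF/UA attributes.

    Raises KeyError for a value outside the three listed above.
    """
    inline, block = LEGACY_WRITING_MODE[writing_mode]
    return {"WritingModeInline": inline, "WritingModeBlock": block}

print(from_legacy("TbRl"))  # {'WritingModeInline': 'TB', 'WritingModeBlock': 'RL'}
```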
No default direction
PDF 1.7 gave WritingMode a default value of LrTb. PDF/UA-compliant documents have no default value for inline or block text direction. Text direction shall be explicitly declared.
Advisory information
WritingMode and text direction
Scripts with naturally changing direction
Neither PDF 1.7 nor PDF/UA specifies explicit attributes for scripts that, by their nature, naturally change direction in running text. To encode a script that continually changes or alternates direction, an author SHALL use a sequence of WritingModeInline and/or WritingModeBlock values. For example, boustrophedon text, whose reading direction varies from left-to-right on one line to right-to-left on the next, SHALL be modelled as a sequence of WritingModeInline = LR and WritingModeInline = RL values.
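For the boustrophedon case, the alternating sequence might be generated as follows. This is a sketch only: it assumes one WritingModeInline declaration per line of text, and the function name is invented for the example.

```python
from typing import List

def boustrophedon_directions(line_count: int, first: str = "LR") -> List[str]:
    """Return one WritingModeInline value per line, alternating LR and RL."""
    if first not in ("LR", "RL"):
        raise ValueError("first must be 'LR' or 'RL'")
    other = "RL" if first == "LR" else "LR"
    # Even-numbered lines (0, 2, ...) take the starting direction.
    return [first if i % 2 == 0 else other for i in range(line_count)]

print(boustrophedon_directions(4))  # ['LR', 'RL', 'LR', 'RL']
```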
Use cases
For clarification, some typical combinations of WritingModeInline and WritingModeBlock values are as follows.
| WritingModeInline | WritingModeBlock | Example |
|---|---|---|
| LR | TB | English; French; Basque; Georgian; Tibetan; Japanese (horizontal writing); Chinese (horizontal writing); Korean; Mongolian (Cyrillic); Tamil; numerals within Hebrew text |
| RL | TB | Hebrew; Yiddish; Arabic; Farsi; Urdu; Pashto |
| TB | RL | Japanese (vertical writing); Chinese (vertical writing) |
| TB | LR | Mongolian (traditional) |
| BT | LR | Ogham |
| any rectilinear | any diagonal | Crossword puzzles; sudoku |
| None | None | Single glyph (any language); single numeral |
| Unknown | Unknown | Language unknown to the PDF author |
Interaction with Images module
A PDF with no Unicode text but with images that are not artifacts will presumably use alternate and/or actual text. (A use case is a PDF of a photograph.)
- For single-language alternate or actual text, the language of the alternate or actual text shall be declared in the document root [or whatever we want to call the highest level] or on the Figure tag.
Languageless documents
PDF/UA recognizes that most documents are written in a natural language (i.e., a human language). But some documents do not use a natural language (a languageless document). Some other documents do not use a natural language in certain parts (a partially nonlinguistic document).
Introduction
A languageless PDF is composed entirely or principally of information that is not an expression of a natural (human) language. A languageless PDF MAY include identifiable parts of a natural language (such as discernible letters or characters), but MUST NOT be a fluent, readable, or understandable expression of ideas in such a language.
To aid in identifying languageless PDFs, authors may use the following rules of thumb.
- It may typically use a symbol notation rather than a system of alphabets, ideograms, abjads, and/or syllabaries. Symbols used may have different pronunciations or names, or none at all, in different natural languages.
- It cannot be translated into another natural language because it is not expressed in an original language.
- It is difficult or impossible to render in speech, either by an informed person or by a computerized device.
Extent of nonlinguistic data
A nonlinguistic PDF may be entirely or principally nonlinguistic.
- An entirely nonlinguistic PDF has no parts that are expressed in a natural language. Examples include type samples; notation of nonvocal music; and script samples that list the units of the writing system rather than using the writing system to form natural words, sentences, phrases, or utterances.
- A principally nonlinguistic PDF has some use of natural language, but is mostly nonlinguistic. Any proportion above 50% qualifies as mostly nonlinguistic. Examples include type samples with the name of the typeface listed, with sample words, or with a written introduction; music with lyrics; a script sample that displays readable text in the natural language.
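The 50% threshold can be expressed as a simple comparison. The draft does not say what is being counted, so the unit here (characters, words, pages, or anything else) is an assumption left to the caller, and the function name and return labels are invented for the sketch.

```python
def classify_extent(nonlinguistic_units: int, total_units: int) -> str:
    """Classify a PDF by its share of nonlinguistic content.

    The unit of measurement is an assumption; the text above only
    gives the 'above 50%' threshold.
    """
    if total_units <= 0 or not 0 <= nonlinguistic_units <= total_units:
        raise ValueError("counts must satisfy 0 <= nonlinguistic <= total and total > 0")
    if nonlinguistic_units == total_units:
        return "entirely nonlinguistic"
    if 2 * nonlinguistic_units > total_units:  # strictly above 50%
        return "principally nonlinguistic"
    return "neither"

print(classify_extent(6, 10))  # principally nonlinguistic
```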
Language coding
To achieve accessibility, particularly for users of devices like screen readers and Braille displays, the natural language of a PDF must be declared (typically on the root tag). Language codes derived from the BCP 47 specification MUST be used unless the language has no code under that specification. In such an exceptional case, an author MAY use any published code for the language.
A nonlinguistic PDF will have no natural language. Use the language code zxx to indicate a nonlinguistic document. The usage parallels that of mul for multilanguage documents and the usage of macrolanguage mappings and language types.
Language coding of subsections
In principally nonlinguistic PDFs, any subsections actually written or expressed in a natural language must use the appropriate language tag. Only the structural entity enclosing the natural-language subsection SHOULD be language-coded. (Structural markup may need to be added to enclose the subsection, including Span or Div.)
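A sketch of how this might look for a music score with Italian lyrics, again modelling each tag as a plain dictionary. The tag layout is hypothetical, and the rule that a tag without its own language code takes the code of its nearest coded ancestor is an assumption of the example.

```python
from typing import Dict, List, Optional

def effective_language(chain: List[Dict[str, str]]) -> Optional[str]:
    """Return the language code in force for the last tag in a root-to-tag chain.

    Assumption: the nearest enclosing tag with a Lang entry wins.
    """
    lang: Optional[str] = None
    for tag in chain:
        lang = tag.get("Lang", lang)
    return lang

# The root is coded zxx; only the Span enclosing the lyric carries its own code.
document = {"S": "Document", "Lang": "zxx"}
score = {"S": "Sect"}
lyric = {"S": "Span", "Lang": "it"}

print(effective_language([document, score]))         # zxx
print(effective_language([document, score, lyric]))  # it
```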
Encoded text vs. images
Nonlinguistic PDFs must use encoded text or characters in some declared format (typically Unicode). For example, a PDF consisting of music notation in Unicode qualifies as nonlinguistic. A PDF consisting of an image file that a sighted user might interpret as music notation does not qualify as nonlinguistic. In that case, appropriate text equivalents (including, if necessary, actual text) must be provided.
Use cases of nonlinguistic PDFs
The PDF/UA specification defines the following examples of nonlinguistic PDFs. The list is not exhaustive. Authors may use reasonable judgement in assessing if and when a PDF qualifies as nonlinguistic. A single PDF may include more than one example.
- Type samples. Examples of typefaces, usually with few or no natural-language passages set in those typefaces. May include annotations, identifiers, or sample words in a natural language. May include examples of nonlinguistic symbols (e.g., pi characters; rules, borders, shadows, and shading; ornaments and fleurons).
- Script samples. Examples of natural written languages presented in a form visible to a sighted user. May consist of an alphabetic or otherwise ordered presentation of components (e.g., the letters of the alphabet in upper and lower case; initial, medial, and terminal forms; characters according to number of strokes or radicals). May consist of test or example words, phrases, renderings, or utterances in the natural language. May include examples of nonlinguistic symbols (e.g., pi characters).
- Gestural-language samples. Examples of natural sign languages presented in a form visible to a sighted user. May use full-motion video, animation, or any combination of those or other methods of presentation. Use of still images requires appropriate text equivalents.
- Speech samples. Examples of natural spoken languages presented in a form audible to a hearing user.
- Phonetic notation. Transcription or transliteration of natural spoken language in a written code. May use International Phonetic Alphabet, pinyin, romaji, nikud, or other systems.
- Programming languages and machine instructions. Examples of symbols, commands, markup, or instructions for processing by computers or machinery.
- Multimedia. Examples of multimedia as multimedia, with no intent to communicate the content of the multimedia. May include audio, video, motion graphics, animation, or other manners of presentation in any format.
- PDFs. Examples of PDF itself, with no intent to communicate the content of the PDFs.
- Music notation. Use of symbols to record or transliterate music or to provide instructions for playback. May include natural-language lyrics or other utterances. Incidental use of natural-language instructions (e.g., fortissimo, arpeggio, mezza voce) need not invalidate the PDF as an example of music notation. Authors may language-code such incidental instructions, but user agents must accommodate errors in or omissions of such coding.
- Movement notation. Use of symbols to record or transliterate movement (of humans, animals, insects, machines, or other) or to provide instructions for reproduction of movement. Includes notation of natural sign languages.
- Mathematics. Use of notation for algebra, calculus, or other fields of mathematics. A PDF consisting of mathematical notation with no natural-language text MUST use MathML to encode the mathematics.
- Scientific notation. Use of symbols to express formulæ, to map structures, or to communicate other meaning in fields such as chemistry and physics but excluding mathematics.
Braille
PDFs presented as encoded Braille, as via the use of Unicode Braille Patterns (U+2800–U+28FF), are considered nonlinguistic if and only if presented as samples of Braille itself rather than as usage of Braille to express natural language. Font characters for Braille MUST be included in the PDF.
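Whether text is encoded in the Unicode Braille Patterns block can be tested mechanically, as in the sketch below. Such a test covers only the encoding: it cannot tell a sample of Braille from Braille used to express natural language, which remains a judgement for the author. The function names are invented, and ignoring whitespace is an assumption.

```python
def is_braille_pattern(ch: str) -> bool:
    """True if ch is a single character in the Braille Patterns block (U+2800 to U+28FF)."""
    return 0x2800 <= ord(ch) <= 0x28FF

def is_all_braille(text: str) -> bool:
    """True if text has at least one non-whitespace character and all of them are Braille patterns."""
    chars = [c for c in text if not c.isspace()]
    return bool(chars) and all(is_braille_pattern(c) for c in chars)

print(is_all_braille("\u2801\u2803 \u2809"))  # True
print(is_all_braille("abc"))                  # False
```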
Problems and counterexamples
- Programming languages: If you’re showing an example of markup, you may be marking up natural-language text.
- Quizzes and tests: You can provide a test or quiz (e.g., spelling, dictée), or provide an incorrect sample (including a PDF sample) for a student to correct, and still meet our guidelines.
- Visible Braille patterns (like the actual Unicode specification pages) are script samples, not Braille.
- Still pictures for sign languages: Still the predominant method of communicating sign-language lexicon. A PDF of nothing but such images presents a conundrum of definitions.
- Music: I pretty much am indeed saying that user agents must be able to recognize encoded music notation and handle musical terminology like fortissimo, arpeggio, and mezza voce by themselves.
- Multimedia: The idea here is that the PDF contains nothing but a multimedia object, except maybe it also has a title or something. In that case, no you don’t have to caption it or provide a text equivalent because the content of the multimedia is not the point; in fact, the content can be completely ignored or misunderstood and the intent of the PDF will still succeed. (If you want to use PDF to show an example of Flash 15 in the year 2012, and the only example you’ve got is in Japanese, you the English-language author can still succeed in marking up the PDF according to our spec.)
- Math: There are some fields of mathematics, like non-commutative geometry, that MathML cannot render. You’ve pretty much got to use pictures. I have no solution for this, except to say that the likelihood of a PDF of nothing but a picture of a non-commutative diagram is low.