32000-2 requests: Beijing

PDF/UA's requests for ISO 32000-2 are now separated into multiple groups of requests, to wit:


Abbr

Request Title 

Add an Abbr tag to Table 338
Request Submitted By  PDF/UA
Executive Summary Provides semantics for abbreviations, acronyms, initialisms, and short forms, which may be expanded using an E key. In PDF/UA, some Abbr content must be so expanded.
Rationale

Abbreviations, acronyms, initialisms, and short forms can be confusing or unreadable to users of some adaptive technologies like screen readers and to some readers with learning disabilities. The Abbr tag provides semantic markup to disambiguate these terms. PDF/UA requires ambiguous Abbr content to be expanded using an E key; see PDF/UA's Text module for details.

Provides compatibility with HTML 4.x and XHTML 1.x (PDF collapses ABBR and ACRONYM onto Abbr) as well as HTML5, XHTML5, and XHTML 2.0.

Use Case(s) While the tag is named Abbr, it encompasses any and all variations of abbreviations, acronyms, initialisms, and short forms.
Details of Requested Change 

Add a new Structure Type in Table 338 as follows:

 Structure Type  Description
 Abbr  This element is applied to abbreviations, acronyms, initialisms, and short forms.
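
As an illustration of how a consumer might use the proposed tag, the following Python sketch prefers the E (expansion) entry of an Abbr element over its abbreviated text. The dictionary layout and the function name spoken_text are assumptions made for this example; they are not taken from ISO 32000 or PDF/UA.

```python
# Hypothetical in-memory model of a structure element: a dict with an
# "S" (structure type) entry, an optional "E" (expansion) entry, and the
# element's visible text under "text". This layout is an assumption.

def spoken_text(element):
    """Return the text a reader might announce for one element."""
    if element.get("S") == "Abbr" and element.get("E"):
        # An expansion is available, so use it instead of the short form.
        return element["E"]
    return element.get("text", "")


elements = [
    {"S": "Span", "text": "Published by the "},
    {"S": "Abbr", "E": "International Organization for Standardization", "text": "ISO"},
    {"S": "Span", "text": "."},
]

sentence = "".join(spoken_text(e) for e in elements)
```

An Abbr element without an E entry falls back to its visible text, which matches the proposal's position that only ambiguous Abbr content needs expansion.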
Return to 14289 Drafting or 32000-2 requests: Beijing

Emphasis

Request Title 

Add an Em tag to Table 338
Request Submitted By  PDF/UA
Executive Summary Represents stress emphasis of its contents.
Rationale Common in HTML, Emphasis is just as common in PDF documents, but there is presently no way to indicate emphasis to assistive-technology users who rely on PDF tags for semantic structures.
Use Case(s) Emphasized text or content.
Details of Requested Change 

Add a new Structure Type in Table 338 as follows:

 Structure Type  Description
 Em  This element is applied to text which should be emphasized.

 

Strong

Request Title 

Add a Strong tag to Table 338
Request Submitted By  PDF/UA
Executive Summary Represents strong importance for its contents.
Rationale Common in HTML, Strong is just as common in PDF documents, but there is presently no way to indicate strong importance to assistive-technology users who rely on PDF tags for semantic structures.
Use Case(s) Strongly-emphasized text or content, or text or content with strong importance.
Details of Requested Change 

Add a new Structure Type in Table 338 as follows:

 Structure Type  Description
 Strong  This element is applied to text which should be strongly emphasized. Stronger than the Em tag.

 

Subscript

Request Title 

Add a Sub tag to Table 338
Request Submitted By  PDF/UA
Executive Summary Compatibility with HTML; improved document semantics
Rationale For use only when subscripting changes meaning, not for visual presentation
Use Case(s) Subscript text
Details of Requested Change 

Add a new Structure Type in Table 338 as follows:

 Structure Type  Description
 Sub  This tag is applied to text or content that, were it not subscripted, would change meaning.

 

Superscript

Request Title 

Add a Sup tag to Table 338
Request Submitted By  PDF/UA
Executive Summary Compatibility with HTML; improved document semantics
Rationale For use only when superscripting changes meaning, not for visual presentation
Use Case(s) Superscript text
Details of Requested Change

Add a new Structure Type in Table 338 as follows:

 Structure Type  Description
 Sup  This tag is applied to text or content that, were it not superscripted, would change meaning.

 

Sidebar

Request Title 

Add a Sidebar tag to Table 338
Request Submitted By  PDF/UA
Executive Summary Improved document semantics, especially when converting legacy documents
Rationale There is presently no way to associate a sidebar with its parent article. In sequential PDF reading (as by a screen reader or Braille display), it becomes confusing for users with disabilities to associate sidebar content with main-article content
Use Case(s) A sidebar is text discussing a topic semantically related to the text inside the enclosing tag. A sidebar typically cannot be understood without first reading all or part of the text in the enclosing tag. Common in printed magazines; PDFs exported from such magazines presently have no semantics for this document content. Preferred usage: As a child of other structural elements (e.g., inside Art or Sect).
Details of Requested Change 

Add a new Structure Type in Table 338 as follows:

 Structure Type  Description
 Sidebar  This tag is applied to text or content that is related to and semantically introduced or surrounded by other text or content.

 

Callout

Request Title 

Add a Callout tag to Table 338
Request Submitted By  PDF/UA
Executive Summary Improved document semantics, especially when converting legacy documents
Rationale There is presently no way to associate a callout (usually a specially formatted and edited quotation) with its parent article. In sequential PDF reading (as by a screen reader or Braille display), it becomes confusing for users with disabilities to associate callouts with main-article content
Use Case(s) Quotation or excerpt from text, typically displayed in large type in a visually prominent location. Common in printed magazines; PDFs exported from such magazines presently have no semantics for this document content. Preferred usage: As a child of other structural elements (e.g., inside Art or Sect).
Details of Requested Change 

Add a new Structure Type in Table 338 as follows:

 Structure Type  Description
 Callout  This tag is applied to text or content that is extracted from other text or content and given significant emphasis (usually via visual or typographic means).

 

Dek

Request Title 

Add a Dek tag to Table 338
Request Submitted By  PDF/UA
Executive Summary Improved document semantics, especially when converting legacy documents from newspaper or magazine sources
Rationale There is presently no way to associate a dek with its parent heading (H1 through H6 or H) and article. In sequential PDF reading (as by a screen reader or Braille display), dek content can be confused with main-article content
Use Case(s)

A dek is a short summary of an article following a heading and preceding the article text. It is usually longer than a heading (sometimes several paragraphs in length), yet does not function as a heading or as main-article text.

Preferred usage: As a child of other structural elements (e.g., inside Art or Sect), following any heading tag and preceding full text of an article or other content.

NB:  The correct spelling is dek, not deck; the words have different spellings so that the word “dek” never appears in published copy. (Editors search for “dek” and ensure it appears solely in comments or other internal instructions.)

Details of Requested Change

Add a new Structure Type in Table 338 as follows:

 Structure Type  Description
 Dek  This tag is applied to text or content that follows a heading and introduces main-article text.  An abstract or a synopsis is an example of a dek.
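
The preferred usage described above (a Dek following a heading and preceding the article text) could be checked by a tool along the following lines. This is only a sketch: the function name misplaced_deks and the flat list of sibling structure types are assumptions for illustration, and the check is advisory because the placement is a preference rather than a requirement.

```python
# Heading structure types named in the rationale (H1 through H6, or H).
HEADING_TYPES = {"H", "H1", "H2", "H3", "H4", "H5", "H6"}


def misplaced_deks(sibling_types):
    """Return positions of Dek elements that do not directly follow a heading.

    `sibling_types` lists the structure types of the children of one
    grouping element (for example Art or Sect) in reading order.
    """
    positions = []
    for index, struct_type in enumerate(sibling_types):
        if struct_type != "Dek":
            continue
        previous = sibling_types[index - 1] if index > 0 else None
        if previous not in HEADING_TYPES:
            positions.append(index)
    return positions


well_formed = misplaced_deks(["H1", "Dek", "P", "P"])
misplaced = misplaced_deks(["H2", "P", "Dek", "P"])
```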

 

Media

Request Title  Add a Media tag to Table 340
Request Submitted By  PDF/UA
Executive Summary A Media tag identifies the type of media embedded in the document to assistive-technology users.
Rationale Presently no consistent method for declaring the media type is available.
Use Case(s) Video files (MPEG, SWF); audio files (MP3, AIFF)
Detail of Proposed Changes 

Description for Table 340:

 Structure Type  Description
 Media  This element is applied to embedded multimedia objects

 

  
Continued List

Request Title 

Add a CL tag to Table 336, update Table 334
Request Submitted By  PDF/UA
Executive Summary A CL tag allows users to indicate a portion of a list that is part of a larger list but that is discontiguous from the preceding portion of the list.
Rationale It is common for a document to contain a list that is interrupted by explanatory text or by a graphic. Nonetheless, the content author may intend the list to be a single entity rather than multiple individual lists. An example would be an ordered list that continues numbering from a previous list, after some text that introduces the remaining items.
Use Case(s) Ordered lists with text or shape elements separating the list into multiple sections.
Details of Requested Change

Update Table 334 to add a CL type to the "List Elements" / "Structure Types" table cell.

Add the following item to Table 336:

Structure Type  Description
 CL  A sequence of items of like meaning and importance that is part of but separated from the sequence that precedes it. Its immediate children should be an optional caption (structure type Caption; see 14.8.4.2, “Grouping Elements”) followed by one or more list items (structure type LI).
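
To illustrate the intended effect, the following Python sketch shows how a consumer might join each CL to the list it continues, so that an interrupted list is presented as a single entity. The tuple representation and the function name merge_continued_lists are assumptions for this example; the request itself does not prescribe any processing model.

```python
def merge_continued_lists(siblings):
    """Group sibling elements so that each CL joins the preceding list.

    `siblings` is a sequence of (structure_type, payload) tuples in reading
    order. For "L" and "CL" the payload is a list of item strings. This
    representation is hypothetical.
    """
    logical_lists = []
    for struct_type, payload in siblings:
        if struct_type == "L":
            logical_lists.append(list(payload))
        elif struct_type == "CL":
            if not logical_lists:
                raise ValueError("CL has no preceding list to continue")
            logical_lists[-1].extend(payload)
        # Other types (P, Figure, ...) interrupt the list visually but do
        # not end it logically, so they are skipped here.
    return logical_lists


page = [
    ("L", ["Open the file", "Select all pages"]),
    ("P", "The remaining steps apply to tagged files only."),
    ("CL", ["Run the checker", "Save the report"]),
    ("L", ["Unrelated list"]),
]
lists = merge_continued_lists(page)
```

Here the first logical list has four items even though a paragraph separates the second and third.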

 

MathML

Request Title 

Add a subset of the MathML tags
Request Submitted By  PDF/UA
Executive Summary Add the Presentation MathML 2.0 tags and attributes and the MathML math and semantics tags.
Rationale MathML is an existing well-supported standard for encoding mathematical equations that would give assistive-technology users and search engines access to the formulae.
Use Case(s) Non-linear math
Details of Requested Change

Below are the steps suggested.  Two new subsections (one for MathML elements and one for their attributes) need to be added.  These both introduce tables that need to be added. The text refers to table numbers.  In the text below, they are labeled as mmm and nnn and values need to be given to them based on the numbering in the spec.

Note:  It appears that elements and attributes in the PDF spec are "CamelCased" (capital letters for each new word as in "ColSpan").  In MathML, all lowercase letters are used.  In the proposal below, all lowercase letters are used, but nothing is broken if they are switched to the PDF convention.  It would be slightly simpler for exporters to XHTML if lowercase were used, but I doubt it makes any real difference.

Here are the recommendations for what to change:

  1. Table 340 (Standard structure types for illustration elements) should have the "Formula" row modified to read:
    Structure Type  Description
     Formula (Formula) A mathematical formula.
    This structure type is used to identify an entire content element as a
    formula. Formula can only directly contain the math element.  See table mmm.
  2. The last paragraph of 14.8.4.5 (Illustration Elements) should be modified to read:

    "For accessibility to users with disabilities and other text extraction purposes, an illustration element should have an Alt entry or an ActualText entry (or both) in its structure element dictionary (see 14.9.3, “Alternate Descriptions,” and 14.9.4, “Replacement Text”). The Formula structure element should contain MathML structure elements corresponding to the subexpressions for maximum accessibility. Alt is a description of the illustration, whereas ActualText gives the exact text equivalent of a graphical illustration that has the appearance of text."
  3. A new subsection of 14.8.4.5 (Illustration Elements) should be added at the end of 14.8.4.5.  The text of that section is shown below.

  4. The following row should be added to the bottom of table 341:
    Owner  Description
     MathML-2.0 Attributes associated with MathML structure elements (Table mmm).

    MathML   Attributes governing the layout of mathematical expressions.
  5. A new section should be added after 14.8.5.7 (Table Attributes) that lists the MathML attributes in Table nnn. The text of that section is shown below.

New subsection of 14.8.4.5 (Illustration Elements)

The structure types described in table mmm, "MathML Elements", are used to describe formulas. The math element should only appear inside of a Formula element, and all of the other elements in the table should only appear inside of other MathML elements.

Note:  Strictly speaking, the elements listed in Table mmm are neither BLSEs nor ILSEs.

The elements in Table mmm are specified in detail in the MathML 2.0 recommendation and are summarized below. With the exception of the math, semantics, annotation, and annotation-xml elements, these elements are part of chapter 3 (Presentation Markup) of the MathML 2.0 recommendation.

Table mmm  MathML Elements

Structure Type  Description
math The root element of the MathML.
mi Leaf element whose content is an identifier.
mn Leaf element whose content is a number.
mo Leaf element whose content is an operator.
mtext Leaf element whose content is arbitrary text.
mspace Leaf element whose content is a space whose width and height are given by attributes.  If this element is used, a physical whitespace character should be in the document.
ms Leaf element whose content is a string.  The delimiters of the string are given by the lquote and rquote attributes.  The delimiters should not be part of the content of the element.
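mglyph Leaf element that displays a glyph not available as a standard character.  The glyph is identified by the fontfamily and index attributes, and the alt attribute supplies alternate text.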
mrow Group any number of horizontally laid out elements together.
mfrac A vertical or beveled fraction with exactly two children.
msqrt A square root with exactly one child.
mroot A radical with exactly two children:  the index and the radicand.
mstyle Change the style of how the child is displayed by changing attributes that are inherited by the children.
merror Enclose a syntax error message from a preprocessor or otherwise indicate an error.
mpadded Adjust the vertical or horizontal space around the child.
mphantom Make the child invisible but preserve its size.  If this element is used, either a white space character must be used or what is drawn should match the background so that it is invisible.  It should not be spoken.
mfenced Surround the children with "fences" (e.g., parentheses) and add separators as specified by the open, close, and separators attributes.  Neither the fences nor the separator(s) should be children of this element.
menclose Enclose the children with lines, circles, cross-outs, or other decorations as specified by the notation attribute.
msub An expression with a subscript.  Both the base and the subscript are children.
msup An expression with a superscript.  Both the base and the superscript are children.
msubsup An expression with a subscript and a superscript.  The base, subscript and superscript are children.
munder An expression with underscript or lower limit.
mover An expression with overscript or upper limit.
munderover An expression with both an underscript/lower limit and an overscript/upper limit.
mmultiscripts An expression with prescripts (sub/superscripts to the left of the base) or tensor indices.  The base of the multiscript should be the first child, followed by pairs of lower and upper indices (subscripts/superscripts).  Missing scripts are indicated using none elements.  Pairs of prescripts follow the postscripts and must be preceded by an mprescripts element.
none Valid as a child of mmultiscripts.  none is used to indicate an unused subscript or superscript as part of a subscript/superscript pair.  In MathML, this is an empty element, but because of the requirements for structure elements in PDF, it must point to some content.  It is recommended that applications insert a whitespace character in the empty position so this element can refer to some content.
mprescripts Valid as a child of mmultiscripts.  mprescripts is used to indicate the start of prescript subscript/superscript pairs.  In MathML, this is an empty element but because of the requirements for structure elements in PDF, it must point to some content.  It is recommended that applications insert a whitespace character immediately before or after the notation so this element can point to some content.
mtable A matrix or other tabular mathematical layout.  MathML tables are similar to HTML tables and consist of one or more table rows (mtr or mlabeledtr).  Unlike PDF tables, MathML tables have no headers or captions because headers are not mathematical expressions.
mtr A row in a mtable.  Its parent must be mtable.
mlabeledtr A row in a table that has a label on either the left or right side, as determined by the side attribute. The label is the first child of mlabeledtr. The rest of the children represent the contents of the row and are identical to those used for mtr; all of the children except the first must be mtd elements.  Like mtr, its parent must be mtable.
mtd One entry, or cell, in a table or matrix. An mtd element is only allowed as a direct child of an mtr or an mlabeledtr element.
maligngroup An alignment marker that is used to help vertically align specified points within a column of MathML expressions.  maligngroup is a space-like leaf element that is used to divide a column up into groups; see the MathML recommendation for more details.  In MathML, this is an empty element.  It is recommended that applications insert a whitespace character that corresponds to the maligngroup element.
malignmark An alignment marker that is used to help vertically align specified points within a column of MathML expressions.  It specifies a specific alignment point within a maligngroup.  Like maligngroup, it is a space-like leaf element; see the MathML recommendation for more details.  In MathML, this is an empty element.  It is recommended that applications insert a whitespace character that corresponds to the malignmark element.
maction  In MathML, maction is used to bind actions to expressions.  In a PDF document, no action is associated with this element, although a plug-in could be written to enliven the expression.  maction is mainly provided for compatibility.  maction takes an arbitrary number of children, although only one child is displayed.  Children that are not rendered cannot be part of the structure tree, and their representation in PDF is currently not specified.
semantics Associates a specific notation with a notation-independent representation that carries more semantic information.  For PDF, the first child must be the MathML for the notation being displayed.  In MathML, subsequent elements (annotation, annotation-xml) specify alternative encodings and are not rendered.  Children that are not rendered cannot be part of the structure tree, and their representation in PDF is currently not specified.
annotation A child element of semantics whose child provides an alternative non-XML representation of the contents of the semantics element. This element cannot currently be part of the structure tree.  For more information, see semantics.
annotation-xml A child element of semantics whose child provides an alternative XML-based representation of the contents of the semantics element. This element cannot currently be part of the structure tree.  For more information, see semantics.
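
Several rows of the table above state structural constraints (fixed child counts, required parents). The following Python sketch shows how a checker might test a MathML fragment against only those constraints that the table states explicitly. The function name check_mathml, the rule dictionaries, and the use of un-namespaced element names are assumptions for this example; it is not a complete MathML validator.

```python
import xml.etree.ElementTree as ET

# Child counts stated explicitly in the table above.
REQUIRED_CHILDREN = {
    "mfrac": 2, "msqrt": 1, "mroot": 2,
    "msub": 2, "msup": 2, "msubsup": 3,
}

# Parent restrictions stated explicitly in the table above.
ALLOWED_PARENTS = {
    "mtr": {"mtable"},
    "mlabeledtr": {"mtable"},
    "mtd": {"mtr", "mlabeledtr"},
}


def check_mathml(root):
    """Return a list of human-readable problems found under `root`."""
    problems = []

    def visit(node, parent_tag):
        tag = node.tag
        expected = REQUIRED_CHILDREN.get(tag)
        if expected is not None and len(node) != expected:
            problems.append(
                f"{tag}: expected {expected} children, found {len(node)}")
        allowed = ALLOWED_PARENTS.get(tag)
        if allowed is not None and parent_tag not in allowed:
            problems.append(
                f"{tag}: parent is {parent_tag}, expected one of {sorted(allowed)}")
        for child in node:
            visit(child, tag)

    visit(root, None)
    return problems


good = ET.fromstring("<math><mfrac><mi>a</mi><mn>2</mn></mfrac></math>")
bad = ET.fromstring("<math><mfrac><mi>a</mi></mfrac><mtd><mn>1</mn></mtd></math>")
good_problems = check_mathml(good)
bad_problems = check_mathml(bad)
```

For the second fragment the checker reports two problems: an mfrac with one child and an mtd whose parent is math.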

Note:  MathML contains a set of "content" elements that are notation-independent semantic elements. These elements do not have a specific layout and so are not meaningful in PDF unless they are inside of a semantics element.


New Section after 14.8.5.7 (Table Attributes)

The attribute owner "MathML-2.0" shall be associated with the attributes listed below.

Table nnn lists all of the attributes associated with the MathML elements listed in Table mmm. The description of each attribute is given in the MathML 2.0 recommendation. They are listed below for completeness.

Unless otherwise noted:

  • these attributes are not inherited (i.e., they apply only to the current element of the structure tree, not the children of the element);
  • these attributes are optional;
  • the "type" for each value is "string";
  • "class", "id", "style", "xref", and "xlink:href" are legal attributes for all MathML elements.

The attributes exist for compatibility with MathML generation tools and to allow translation of the math to an XML dialect.

Table nnn  Standard math attributes

Structure Elements  Attributes
math display, altimg, alttext
mi, mn, mtext mathvariant, mathsize, mathcolor, mathbackground
mo mathvariant, mathsize, mathcolor, mathbackground, form, fence, separator, lspace, rspace, stretchy, symmetric, maxsize, minsize, largeop, movablelimits, accent
mspace mathvariant, mathsize, mathcolor, mathbackground, width, height, depth, linebreak
ms mathvariant, mathsize, mathcolor, mathbackground, lquote, rquote
mglyph mathvariant, mathsize, mathcolor, mathbackground, alt (required), fontfamily (required), index (required)
mfrac linethickness, numalign, denomalign, bevelled
mstyle All optional attributes listed for the other elements in this table.  In addition, the following are valid:  scriptlevel, displaystyle, scriptsizemultiplier, scriptminsize, background, veryverythinmathspace, verythinmathspace, thinmathspace, mediummathspace, thickmathspace, verythickmathspace, veryverythickmathspace.  All values are inherited.

mpadded width, lspace, height, depth
mfenced open, close, separators
menclose notation
msub subscriptshift
msup superscriptshift
msubsup subscriptshift, superscriptshift
munder accentunder
mover accent
munderover accentunder, accent
mmultiscripts subscriptshift, superscriptshift
mtable align, rowalign, columnalign, groupalign, alignmentscope, columnwidth, width, rowspacing, columnspacing, rowlines, columnlines, frame, framespacing, equalrows, equalcolumns, displaystyle, side, minlabelspacing
mtr, mlabeledtr rowalign, columnalign, groupalign
mtd rowspan, columnspan, rowalign, columnalign, groupalign
maligngroup groupalign
malignmark edge
maction actiontype (required), selection
semantics,  annotation, annotation-xml  definitionURL, encoding
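
As a rough illustration of how a tool might apply Table nnn, the Python sketch below checks an element's attribute names against a transcribed subset of the table, treating the five attributes named in the "Unless otherwise noted" list as legal everywhere. The names check_attributes, ALLOWED, REQUIRED, and UNIVERSAL are hypothetical, and only a few rows of the table are transcribed.

```python
# Attributes that the text above declares legal for all MathML elements.
UNIVERSAL = {"class", "id", "style", "xref", "xlink:href"}

# Attributes shared by the token-element rows of Table nnn.
TOKEN = {"mathvariant", "mathsize", "mathcolor", "mathbackground"}

# A subset of Table nnn, transcribed for illustration only.
ALLOWED = {
    "mi": TOKEN,
    "ms": TOKEN | {"lquote", "rquote"},
    "mglyph": TOKEN | {"alt", "fontfamily", "index"},
    "mfrac": {"linethickness", "numalign", "denomalign", "bevelled"},
    "maction": {"actiontype", "selection"},
}

# Attributes marked "(required)" in Table nnn.
REQUIRED = {
    "mglyph": {"alt", "fontfamily", "index"},
    "maction": {"actiontype"},
}


def check_attributes(element, attributes):
    """Return (unknown, missing) attribute names for one element.

    Raises KeyError for elements that this sketch has not transcribed.
    """
    names = set(attributes)
    unknown = names - ALLOWED[element] - UNIVERSAL
    missing = REQUIRED.get(element, set()) - names
    return sorted(unknown), sorted(missing)


glyph_result = check_attributes("mglyph", {"alt": "x", "id": "g1"})
frac_result = check_attributes("mfrac", {"bevelled": "true", "colour": "red"})
```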
 

 

 

Scope and Header attributes of tables

PDF/UA has also proposed this item as an [Application Note to ISO 32000-1]

Request Title 

Modify definition of Scope, Headers, and ID attributes of Tables
Request Submitted By  PDF/UA
Executive Summary Specify an algorithm for associating header cells in a table with data cells in a table.  Additionally, clarify the specification of Scope, Headers, and ID attributes so that header lookup through IDs is well-defined. This modifies the description of these attributes given in Table 335 (page 582) and Table 347 (see page 606). No new tags or attributes are requested.
Rationale The existing description for tables lacks a precise definition of how headers are associated with table cells.  One is needed so that authors and AT agree on which header is associated with which cell, especially for non-trivial tables.
Use Case(s) AT needs to know how to find the row and column headers, both when IDs and Headers are given and when they are not.
Details of Proposed Change

The Note to Table 337 says "Lookup is heuristic".  This will lead to incompatible behavior by AT. No algorithm is given in ISO 32000 when header cell IDs and table data cell IDs are not present.

In the case that header cell IDs and table data cell IDs are specified, ISO 32000 specifies an algorithm to associate table header cell(s) with table data cell(s). This algorithm is flawed. The recursive lookup mentioned in Table 349 (Headers) is ambiguous in that the headers might be associated with only a row, only a column, or both. The following change to Table 349 requires that any recursion be explicit.  In particular:

  • HEADER lookup through IDs is not recursive
  • ID order is specified 

The PDF/UA specification provides an algorithm for both of the cases above so that all AT will associate the same table header cells and table data cells. The new descriptions and their rationale are described in the PDF/UA Tables specification.
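
The two points listed above (non-recursive lookup, specified ID order) can be illustrated with a short Python sketch. It is not the algorithm from the PDF/UA Tables specification, which is not reproduced on this page; the cell dictionary layout, the function name headers_for_cell, and the choice to skip IDs that do not resolve are all assumptions made for this example.

```python
def headers_for_cell(cell_id, cells):
    """Return header texts for a cell, in the order its Headers entry lists them.

    `cells` maps a cell ID to a dict with "text" and an optional "Headers"
    list of IDs. The lookup is deliberately not recursive: the Headers
    entries of the header cells themselves are never followed.
    """
    result = []
    for header_id in cells[cell_id].get("Headers", []):
        header = cells.get(header_id)
        if header is None:
            continue  # Assumption: IDs that do not resolve are skipped.
        result.append(header["text"])
    return result


cells = {
    "year": {"text": "2024"},
    "q1": {"text": "Q1", "Headers": ["year"]},
    "row": {"text": "Sales"},
    "d": {"text": "1,200", "Headers": ["row", "q1"]},
}
found = headers_for_cell("d", cells)
```

In this sketch the data cell "d" is associated with "Sales" and "Q1" only. Because lookup is not recursive, "2024" would be associated with it only if the author listed "year" explicitly in the cell's Headers entry.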


Language codes

Request Title 

Update language codes

Request Submitted By 

PDF/UA
Executive Summary  
Rationale

Language codes shall now be derived from BCP 47, “Tags for Identifying Languages,” which supersedes the standards listed in PDF 1.7.

Use Case(s)

 All content with language codes.

Details of Proposed Change

ISO is no longer the source of the standard for language codes. BCP 47 is an alias to the current RFC 4646/4647, which contains all language codes previously listed in ISO 639-1, -2, and -3 (minor errors corrected) and new language codes.

Change all references to ISO 639-x or RFC 4646/4647 to BCP 47. As BCP 47 grandfathers all previous language codes, there is no need to make a different provision for PDF documents created before ISO 32000 was enacted.

Documents with unknown language

Request Title 

Specify language codes of content in unknown language(s).

Request Submitted By 

PDF/UA
Executive Summary Establish compatibility with BCP 47 to enable authors to mark up content expressed in a language unknown to the author or creator. Language code und shall be used for documents whose language is unknown to the author or creator.
Rationale Brings the Reference up to date.
Use Case(s) An author or creator may be presented with a document expressed in a language that, despite best efforts, cannot be identified. Language coding cannot be omitted from such documents, as adaptive technology might make a guess, usually an incorrect one, as to the language of the content and read it with nonsensical speech or Braille. BCP 47 provides language code und for such content.
Details of Proposed Change

Update the Lang key in Table 124. Remove “If this entry is absent, the language shall be considered to be unknown.” Replace with “If this entry is absent, such absence provides no information as to the language of the document; in particular, an absence of this entry does not by itself mean the language of the document is unknown.”

Request Title 

Implement nonlinguistic content language code
Request Submitted By  PDF/UA
Executive Summary Specify language codes for content that does not use a natural language.
Rationale PDF content does not have to use a natural (human) language. That content may nonetheless be made up of Unicode characters. Without correct language coding, adaptive technology might make a guess, usually an incorrect one, as to the language of the content and read it with nonsensical speech or Braille. BCP 47 provides language code zxx for such content.
Use Case(s)

Non-linguistic content.

Details of Requested Change 

Add a new paragraph at the end of 14.9.2.2 as follows:

Non-linguistic content should be marked with language code "zxx" (see ISO 639-2).
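
The language-code requests above distinguish three situations that a consumer should not conflate: an absent Lang entry (no information), "und" (language undetermined), and "zxx" (no linguistic content). The Python sketch below shows one way a reader might tell them apart. The function name describe_language and the returned strings are assumptions for illustration, and empty strings are deliberately rejected rather than interpreted.

```python
def describe_language(lang):
    """Classify a Lang value; None stands for an absent Lang entry."""
    if lang is None:
        # Absence carries no information about the document's language.
        return "no information"
    if not lang:
        raise ValueError("empty language tags are not handled in this sketch")
    primary = lang.split("-")[0].lower()
    if primary == "und":
        return "undetermined"
    if primary == "zxx":
        return "non-linguistic"
    return f"language {primary}"


results = [describe_language(v) for v in (None, "und", "zxx", "en-US")]
```

A screen reader following this distinction would guess a language only in the "no information" case, and would avoid speech synthesis heuristics for "und" and "zxx" content.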

Declaration of Natural Text

Request Title 

Declaration of Natural Language
Request Submitted By  PDF/UA
Executive Summary Tagged PDFs shall declare a language.
Rationale

PDF/UA - Universal Accessibility sets up two categories of PDF: Untagged, where we make no requirements about language or writing direction, and tagged, where we require notation of language and writing direction. By implication, an author who cares enough to publish a tagged PDF must take adequate care to ensure its language content is understandable.

Use Case(s) See Rationale
Details of Proposed Changes

In §14.8.1 (Tagged PDF: General), add the following bulleted item at the end of the list introduced by “A tagged PDF document shall conform to the following rules”:

A tagged PDF document shall declare its natural language (if any) and writing direction as per §[TK].

...where "TK" is the location in the document of the new text attributes: WritingModeInline and WritingModeBlock. (See PDF/UA's Text Direction request.)
