PDF/UA for very long documents
At least from a user experience point of view, and judging by currently available tools, PDF/UA (or tagged PDF in general) does not yet look up to the task of handling very long documents. This may be more of a tool issue than an issue with the PDF/UA standard itself, though.
The main problem is that tagged PDF essentially defines a DOM. Because that DOM is a tree structure and tends to be parsed and processed in one go, the tree may become too large to be processed efficiently.
The main question is whether something needs to be done in the text of PDF/UA-2 (or PDF 2.0?) to help address this, or whether this is just an aspect tool developers will have to deal with.
Relevance in the real world comes from the following corners:
- technical specifications can be relatively long, for example ISO 32000-1 (typical sizes range from many hundreds to one or two thousand pages)
- PDF/VT (ISO 16612-2) supports and encourages the creation of very long documents of up to hundreds of thousands of pages, where the PDF document is often a concatenation of many content pieces, each of which would be its own PDF document in other scenarios; as PDF/VT is (to be) used in contexts where accessibility does play a role, it would be good to be able to address this
- .... ?
Notes from the April 3 ad hoc meeting:
Matthew brings up that you don't have to hold the whole tagging table in memory: tags are good for random access. Current implementations may not be doing the right thing, but that is an implementation detail.
Greg brings up the point of long tables of contents and assistive technologies (ATs) not processing them for users.
Matthew notes that this is a good example of what he's talking about: you don't need the whole document, only parts of the tree.
Cherie suggests adding information to the Part 1 implementer's guide, because this seems like an important topic.
Olaf suggests pointing out to implementers that documents can get quite large, so they should plan for this ahead of time and not try to parse the whole DOM after the fact.
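
To make the random-access point from the meeting more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not how any existing tool works: `ObjectStore`, `RawElem`, `LazyNode`, `find_first` and `build_demo_store` are all hypothetical names, and the store is a plain in-memory dictionary standing in for a file whose objects can be fetched individually by id. The sketch shows that finding the table of contents and listing its items only requires loading the elements actually visited, not the whole structure tree.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass
class RawElem:
    """Hypothetical stand-in for one structure element as stored in the file."""
    tag: str
    kid_ids: List[int] = field(default_factory=list)


class ObjectStore:
    """Hypothetical random-access store: fetches one object by id, counts fetches."""

    def __init__(self, objects: Dict[int, RawElem]) -> None:
        self._objects = objects
        self.loads = 0

    def fetch(self, obj_id: int) -> RawElem:
        self.loads += 1
        return self._objects[obj_id]


class LazyNode:
    """Structure element whose data is fetched only on first access."""

    def __init__(self, store: ObjectStore, obj_id: int) -> None:
        self._store = store
        self._id = obj_id
        self._raw: Optional[RawElem] = None

    def _load(self) -> RawElem:
        if self._raw is None:
            self._raw = self._store.fetch(self._id)
        return self._raw

    @property
    def tag(self) -> str:
        return self._load().tag

    def kids(self) -> Iterator["LazyNode"]:
        # Children are wrapped lazily; none of them is fetched here.
        for kid_id in self._load().kid_ids:
            yield LazyNode(self._store, kid_id)


def find_first(node: LazyNode, tag: str) -> Optional[LazyNode]:
    """Depth-first search that stops at the first match."""
    if node.tag == tag:
        return node
    for kid in node.kids():
        hit = find_first(kid, tag)
        if hit is not None:
            return hit
    return None


def build_demo_store(sections: int, paras_per_section: int) -> ObjectStore:
    """Builds an invented document: Document -> TOC (TOCI...) + Sect (P...)..."""
    objects = {0: RawElem("Document", [1]), 1: RawElem("TOC")}
    next_id = 2
    for _ in range(sections):
        objects[next_id] = RawElem("TOCI")
        objects[1].kid_ids.append(next_id)
        next_id += 1
    for _ in range(sections):
        sect_id = next_id
        objects[sect_id] = RawElem("Sect")
        objects[0].kid_ids.append(sect_id)
        next_id += 1
        for _ in range(paras_per_section):
            objects[next_id] = RawElem("P")
            objects[sect_id].kid_ids.append(next_id)
            next_id += 1
    return ObjectStore(objects)


store = build_demo_store(sections=1000, paras_per_section=50)
root = LazyNode(store, 0)
toc = find_first(root, "TOC")
toc_items = [kid.tag for kid in toc.kids()]
total_objects = 2 + 1000 + 1000 * (1 + 50)
print(f"fetched {store.loads} of {total_objects} structure elements")
```

In this invented document, only the root, the TOC element and its 1,000 items are fetched (1,002 elements), while the 51,000 section and paragraph elements are never touched. Whether a real implementation can achieve something similar depends on how it reads the file; the sketch only assumes that individual objects can be located without parsing everything before them.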
