Ideas for OpenDocument

At the LibreOffice conference in Aarhus, Denmark, last September, I talked about the future of ODF (slides). OpenDocument, the file format for office applications, is currently at version 1.2. What should go into version 1.3 and later? I presented nine ideas and measured the applause that each idea got. The results of the measurement can be seen in the table at the bottom of this post. (The applause volume was read by playing the audio of the presentation in Audacity.)

OpenDocument exists to make life easier for computer users and software developers. A well-documented file format leads to a more diverse market: if the file format is neither secret nor constantly changing, everyone can write software for it.

The standard is not created in a vacuum. Good ideas can come from anyone, and users of ODF software have a say in what improvements are made to the standard. The ideas in this document are just a few possible improvements to the standard and the ecosystem. The ODF Technical Committee is always open to your ideas.

Testing and certification

ODF 1.2 is a very complete office file format. There is no software that supports all ODF 1.2 features, and some parts are handled differently in different packages. An overview of how well each package supports which features is missing. The ODF plugfest is a recurring event where ODF software is tested. This event is held at a physical location. Testing all parts of ODF takes far more testing than such an event allows. That is why there is a need for a continuous ODF plugfest online: a website where people can test all office suites. People would be able to upload files and see how they render in the different packages. The site would offer a way to define tests that say unambiguously whether a feature is supported well. The site could feature a live score for each ODF package.

At the last plugfest a rough design for ODF plugfest online was made. Currently, that design is being worked out.

Profiles

Writing valid ODF software is very easy. The specification says:

An OpenDocument producer is a program that creates at least one conforming OpenDocument document […]

So it can be argued that a copy command is an OpenDocument producer. Making an ODF reader seems harder:

An OpenDocument consumer is a program that can parse and interpret OpenDocument documents according to the semantics defined by this specification […]

however:

it need not interpret the semantics of all elements, attributes and attribute values.

There is currently no straightforward way to define which parts of the specification are supported by a particular implementation. One could list all supported elements and attributes.

An idea for current and future versions of ODF is to define profiles. Each profile contains a list of features. ODF software can then comply with one or more profiles.
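To make the profile idea concrete, here is a minimal sketch in Python. It treats a profile as a named set of features and reports which profiles an implementation fully covers. The profile names and feature names are invented for illustration; nothing like them is defined in the ODF specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """A named list of features. Names here are hypothetical."""
    name: str
    features: frozenset


def supported_profiles(implemented, profiles):
    """Return the names of the profiles whose features are all implemented."""
    implemented = frozenset(implemented)
    return [p.name for p in profiles if p.features <= implemented]


# Hypothetical profiles, made up for this example.
PROFILES = [
    Profile("basic-text", frozenset({"paragraphs", "headings", "lists"})),
    Profile("rich-text", frozenset({"paragraphs", "headings", "lists",
                                    "tables", "images"})),
    Profile("reviewing", frozenset({"paragraphs", "change-tracking",
                                    "annotations"})),
]

if __name__ == "__main__":
    viewer = {"paragraphs", "headings", "lists", "tables"}
    print(supported_profiles(viewer, PROFILES))  # ['basic-text']
```

The subset test is the whole point: a package either covers every feature in a profile or it does not comply with that profile, which is easier to state and compare than a long list of elements and attributes.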

Scripting language

Up until version 1.2 of ODF, the syntax and semantics of formulas were not specified. In practice, formulas were written in the http://openoffice.org/2004/calc namespace:

table:formula="oooc:=SUM([.B5:.B12])"

ODF 1.2 specifies formulas in its OpenFormula part, which uses the of: prefix:

table:formula="of:=SUM([.B5:.B12])"

Automated office documents require scripting or macros. ODF documents can contain macros, but the programming language for the macros has not been specified. The points where macros access the document are specified, but no language is chosen. In practice, macros created in one package do not work in another package.

Webpages can, in theory, also use different programming languages. But in practice, only JavaScript is supported across browsers. In office suites no interoperable choice is available. Requiring support for one particular programming language in the specification might change this situation.

Real-time change tracking

ODF supports change tracking: a user edits a document and the situation from before and after the edit is stored in the document. Etherpad popularized the idea of working with multiple people at once on one document. WebODF was the first ODF package that supported this style of working on ODF documents. OX Documents allows turn-based editing with fine granularity. MS Office and Google Docs also support it. In all these packages, everyone has to use the same software. The edit history is not available in a standard format.

The Advanced Document Collaboration subcommittee is working on creating such a standard. It is highly anticipated but a lot of work. If you are up to the challenge, please join this effort.

Upgrade / downgrade instructions

ODF 1.0, ODF 1.1, ODF 1.2 and ODF 1.2 Extended are the current versions of OpenDocument. Many ODF 1.2 documents do not contain features that are specific to ODF 1.2 and could be converted to an older version of ODF without information loss.

There is no shared software or documentation for ‘upgrading’ or ‘downgrading’ ODF documents. If there were, it would be trivial for software that supports ODF 1.1 to add support for reading ODF 1.2 documents (when no ODF 1.2 features are used) and for writing them. The task is not terribly hard, because nearly all features that are shared between 1.1 and 1.2 are exactly the same in both versions.
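A downgrade tool could start as small as the following Python sketch. It checks a content.xml string for elements that ODF 1.1 does not have and, if none are found, rewrites the office:version attribute. The set ODF_1_2_ONLY is an assumption: an illustrative, incomplete stand-in for the real list that shared documentation would have to provide. The sketch looks only at element names, not at attributes or at the other files in the package.

```python
import xml.etree.ElementTree as ET

OFFICE = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
# Keep the usual prefixes when the document is serialized again.
ET.register_namespace("office", OFFICE)
ET.register_namespace("text", TEXT)

# Assumption: illustrative and incomplete list of elements that are
# available in ODF 1.2 but not in ODF 1.1.
ODF_1_2_ONLY = frozenset({f"{{{TEXT}}}meta", f"{{{TEXT}}}meta-field"})


def uses_1_2_features(root):
    """True if any element in the tree is on the 1.2-only list."""
    return any(el.tag in ODF_1_2_ONLY for el in root.iter())


def downgrade_to_1_1(content_xml):
    """Return content.xml marked as ODF 1.1, or raise if that would lose data."""
    root = ET.fromstring(content_xml)
    if uses_1_2_features(root):
        raise ValueError("document uses features that ODF 1.1 does not have")
    root.set(f"{{{OFFICE}}}version", "1.1")
    return ET.tostring(root, encoding="unicode")
```

A real tool would also have to handle styles.xml, meta.xml and the manifest, but the shape stays the same: a shared list of version-specific features plus a mechanical rewrite.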

HTML storage format

The best way to send somebody an ODF file for reading is to wrap it in a PDF file. That way, you share the original file and you can be sure that the receiving party can view it. It’s a fact that nearly every tablet, phone and computer has a PDF viewer. Support for ODF is less pervasive.

Alternatively, one could wrap an ODF file in an HTML file. The HTML would be a static rendering of the document using HTML elements. The original ODF would be available via a ‘save as’ button (implemented as a link with a data URL).
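The ‘save as’ link is easy to sketch. The Python function below takes the bytes of an ODF text document, a file name and an already rendered HTML body, and returns one HTML page that embeds the original file as a base64 data URL. The function name and parameters are made up for this example, and producing the static rendering itself is assumed to happen elsewhere.

```python
import base64
import html

# Registered media type for OpenDocument text documents.
ODT_MIME = "application/vnd.oasis.opendocument.text"


def wrap_in_html(odf_bytes, filename, rendered_body):
    """Build an HTML page with a static rendering and a link to the ODF file.

    rendered_body is assumed to be trusted, pre-rendered HTML.
    """
    payload = base64.b64encode(odf_bytes).decode("ascii")
    href = f"data:{ODT_MIME};base64,{payload}"
    safe_name = html.escape(filename, quote=True)
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">'
        f"<title>{safe_name}</title></head>\n"
        f"<body>\n{rendered_body}\n"
        f'<p><a href="{href}" download="{safe_name}">Save as ODF</a></p>\n'
        "</body></html>\n"
    )
```

Base64 makes the embedded file about a third larger than the original, which is the price for getting everything into a single file that any browser can open.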

If you want someone to work with you on an ODF file, that person also needs an ODF editor. WebODF shows that it is possible to create an ODF editor that takes up only a few megabytes. Putting an ODF file and the editor for it into an HTML file that one can send out would bring ODF everywhere instantly.

Such a mix of HTML and ODF would be nice to have. But is it something for a specification or for a software project?

Normalization

Software developers rarely use ODF for writing documents. They prefer HTML, Markdown, or plain text files. One explanation is that developers like to store their work in version control systems like Git. In these systems, commits are important. A commit shows the difference between versions of a file. These differences are shown best for text files, which are usually edited in plain text editors. But ODF is not a simple format and is usually edited in an office suite. Every time the document is saved after an edit, it changes in many places. The clearest example of this is the automatic styles. The names of automatic styles are only used internally and their precise values have no meaning.

So with ODF, the number of changes that developers see in their commits is confusingly large. It is mostly noise. A solution would be to write rules to have a standard way to write things like automatic styles. Additionally, choices like how to indent the XML and how to order the attributes and elements have to be made and standardized.
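One such rule could be: name automatic styles by the order in which the document body first uses them. The Python sketch below applies that rule to a content.xml string. It is a simplification under stated assumptions: it only follows text:style-name references, it assumes automatic style names are unique across style families, and it assumes the made-up prefix auto does not clash with any common style name.

```python
import xml.etree.ElementTree as ET

NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)

STYLE_NAME = f"{{{NS['style']}}}name"
TEXT_STYLE_NAME = f"{{{NS['text']}}}style-name"


def normalize_automatic_styles(content_xml):
    """Rename automatic styles to auto1, auto2, ... in order of first use."""
    root = ET.fromstring(content_xml)
    auto = root.find("office:automatic-styles", NS)
    body = root.find("office:body", NS)
    if auto is None or body is None:
        return ET.tostring(root, encoding="unicode")

    styles = auto.findall("style:style", NS)
    old_names = [s.get(STYLE_NAME) for s in styles]
    used = [el.get(TEXT_STYLE_NAME) for el in body.iter()]

    # Styles used in the body come first; unused ones follow in
    # definition order so that every automatic style gets a new name.
    mapping = {}
    for name in used + old_names:
        if name in old_names and name not in mapping:
            mapping[name] = f"auto{len(mapping) + 1}"

    for s in styles:
        s.set(STYLE_NAME, mapping[s.get(STYLE_NAME)])
    for el in body.iter():
        name = el.get(TEXT_STYLE_NAME)
        if name in mapping:
            el.set(TEXT_STYLE_NAME, mapping[name])

    # Write the definitions in a fixed order as well.
    for s in styles:
        auto.remove(s)
    for s in sorted(styles, key=lambda s: s.get(STYLE_NAME)):
        auto.append(s)
    return ET.tostring(root, encoding="unicode")
```

Two documents that differ only in how the office suite happened to name and order their automatic styles come out byte-for-byte identical, so a commit would show only the real edits.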

When that is done, developers will warm more to using ODF. The convenience of a powerful well-structured file format can win over the obscure terseness of Markdown.

Standardize handling of invalid files

The HTML specification has a very long description of how to parse HTML. Parsing HTML is complicated because browser developers think that even HTML files with errors in them should be read. And if that’s a requirement, it’s best to standardize how to deal with errors.

Of course when invalid syntax is standardized, that syntax is now also valid. The result is a very, very complicated syntax. An alternative, simple and strict syntax also exists: XHTML. There is a high software development cost to entering the browser market because of the complicated syntax.

ODF does not have a long chapter on parsing and it should not have one. This idea is not a serious suggestion. In ODF, rules for parsing invalid files are not needed because nearly all ODF files are created with high-level software. HTML is often written in simple text editors where it is easy to make syntax mistakes. Condoning such mistakes is not a good idea for ODF.

Theme support

ODF has support for styles. The styles shown in the user interface are called ‘common styles’. Swapping out common styles for other common styles gives a document a different look. The names for the common styles are not standardized. A standardized set of common style names would make it easier to swap out style files (themes).

Within an organization, one might well choose a common set of names for styles. Web applications like WordPress can have themes because the components of the pages are defined by the software. In ODF, there is a set of names for the components in a presentation. Sets of names for common styles could also be added.

Applause!

The audience at the LibreOffice conference had preferences. Real-time collaborative editing was the clear winner. Theme support was the runner-up, and testing and certification tied with normalization for third place.

Idea	Applause
No applause	-48
Maximum applause	-4
Testing and certification	-6
Profiles	-9
Scripting language	-9
Real-time change tracking	-3
Upgrade / downgrade instructions	-10
HTML storage format	-48
Normalization	-6
Standardize handling of invalid files	-48
Theme support	-5

This was the opinion of the room in Aarhus. I’m sure there are many different ideas and preferences. Some ideas are already being worked on and some are waiting for someone to step in and help.

watch the presentation
