Centimeters and Points are the best

2015-08-24

What is the best unit? Of course there is no best unit. But for some purposes some units are better than others.

In digital documents, there is often a choice of units with which to specify absolute lengths. CSS, SVG and ODF have a choice of inches (in), centimeters (cm), millimeters (mm), picas (pc), points (pt) and pixels (px). Editing files with different computer programs or different versions of programs can lead to mixed use of units.

For example, when LibreOffice saves an ODF file, the unit used for storage depends on the user preferences. This can lead to inconveniences and rounding errors. If I specify a margin of 1.25cm and send the document to someone whose preferences are set to inches, the margin will be stored as 0.4925in. When that number is converted back to centimeters, the value is 1.25095cm, which is about 0.8‰ more than the original value.
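The arithmetic of that round trip can be checked with exact decimal numbers. This is only a small sketch: the stored value 0.4925in is taken from the example above, and the names `CM_PER_INCH` and `inches_to_cm` are made up for the illustration.

```python
from decimal import Decimal

CM_PER_INCH = Decimal("2.54")  # exact by definition of the inch


def inches_to_cm(inches: Decimal) -> Decimal:
    """Convert inches to centimeters; this direction is always exact."""
    return inches * CM_PER_INCH


original_cm = Decimal("1.25")
stored_in = Decimal("0.4925")  # the stored value quoted in the example
restored_cm = inches_to_cm(stored_in)

# Relative difference between the restored and the original length.
drift = (restored_cm - original_cm) / original_cm

print("restored:", restored_cm, "cm")
print("drift:", drift * 1000, "per mille")
```

The restored value equals 1.25095cm and the drift is 0.76‰. The damage was done when the centimeter value was written out in inches; converting back with 2.54 is exact.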

In addition to the loss of precision, consider the user interface. That shows a weird value now: 1.25095cm. The lengths in office documents are usually not measurements; they are nice numbers chosen by users. Most people choose lengths to be round numbers in the particular unit they are currently working with. 0.3175cm and 9pt are exactly the same length, but only one representation is likely to be the one that was chosen by a user.

These conversion problems can be avoided by decoupling the units used in the user interface from the units used when saving.

Lossless conversion

Here is a small table with the exact multiplication factors for conversion between the length units. Multiply a value in the row unit by the factor to get the value in the column unit. A pica is 12 points, so there are 6 picas to the inch.

from \ to   in          cm           mm          pc        pt         px
in          1           2.54         25.4        6         72         96
cm          50/127      1            10          300/127   3600/127   4800/127
mm          5/127       0.1          1           30/127    360/127    480/127
pc          0.5/3       1.27/3       12.7/3      1         12         16
pt          0.125/9     0.3175/9     3.175/9     0.25/3    1          4/3
px          0.03125/3   0.079375/3   0.79375/3   0.0625    0.75       1

The factors that are plain decimal numbers give lossless conversion: multiplying a decimal value by one of them always yields another finite decimal. The factors written as fractions, with a division by 3, 9 or 127, give lossy conversion for decimals. Let’s look at the conversion between inches and centimeters again. Conversion from inches to centimeters is lossless. Multiplying by 2.54 adds at most two extra digits, but the resulting value can be written in decimal representation and is always exactly the same length.

Conversion in the other direction, from centimeters to inches, is problematic. The centimeter value is divided by 127, which leads to an infinite sequence of digits for most values.

The column for conversion to inches consists almost entirely of lossy factors: writing a length out in inches nearly always leads to a lossy conversion.

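There is no single unit that can save any length losslessly. But a combination of cm and pt does cover all cases. Lengths given in in, cm or mm convert losslessly to cm, and lengths given in pc, pt or px convert losslessly to pt. So any length can be written out without loss of precision by using either cm or pt.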
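Whether a factor is lossless can be tested mechanically: a fraction has a finite decimal expansion exactly when its reduced denominator has no prime factors other than 2 and 5. The sketch below assumes the CSS definitions of the units (1in = 2.54cm = 6pc = 72pt = 96px); the names `INCHES_PER_UNIT`, `factor` and `is_terminating` are made up for the illustration.

```python
from fractions import Fraction

# Size of each unit in inches, following the CSS definitions (assumption).
INCHES_PER_UNIT = {
    "in": Fraction(1),
    "cm": Fraction(50, 127),
    "mm": Fraction(5, 127),
    "pc": Fraction(1, 6),
    "pt": Fraction(1, 72),
    "px": Fraction(1, 96),
}


def factor(src: str, dst: str) -> Fraction:
    """Exact multiplication factor for converting from src to dst."""
    return INCHES_PER_UNIT[src] / INCHES_PER_UNIT[dst]


def is_terminating(value: Fraction) -> bool:
    """True if value can be written as a finite decimal number."""
    d = value.denominator  # Fraction keeps itself in lowest terms
    for prime in (2, 5):
        while d % prime == 0:
            d //= prime
    return d == 1


units = list(INCHES_PER_UNIT)

# Print the table again, marking every factor as lossless or lossy.
for src in units:
    marks = ["ok" if is_terminating(factor(src, dst)) else "lossy" for dst in units]
    print(src, marks)

# Units that every other unit converts to losslessly: there are none.
universal = [dst for dst in units
             if all(is_terminating(factor(src, dst)) for src in units)]

# Every source unit converts losslessly to cm or to pt.
covered = all(any(is_terminating(factor(src, dst)) for dst in ("cm", "pt"))
              for src in units)
print(universal, covered)
```

A lossy factor only means that the conversion fails for decimals in general. Individual values can still come out exact: 9pt is exactly 0.3175cm.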

Normalization

Recently, I’ve developed an urge to normalize ODF files. ODF, especially in the flat file format (a single XML file), is convenient for writing texts that are stored in version control systems. It’s like plain text but with added features like bold, italic, named styles, tables and mathematical formulas. ODF is also the best file format you can hope to get from a non-programmer.

Programmers like to put files in revision control systems and look at diffs between different versions of a file. This is where normalization is needed. Office applications are really complex and each one has its own peculiarities in saving files. These peculiarities have no influence on the semantics of the documents but can make a diff between two versions larger than needed.

The solution is to remove the peculiarities. This process is called normalization. Normalization is hard to perfect, but even imperfect normalization is helpful.

One step in normalization is to standardize on the unit that is used for lengths. The conversion should be lossless. The unit that is used carries no meaning in ODF, but the length itself should be transferred exactly. For normalizing lengths in ODF I’m currently using the following logic: if a length can be converted losslessly to centimeters, do so; if not, use points. In this way, two documents with different origins will store equal lengths as equal character sequences.
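A minimal sketch of this rule in Python. The names `normalize_length` and `exact_decimal` are made up for the illustration, the unit sizes are the CSS definitions, and the parser only accepts a plain decimal number directly followed by a unit; a real ODF normalizer would have to handle more than that.

```python
import re
from fractions import Fraction
from typing import Optional

# Size of each unit in inches, following the CSS definitions (assumption).
INCHES_PER_UNIT = {
    "in": Fraction(1),
    "cm": Fraction(50, 127),
    "mm": Fraction(5, 127),
    "pc": Fraction(1, 6),
    "pt": Fraction(1, 72),
    "px": Fraction(1, 96),
}

LENGTH = re.compile(r"(-?\d+(?:\.\d+)?)(in|cm|mm|pc|pt|px)")


def exact_decimal(value: Fraction) -> Optional[str]:
    """Return value as a finite decimal string, or None if there is none."""
    d = value.denominator
    twos = fives = 0
    while d % 2 == 0:
        d //= 2
        twos += 1
    while d % 5 == 0:
        d //= 5
        fives += 1
    if d != 1:
        return None  # some other prime factor: infinite expansion
    places = max(twos, fives)  # digits needed after the decimal point
    n = (value * 10 ** places).numerator  # the scaled value is an integer
    sign = "-" if n < 0 else ""
    digits = str(abs(n)).rjust(places + 1, "0")
    if places == 0:
        return sign + digits
    return sign + digits[:-places] + "." + digits[-places:]


def normalize_length(text: str) -> str:
    """Rewrite a length in cm if that is lossless, otherwise in pt."""
    match = LENGTH.fullmatch(text)
    if match is None:
        raise ValueError("not a length: " + repr(text))
    number, unit = match.groups()
    inches = Fraction(number) * INCHES_PER_UNIT[unit]
    for target in ("cm", "pt"):
        digits = exact_decimal(inches / INCHES_PER_UNIT[target])
        if digits is not None:
            return digits + target
    # Not reachable for decimal input: cm or pt always works.
    raise AssertionError("no lossless representation for " + text)


for length in ("9pt", "0.3175cm", "10pt", "1px", "0.4925in"):
    print(length, "->", normalize_length(length))
```

With this rule 9pt and 0.3175cm both end up as 0.3175cm, while 10pt has no finite representation in centimeters and stays 10pt.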
