Is that really the source code for this software?

2013-06-19

I’ve been looking into how easy it is to confirm that a binary package corresponds to a source package. It turns out that it is not easy at all. So I’ve written down my findings in this blog entry.

I think reproducible builds are of fundamental importance to the free software community and beyond, yet the trustworthiness of binaries built from source code is a quite neglected topic. We know about tivoization and the reality that code can be open yet unchangeable. What is not sufficiently appreciated is that parties can, quite unchecked, distribute binaries that do not correspond to the alleged source code.

Trust is good, but especially in a post-Snowden world, control is better. Can a person rely on binaries or should we all compile from source? I hope to raise awareness about the need for a reproducible way to create binaries from source code.

Free software means users have the four essential freedoms. Freedom 1 is the freedom to study how the program works and change it so it does your computing as you wish. It also means that the program does not do what you do not want it to do. Instead of having to trust the supplier of the software, you can check that the software works as advertised and does not contain e.g. spyware. Access to the source code is a precondition for this freedom.

Many software packages are distributed in binary form and come with a license that makes the right to the source code explicit. For example, the GNU GPL v2.0 says:

For an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable.

A license that promises access to the source code is one thing, but an interesting question is: is the published source code the same source code that was used to create the executable? The straightforward way to find this out is to compile the code and check that the result is the same. Unfortunately, the result of compiling the source code depends on many things besides the source code and build scripts, such as which compiler was used. No free software license requires that this information be made available, so it would seem to be a challenge to confirm whether the given source code corresponds to the executable.
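The "compile and check that the result is the same" step can be sketched in a few lines of Python. This is only an illustration of the idea, not a tool used for the tests in this post; the function names `sha256_of` and `same_binary` are made up for the example.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_binary(published, rebuilt):
    """True only if the two files have equal SHA-256 digests.

    `published` and `rebuilt` are hypothetical paths: the binary shipped
    by the supplier and the one you compiled yourself.
    """
    return sha256_of(published) == sha256_of(rebuilt)
```

A check like this is all-or-nothing: a single differing byte, for instance an embedded build date, makes the digests differ even when the source code is the same.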

GNU/Linux distributions

Collecting software packages that form a working operating system is one of the services of a distribution. Another service that most provide is compiling that software into executables and shipping those in convenient packages. Most distributions ship two types of packages: source packages and binary packages. A distribution is a complete system that includes all the tools to compile source code. Those tools go beyond the tools that are used in the build scripts from the upstream developer. Distributions contain tools to create binary packages from source packages. Does this mean that it is less of a challenge to confirm if the source code corresponds to the executable?

Doing the test

I have built a binary package from a source package for a number of distributions (Debian, Fedora, and openSUSE) and compared the self-built binary package with the one published by the distribution. All tests were run on fresh, minimal installs of the latest version of each distribution using the tools that are recommended by the distributions. To keep the complexity low, one simple package was chosen: tar. Will the self-built package be exactly the same, totally different or only slightly different?

Debian

Debian was installed from a downloaded netinstall image: debian-7.0.0-amd64-netinst.iso. The system was installed on a VirtualBox machine. The version of tar that comes with Debian is 1.26+dfsg-0.1. According to the instructions, compiling the tar package from source is as simple as running:

noted also that osc is the tool of choice and even claimed to have gotten identical builds. I have not yet verified that, but it sounds great. The mentioned package, build-compare, has some scripts that are meant to compare packages whilst ignoring variable parts of the build.
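To make "comparing whilst ignoring variable parts" concrete, here is a rough Python sketch of the idea. It is not the build-compare scripts, and it works on plain tar archives rather than real .deb or .rpm packages. It assumes that timestamps and ownership are the only parts that may vary; `archive_fingerprint` and `same_apart_from_metadata` are hypothetical names.

```python
import hashlib
import io
import tarfile


def archive_fingerprint(tar_bytes):
    """Map member name -> (type, mode, size, content digest).

    Modification times and owner fields are deliberately left out,
    because they usually change from one build to the next.
    """
    fingerprint = {}
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:*") as archive:
        for member in archive.getmembers():
            content = b""
            if member.isfile():
                content = archive.extractfile(member).read()
            fingerprint[member.name] = (
                member.type,
                member.mode,
                member.size,
                hashlib.sha256(content).hexdigest(),
            )
    return fingerprint


def same_apart_from_metadata(first, second):
    """True if both archives hold the same files with the same contents."""
    return archive_fingerprint(first) == archive_fingerprint(second)
```

Two archives that differ only in file timestamps compare as equal here, while a change in any file's contents does not. A real comparison tool has to deal with many more sources of variation than this.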

yeap

It's true that osc (the Open Build Service command line client, also available for most other Linux distributions) is the most-used and standard build tool for RPMs on openSUSE. It's not mandatory or anything, of course... It builds using a chroot, I believe, so it does indeed probably lead to very close or identical results. It is of course limited to building for the architecture of the system it is on; OBS does not have that problem as it uses clean VMs each time it builds. That makes builds even more reliable and easy to perfectly reproduce.