Presented by

  • Timothy Sample

    https://ngyro.com/

    Tim is a contributor to the GNU Guix project, where he helps clarify the composition of software artifacts. He has worked on the long-term archival of Guix package inputs (i.e., source code), and is currently focused on “bootstrappable” builds for core packages such as GCC and Guile. To this end he has worked on GNU Mes (a Scheme interpreter), built a shell, and written several core Unix utilities in Scheme (sed, awk, etc.). He has also written a Scheme interpreter in assembly that is capable enough to compile C code and run key Guix build scripts.

    As a long-time member of the software freedom community (having cut his teeth on Debian “sarge”), Tim is fluent in free software as both a body of work and a political strategy. While offline, he enjoys spending time with his family, baking bread, and playing traditional fiddle music.

Abstract

We expect to have the freedom to study how a program works. To that end, we expect that a program’s source code be available. However, that source code only becomes a “program” with the help of another program (a compiler or interpreter). Many of the details of how a program works live in this second program; perhaps our study warrants a look at its source code, too. Now we have fallen into a loop: we can keep wondering about the source code to the program that made the program that made the program…. For most modern software, the exact chain of programs making programs is lost to history. Consider a modern copy of GCC distributed by Debian – is it possible to obtain the source code to every C compiler used to compile it and all its ancestors? Many of them will be earlier versions of GCC distributed by earlier versions of Debian, but eventually we will reach a time before Debian and even a time before GCC. Which compilers were used then, and is the source code available?

Ken Thompson warned about the security implications of this in his Turing Award lecture “Reflections on Trusting Trust”. In that lecture he explains how a backdoor can persist in a self-hosting compiler despite there being no indication of it in the source code. The correspondence between the source code and the program is not complete, and the details of how a program works can be obscured. To avoid this, we need to make sure that we can create the programs that make programs without relying on the entire history of compilers back to the dawn of computing! This is the aim of the Bootstrappable Builds project.

GNU Guix is a package manager that provides thousands of packages for GNU/Linux systems. Over the past 10 years, the Guix community has been working to make these packages bootstrappable. For instance, the chain of C compilers working back from modern GCC is known precisely: GCC 4, GCC 2, TCC, MesCC, and M2-Planet. That last one is written in assembly, which is assembled by a chain of assemblers until we hit a small (around 256 bytes) program that can be understood even without its source code. There is no need to hope we keep finding archival tapes of old versions of Unix: the entire chain is right there!

As part of my ongoing work with Guix, I’ve built Germ: an implementation of the Scheme programming language in assembly. Scheme is among the best languages for building powerful abstractions on top of a small set of primitives, and Germ uses this property to be the seed from which capable build tools can grow. In this talk I will contextualize Germ within the history and future of bootstrappable builds in Guix. We will look at how it can be used to further shorten the chain of programs building programs and thereby remove needless obscurity from our quest to understand how a program works.