Musings on Operating Systems Research

We Need a Renaissance in Operating Systems Research

August 24, 2021

Musings

Authors:

Article shepherded by:

Rik Farrow

Rather than burying the lineup after a heap of verbiage, I thought it made more sense to put the lineup right up front. Then I will get busy with musing about research as well as learning about operating systems.

Cory Lueninghoener, working with a vast cast of past LISA chairs, steering committee members, and USENIX staff, put together a history of the LISA conference. With pictures!
Laura Nolan reviews Marianne Bellotti's recently published book Kill It With Fire, which addresses modernizing legacy systems with an unusual twist.
Jacob Scott discusses some of the 'failure modes' that can happen when Service Level Objectives (SLOs) are imposed in a top-down fashion, in response to an earlier article by Laura Nolan.
Thomas Depierre, also responding to Laura Nolan's article about SLOs, argues that SRE can bridge the gap between high-level metrics that management requires and the contextual service-specific knowledge that engineering teams have.
Rik Farrow interviews Vasily Tarasov, who explains his journey from Russia and Linux kernel work, to Stony Brook where he changed his focus to file systems and storage, and to working for IBM.
Hugo Lefeuvre, Gaulthier Gain, Daniel Dinca, Alexander Jung, Simon Kuenzer, Vlad Bădoiu, Răzvan Deaconescu, Laurent Mathy, Costin Raiciu, Pierre Olivier, and Felipe Huici write about the Unikraft project. Sponsored by the Linux Foundation, Unikraft is a toolset for building mostly-Linux-API-compatible unikernels, suitable for running as virtual machines and featuring faster startup, better performance, and stronger security.
Rik Farrow reviews Gabriel Gambetta's Computer Graphics from Scratch, a book that really explains how modern graphics programming works, starting with the simplest possible function that writes one pixel in color.
Ghada Almashaqbeh revisits old ideas about peer-assisted models for resource trading and investigates the use of cryptocurrencies for building decentralized services.

The Trouble with Operating Systems Research

Timothy Roscoe, a professor at ETH Zurich and no stranger to ;login:'s pages, gave the last of three keynotes during OSDI and ATC 2021. You can watch Roscoe's performance here. Roscoe is lively and humorous, not something easy to do when the topic is operating systems research.

I really liked his points, starting with two definitions of what an operating system is. I'm going with his second one: that body of software that multiplexes the machine's hardware resources, abstracts the hardware platform, and protects software principals from each other. That's very basic, but I do think he misses one important thing: the OS not only abstracts the hardware, but also presents a familiar application programming interface (API) to users of the OS.

That brings us to the next point, that Linux has taken over OS research. Roscoe points out that it is very difficult to get OS research published, and if you want to get a research paper past program committee members you had better build on Linux. He then provides examples, using all three OS papers from two years of Operating Systems Design and Implementation (OSDI). That's right, three papers on OS design, all based on improvements to Linux. In OSDI'21, a quarter of the conference was devoted to machine learning and only six percent (two papers) to OS design.

Roscoe goes on to explain that the type of hardware that Linux has been designed to run on doesn't actually exist. Linux, after all, was designed to work a lot like UNIX, and UNIX was first built to run on a PDP-7. Roscoe provides a handful of current system architectures as examples of what systems really look like, and they are not at all like the model that Linux supposedly runs on. What we have instead are systems with multiple processors providing different system services, all tied together with a communications network. Some of these processors actually control the processor cores that are running Linux, and in that sense, Linux is not even in charge of the system.

ETH Zurich uses an NXP system-on-chip (SoC) for their operating system class, one with a plethora of different processors on board. Just looking at the chip layout, I found it hard to imagine where to even start writing an OS. But the server world provides a clue: the baseboard management controller (BMC) handles testing, booting, and maintenance tasks on server boards, and BMCs currently run Linux. Not that you usually have any access to this core running Linux as a Linux system, but it is key to booting your servers.

Roscoe wants to see more operating systems research, and points out that there are businesses that need people who can write operating systems. He also points out various obstacles to learning about operating systems as well as to getting OS papers published. One of the biggest is the existence of a well-known, popular, open source operating system: Linux. But he is not the first person to complain about such a thing. Back in 1991, during a talk at MIT by Ken Thompson, co-creator of UNIX, a professor said, "I hate you. UNIX stopped all research in operating systems." Just replace "UNIX" with "Linux", and thirty years later we've gotten to about the same place.

Roscoe does a much better job than I can of calling for OS research and providing reasons for doing so. I'm happy he stuck his neck out to make these points, and I wanted both to thank him and to draw attention to what he has done. But I also wanted to throw my own thoughts into the ring.

Linux, like Windows, is a one-size-fits-all operating system, something that runs on everything from Raspberry Pis to supercomputers (Windows doesn't have the same span, but the same OS kernel runs on laptops and servers). Running a complex OS on an IoT device makes no sense at all, but it is often the easiest thing to do when one is familiar with that OS. Sort of like using a jackhammer to tack up a photograph because you are unfamiliar with the use of a tack hammer. Linux and Windows do provide the programmer with familiar APIs, and that's the reason they get used: not because they are the best fit, but because they are familiar.

Perhaps what we need instead of today's monolithic, monstrous operating systems is some basic scaffolding, something that students, researchers, and programmers can use to actually build a better-fitting operating system upon. I've often written about microkernels as one design space, and Roscoe even mentions using seL4 to replace Linux in the BMC. That's a great idea, but I think it's just a start.

The most basic scaffolding I can imagine is a communications protocol that allows different parts of future operating systems to communicate across the system. These parts ideally share data and commands via messages, instead of existing in one big, very dangerous, memory address space. Memory bandwidth is a serious bottleneck in current designs, and perhaps replacing shared memory with memory dedicated to each processor plus message buses is going to be worse. But it is another way to think about organizing both architecture and operating system designs.
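To make the idea concrete, here is a minimal sketch of two components that share nothing except messages. It is an illustration only, not a design from Roscoe's talk: the Message fields, the storage_service component, and the request helper are all hypothetical names, and Python queues and threads stand in for whatever message bus and separate processors a real system would have.

```python
import queue
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Message:
    """The only thing two components ever share: a command and a payload."""
    command: str
    payload: bytes = b""
    reply_to: Optional[queue.Queue] = None


def storage_service(inbox: queue.Queue) -> None:
    """Hypothetical component that keeps its state in private memory."""
    blocks = {}  # visible to this service only; never handed out by reference
    while True:
        msg = inbox.get()
        if msg.command == "stop":
            return
        if msg.command == "put":
            # Assumed payload layout for this sketch: b"name=data"
            name, _, data = msg.payload.partition(b"=")
            blocks[name] = data
            reply = Message("ok")
        elif msg.command == "get":
            if msg.payload in blocks:
                reply = Message("ok", blocks[msg.payload])
            else:
                reply = Message("error", b"no such block")
        else:
            reply = Message("error", b"unknown command")
        if msg.reply_to is not None:
            msg.reply_to.put(reply)


def request(inbox: queue.Queue, command: str, payload: bytes = b"") -> Message:
    """Send one message and wait for the answer on a private reply queue."""
    reply_to = queue.Queue()
    inbox.put(Message(command, payload, reply_to))
    return reply_to.get(timeout=5)


if __name__ == "__main__":
    inbox = queue.Queue()
    service = threading.Thread(target=storage_service, args=(inbox,))
    service.start()
    request(inbox, "put", b"motd=hello")
    print(request(inbox, "get", b"motd").payload)
    inbox.put(Message("stop"))
    service.join()
```

The caller never touches the service's dictionary. Every interaction is a copy of some bytes travelling in one direction or the other, which is the property that would let the two sides live on different processors with different memories.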

After all, we do need some place to start from. Right now, supercomputers work like this, with each processing unit having simple network communication, some memory to work in, and an 'OS' that consists of a program that receives the instructions to execute and the data to work on, starts the instructions, then sends back the results. No 350-system-call API and device drivers for just about any I/O device ever made, just the basics.
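The whole of such a node 'OS' fits in a few lines. The sketch below is a toy under stated assumptions: node_runtime is a hypothetical name, the two queues stand in for the node's network links, a job is assumed to arrive as an (instructions, data) pair, and None is an invented shutdown signal.

```python
import queue


def node_runtime(link_in: queue.Queue, link_out: queue.Queue) -> None:
    """Everything a compute node runs: receive a job, execute it, send the result back."""
    while True:
        job = link_in.get()          # blocks until the network delivers something
        if job is None:              # hypothetical shutdown signal
            return
        instructions, data = job     # the code to run and the data to run it on
        link_out.put(instructions(data))


if __name__ == "__main__":
    link_in, link_out = queue.Queue(), queue.Queue()
    link_in.put((sum, [1, 2, 3]))
    link_in.put((sorted, [3, 1, 2]))
    link_in.put(None)
    node_runtime(link_in, link_out)
    print(link_out.get(), link_out.get())
```

There is no file system, no scheduler, and no device driver here beyond the link itself, which is roughly the point of the comparison with a general-purpose kernel.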

Or consider the Cerebras Wafer-Scale Engine: 850,000 cores, each with its own memory and able to communicate with the others, all on one chip. Perhaps Linux provides a front end to this, but each core just runs software, not an OS. Yet an OS is needed to organize the Cerebras system, and that OS is not going to be Linux.

You might also consider Unikraft, a unikernel kit that allows running a single application without an operating system — everything needed to support the application is part of the unikernel, and the unikernel runs on top of a VMM. This approach, described in this ;login: article, supplies just enough of a Linux API to support common applications, and relies on the underlying VMM to handle the hardware interface. I don't think of Unikraft as the future of operating systems, but rather an example of how things can be done differently, of thinking outside the Linux box.

As a post-graduate, I took an operating systems class. The lab used PDP 11/45s, with a system architecture that did look a lot like the PDP-7. I thought I was supposed to write an operating system that semester, and was flummoxed to see other students in the lab with two boxes of punchcards (yes, punchcards!) indicating that they already had close to 4000 lines of working code. I didn't know that they had been working on the immense project of creating a very simple OS, something akin to CP/M, over multiple semesters. For comparison, the first release of Linux was a little over 10K lines of code, but Linux is over 25.5 million lines of code today.

Today, I find it hard to imagine even learning how Linux works in a single semester, or writing an operating system that does as much as we were asked to do on the PDP 11/45: write device drivers for the terminal and disk controller, write a file system, and execute code loaded from that file system. The PDP 11/45, as I remember it, was a model of simplicity. For example, the disk controller included DMA support, and all the programmer needed to do was provide a memory address, a physical disk address (a cylinder, sector, head triplet), and a command. The terminal interface was an interrupt handler, just a few lines of code. Still, for someone whose interface to a computer had been handing a card deck to an operator, this was immensely confusing. I wonder how well the ETH OS students do with the NXP SoC? I think a version of my lab today would mean using a network interface instead of a disk device, and the NXP SoC does provide a network interface.
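The disk interface described above can be modeled in a few lines. This is a toy simulation, not the real PDP-11 controller: the ToyDisk class, the 512-byte sector size, the zero-based geometry, and the "read"/"write" command names are all assumptions made for illustration. The only thing taken from the description is the shape of a request: a memory address, a physical disk address, and a command.

```python
from dataclasses import dataclass

SECTOR_SIZE = 512  # bytes per sector; an assumption for this sketch


@dataclass(frozen=True)
class DiskAddress:
    """Physical disk address; all three fields are zero-based here."""
    cylinder: int
    head: int
    sector: int


class ToyDisk:
    """Simulated controller that copies whole sectors to or from main memory."""

    def __init__(self, cylinders: int, heads: int, sectors: int) -> None:
        self.cylinders, self.heads, self.sectors = cylinders, heads, sectors
        self._platters = bytearray(cylinders * heads * sectors * SECTOR_SIZE)

    def _offset(self, addr: DiskAddress) -> int:
        """Turn (cylinder, head, sector) into a byte offset on the platters."""
        if not (0 <= addr.cylinder < self.cylinders
                and 0 <= addr.head < self.heads
                and 0 <= addr.sector < self.sectors):
            raise ValueError(f"bad disk address: {addr}")
        linear = (addr.cylinder * self.heads + addr.head) * self.sectors + addr.sector
        return linear * SECTOR_SIZE

    def execute(self, memory: bytearray, mem_addr: int,
                addr: DiskAddress, command: str) -> None:
        """The whole programming interface: memory address, disk address, command."""
        if not 0 <= mem_addr <= len(memory) - SECTOR_SIZE:
            raise ValueError("transfer would fall outside memory")
        off = self._offset(addr)
        if command == "read":      # disk -> memory, as DMA would do without the CPU
            memory[mem_addr:mem_addr + SECTOR_SIZE] = self._platters[off:off + SECTOR_SIZE]
        elif command == "write":   # memory -> disk
            self._platters[off:off + SECTOR_SIZE] = memory[mem_addr:mem_addr + SECTOR_SIZE]
        else:
            raise ValueError(f"unknown command: {command}")


if __name__ == "__main__":
    memory = bytearray(4096)
    disk = ToyDisk(cylinders=2, heads=2, sectors=4)
    memory[0:5] = b"hello"
    disk.execute(memory, 0, DiskAddress(1, 0, 3), "write")
    disk.execute(memory, 1024, DiskAddress(1, 0, 3), "read")
    print(bytes(memory[1024:1029]))
```

A driver built on an interface like this is mostly bookkeeping: translate a block number into a triplet, pick a buffer, issue the command, and wait for completion.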

To summarize, I loved Roscoe's talk, and I really do want computer scientists who understand operating systems, who can research and publish papers about operating systems other than Linux, and who can work on future systems, from those as simple as single-core ARM SoCs up to Cerebras's hundreds of thousands of cores on a chip. I hope people much smarter than I am take up the call issued by Timothy Roscoe and forge ahead.

Article Categories:

Operating Systems

Last updated February 8, 2023

Authors:

Rik Farrow has been a consultant for 40 years. He has written two books, as well as worked as the technical editor for two editions of a popular operating system book. He also taught UNIX system administration and Internet security during the 90s, and worked as a volunteer for USENIX program and steering committees. Rik has been the editor of ;login: since 2005.

RIK@RIKFARROW.COM

Comments

dtherrienexagrid

Why are some of these articles not downloadable as PDFs? I have a process for reviewing new publications from IEEE, ACM and USENIX - I store them all in a document manager and review new interesting articles as a group of articles.

2 years 8 months ago

Rik

While the plan was that the Web software would create PDFs, that turned out to be more difficult than expected. When submitted articles start in Google Docs or LaTeX, we can create a PDF and store it along with entering the article in the Web interface. I write using vim, so creating a PDF requires extra time and work, time that I'd prefer to spend working on something else.

2 years 8 months ago

norman

Nitpick: PDP-7, not PDP-6. Quite different systems! Reading the first part of this, I too thought back 30 years to how UNIX spoiled OS research just as Linux has today. Ironically, a talk Ken Thompson gave in 1991 very likely was about Plan 9, an attempt to break out of the UNIX mold.

2 years 3 weeks ago

Rik

Norm is right: it was a PDP-7. The PDP-6 was a much larger computer, in a different DEC series. On page 33 of "UNIX: A History and a Memoir", Brian Kernighan describes how Ken Thompson needed a program for testing the disk subsystem of a PDP-7 in 1969 and realized he was "three weeks away from an operating system". Not sure where the mention of the PDP-6 came into my head as I was writing this. Both the PDP-6 and PDP-7 were in production at this time, so perhaps that's where my confusion arose.

2 years 2 weeks ago

I70962

s/I can imaging/I can image/

1 year 7 months ago

Rik

Thanks. I fixed the typo, although I really did mean 'imagine'.

1 year 7 months ago