
The Evolution of OS Design

With each new generation of operating systems, we are introduced to new ways of thinking about how our computers work. To simplify things for the user, we must provide a consistent interface through which they can do their work. It is equally important to extend this consistency to programmers, so they too can benefit. As an operating system ages, it gradually becomes burdened with a plethora of interfaces that break the simplicity of its original architecture. Unix originally followed the "everything is a file" mantra, only to lose sight of that design with numerous task-specific APIs for transferring files (FTP, HTTP, RCP, etc.), graphics (X11, svgalib), printing (lp, lpr), and so on. Plan 9, developed at Bell Labs beginning in the late 1980s, demonstrated how even a GUI can be represented as a set of files, revitalizing the "everything is a file" idea. The purpose of this paper is to describe a hypothetical operating system called OOS which aims to push this paradigm even further.

The Object Operating System, OOS (pronounced "ooze"), is an attempt to create a new operating system architecture. It is designed to use the filesystem to accomplish a wide range of tasks that would normally be done through separate mechanisms. Much of the design philosophy was inspired by Unix and Plan 9, but OOS attempts to do things in its own way, trading compatibility for a simpler design. The most significant departure from its predecessors is a new type of filesystem in which files, directories, and libraries have been replaced by a unified entity known as a container. In addition to being unified, containers are also event-driven, so that operations like reading, writing, copying, and deleting files are considered events whose default actions can be overridden. This allows a container to perform arbitrary operations on files as they are copied in and out of it, or to provide virtual namespaces.

When you log onto an OOS machine, you are essentially logging onto the network, and your workstation is simply a node connected to it. As such, the filesystem is structured so that the root directory contains a list of the machines on the local network. It's also possible for a user to mount remote machines or even networks here. The root directory is multiplexed; that is, any changes a user makes to the directory are visible to only that user. Other users will be unaffected. Any directory/container can be multiplexed, and doing so is a convenient way to provide a private namespace to each login.
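As a sketch of what this might look like at the shell (the machine names are illustrative), listing the root directory shows the nodes on the network:

ls /                            # the root lists machines, not files
backup01  localhost  titan

Mounting a remote machine or network here would make it appear alongside these entries, and because the root is multiplexed, only your own session would see the addition.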

Because files and directories in OOS are the same thing, you are able to cat a directory and cd into a file. Being able to embed objects inside a file makes files act somewhat like resource forks on OSes such as Mac OS. In this way, meta-information like icons can be embedded directly inside a file. However, OOS files are much more powerful, since the inside of a file can contain a full directory hierarchy. Because OOS files and directories can each hold both data and other entities, they are referred to collectively as containers.
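For instance, a hypothetical session (the container name is made up) might treat the same name both ways:

cat report                # read the file part of the container
cd report                 # descend into its directory part
ls                        # reveals embedded meta-information, e.g. an icon
icon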

A container has three principal parts: a file part, a directory part, and a functional part. The file part acts just like a file in traditional operating systems. You can open it, read and write to it, seek within it, etc. The directory part also works like a traditional directory: it allows a container to hold other containers within itself. The functional part works somewhat like a library, but there are a number of subtle differences. A program is able to link to one or more containers' functional code, making them act just like shared objects, but unlike shared objects, individual functions can be made to run with elevated privileges over the program linking them. This level of granularity virtually eliminates the need for setuid binaries, as are common on Unix systems. It also means hardware drivers can largely be written as libraries instead of kernel modules, reducing the burden placed on the kernel.
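The paper doesn't specify how the three parts are addressed from the shell, but a rough sketch might look like this (the fn name for the functional part is purely hypothetical):

cat mathlib               # file part: behaves like an ordinary file
ls mathlib                # directory part: containers nested inside
ls mathlib/fn             # functional part: exported functions (hypothetical naming)
log  pow  sqrt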

Containers also possess event-driven properties. Events include things like copying a file into or out of a container, creating and deleting containers, modifying the file or functional part of a container, and listing a container's contents. The default behavior for these events is the traditional behavior seen in other OSes. For example, copying a file into a container will make a copy of the file inside the container's namespace. But because events can be overridden, it's possible to perform actions on the file as it's being copied; for instance, a container can automatically compress files as they are copied in. Events can also be used to provide virtual namespaces. When a user attempts to list the files inside a container, they can instead see something else. This is how OOS provides information on processes. It has a container called cpu, whose files are actually running processes, much like the proc namespace in Plan 9 and some Unices. I'll discuss the cpu container in more detail later.
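To make the compressing container concrete, here is a sketch (assuming a container named zip whose copy events have been overridden):

cp bigfile /localhost/zip       # the copy-in event compresses the file on the way in
cp /localhost/zip/bigfile .     # the copy-out event decompresses it transparently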

Each container has an inode in which things like filenames, ownerships, permissions, and sizes are stored. Also included is a flag which determines whether the container's contents will be multiplexed. By multiplexing a container, every user session will have its own private namespace within the container. Consider the previously mentioned container that compresses all files copied into it. If that container is multiplexed, several users can make use of it simultaneously without getting in each other's way. Each user will think they have exclusive control of the container, when in reality they are sharing it. Because the namespace of a multiplexed container is connected to a specific session, when the session ends (i.e. the user logs out), the contents of that namespace are lost. Therefore, any important information stored in one of these multiplexed containers should be copied out before ending the session.

Each container is also able to specify to the kernel what type of filesystem it should contain and where it should look to read and write files. Most of the time, you'll want your containers to use the default OOS filesystem and use the local hard drive for your files. However, in some cases, you may want to store files in RAM for extra speed, or mount a foreign filesystem. Unless otherwise specified, all new containers use oosfs for the filesystem and //dev/disk for storage.
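The mechanism for specifying this isn't spelled out; one plausible sketch uses attribute files on the container itself (the names fstype and store, and the //dev/ram device, are hypothetical):

new container cache
echo ramfs >cache/fstype        # hypothetical: keep this container's contents in RAM
echo //dev/ram >cache/store     # hypothetical: back the container with the RAM device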

There's nothing really revolutionary in the kernel. It follows the idea of a microkernel, with the core doing as little as possible, leaving the bulk of the functionality to external modules. Hardware abstraction in OOS is largely delegated to userspace libraries, leaving just the critical parts like multitasking, IPC, file manipulation, and memory management to the kernel. Generally, drivers are only placed in the kernel if they need special attention that can't be provided from a userspace entity (like particular timing requirements).

The GUI is a perfect example of hardware abstraction delegated to userspace containers. A standard OOS system provides a container called gfx, which acts as the graphics driver. As such, it exports a set of privileged functions for manipulating the graphics card directly. The various functions provided include primitives like plotting points, lines, circles, and polygons, moving and scaling bitmaps, filling patterns, etc. Higher level functionality (like OpenGL) is also provided. The gfx container actually inherits its functionality from a generic gfx object which performs all its actions through software. The gfx driver overloads the functions for which it can provide hardware acceleration, higher resolution, more colors, etc.

As is normally the case with hardware drivers, containers that build their functionality on top of the hardware are placed inside the driver's container. This gives them easy access to the driver's exported functions (in object-oriented terms, the container inherits the functions of its superclass). Inside the gfx container, we find the screen container, which provides the implementation for a GUI.

Suppose we want to create a simple dialog and display it on the screen. We can create a window container using the "new" command, which creates a new object by copying it from a directory of templates. Once the window object is created, we can copy the widgets into it to fill in its content. Once that's done, displaying it is as simple as moving it into the //gfx/screen container. So it would work something like this:

new window w1
cd w1
new vpanel v1
cd v1
new textbox t1
new button b1
echo "Hello World!" >t1/text
echo "OK" >b1/text
cd ../..
mv w1 /localhost/gfx/screen

As GUI elements are hierarchical in nature, it makes logical sense to arrange them in the filesystem's hierarchy. It also means that you would be able to use a file manager to build graphical applications. Notice how attributes are set in the button and textbox widgets. There is a file within them called text, which, as the name implies, holds the text for these widgets. Each widget has a number of standard attribute files within it for setting colors, style, text, callback functions, etc. It is also possible to insert code directly into a widget for execution when it is activated; this is especially useful when network latency is high and a quick response is required, such as with mouseover events.
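Setting the other attributes would work the same way; for example (only the text file is established above; color and callback are hypothetical attribute names):

echo blue >b1/color                        # hypothetical styling attribute
echo /localhost/bin/on_ok >b1/callback     # hypothetical: container invoked when the button is pressed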

Suppose we also want to display this window on a remote machine. All we have to do is copy the window to the remote machine, e.g.:

cp /localhost/gfx/screen/w1 /titan/gfx/screen

This would put a copy of our window on a remote system named titan. Because the window and each of its widgets know how to draw themselves, the window on titan can be moved, resized, iconified, etc., and the widgets manipulated in any way, all without requiring any further network traffic. Contrast this with X, where every action on a remote display means resending instructions on how to redraw the window and its contents. Anyone who has tried running remote X applications over a modem knows how painful this can be.

Because executable code can be inserted into the widgets, it's possible to send an entire program inside a window. Programs packaged this way are much like Java in that they are transferred whole initially, then run entirely on the client side. At the other extreme, only the widgets themselves are transmitted through the network, leaving the originating machine to handle any computation. This scenario is similar to HTML with CGI scripts running on the server. A third way of handling remote displays is to send the program in a piecewise manner: the program is broken into pieces, and each piece is sent only when needed. Once a piece has been sent, it can be cached on the remote display server so it doesn't have to be sent again. For many tasks, this is the most efficient method.

Audio works much like video. There is a container called sfx which contains the audio driver. It exports privileged functions for playing waveforms, setting the volume, mixing channels, etc. Within the sfx container are containers for various types of implementations. A container called mp3 will play any mp3 file copied into it. Similar containers exist for other sound formats. Within the functional part of each of these containers are functions for identifying the audio type. For example, the wav container has a function to determine if an audio stream is in wav format. It also has a function to convert the wav stream into a raw audio stream, so that it can be played. All audio containers have these two functions, which can in turn be used to identify and play any known audio format.
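Playing a sound is then just another copy (assuming the mp3 and wav containers described above):

cp song.mp3 /localhost/sfx/mp3      # the copy-in event decodes and plays the stream
cp alert.wav /localhost/sfx/wav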

Unix users will be happy to find that traditional device files still exist, and that they are stored in the usual //dev directory. The dev directory is generated dynamically to accurately reflect the hardware in the machine (à la devfs). Each device is represented as a hierarchy in which the top-level container represents the whole device and lower levels represent components of the device. For example, the hard drive pool is represented as //dev/disk; multiple hard drives are aggregated into a single pool of space, much like LVM. Cat this file to see the raw contents of the pool, or cd into it to see the physical drives, represented as numbers (1, 2, 3, etc.).
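A short session with the disk pool might look like this (the output is illustrative):

cat //dev/disk >pool.img        # dump the raw contents of the entire pool
cd //dev/disk
ls                              # the physical drives behind the pool
1  2  3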

Processes on OOS are presented in the //cpu container. Similar in concept to the /proc filesystem in Plan 9, but more capable, the cpu container holds a list of running processes on the given machine. An executable can be run by copying it into the cpu container. Once running, it will appear inside the container as a file whose name is its process ID. Cating this process ID file will reveal a memory image of the running process, whereas cding into it will provide a more easily inspected collection of attributes, including a list of open files, its environment, its process-specific load average, its running state, etc. Deleting a process file will kill the process.
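A sketch of process management through the cpu container (the program name and process ID are made up):

cp /localhost/bin/httpd //cpu   # start a program by copying it in
ls //cpu                        # it appears under its process ID
412
ls //cpu/412                    # open files, environment, load average, state, ...
rm //cpu/412                    # deleting the process file kills the process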

Starting a process through the cpu container causes it to start in its own session. This means its stdin, stdout, and stderr will be redirected to/from never-never land, which is good for daemonizing a process but bad for running interactive terminal programs. Processes will, however, be run with the same environment (environment variables, etc.) as your current shell, so if they create a GUI, that will still work (your local display path is stored in your environment, so running processes will know where to open their windows). If you need to run a process in your current session, you have to use more conventional methods like fork, which under OOS is able to fork a process to a different machine. It's also possible to copy a running process from one machine's cpu container to another, provided both machines have identical CPUs.
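Migrating a running process would then be a copy between cpu containers (hypothetical process ID; both machines must have identical CPUs):

cp /localhost/cpu/412 /titan/cpu    # duplicate the live process image on titan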

Throughout this paper, we've talked about running processes on remote machines without giving much thought to the incompatibilities between different CPU types. A program compiled for Intel will simply not run on a SPARC, for example. To get around this problem, we use a concept known as pcode. Pcode is a mostly-compiled form of an executable. In this form, it remains portable across CPU types, but because a large part of the compilation has already been done, it can be translated into a machine-specific executable quickly. This machine-specific portion is cached on the machine on which the process runs, allowing the translation phase to be skipped if the executable is run a second time. This idea is borrowed from Plan 9, but is similar in concept to Java with a JIT compiler.

Suppose there is a machine on the network called backup01 which contains a tape drive. If you want to back up a directory on your local machine, you would do so like this:

cp /localhost/directory_to_be_archived /backup01/dev/tape

Restoring the directory from tape is as simple as copying it back:

cp /backup01/dev/tape /localhost/directory_to_be_restored

It works essentially like creating images of, say, floppies under Unix, except here I can read and write the images to devices on the other side of the network. Unix would normally accomplish the same thing by piping the output of tar through rsh (or something similar), which in turn would copy it to the remote device.
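For comparison, the traditional Unix pipeline alluded to here would look something like this (the tape device name is illustrative):

tar cf - directory_to_be_archived | rsh backup01 dd of=/dev/tape    # backup
rsh backup01 dd if=/dev/tape | tar xf -                             # restore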

A running program exports everything needed to debug it in the cpu directory. Because the filesystem is network transparent, you can attach to a program even if it's running on a separate machine. Simply point your debugger at the proper cpu directory where the process is running, and it will be able to directly manipulate the executable's memory structures through its exported image.
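The paper doesn't name a debugger, but attaching might be as simple as (hypothetical command and process ID):

debug /titan/cpu/412      # point a debugger at the remote process's cpu directory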

Create a directory and copy some data into it. Now, make the directory multiplexed and cd into it. You will see the contents you copied in. At this point, any changes you make to the contents of this directory can be rolled back by simply logging out (or opening a new terminal window). Other users who access this directory will see its initial contents, regardless of any changes you have made (since it's multiplexed). If you do this, just remember to make a copy of any changes you want to keep, or they will be lost when you log out.
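The walkthrough above, as a shell session (the mux command for setting the multiplex flag is hypothetical):

new container scratch
cp notes.txt scratch            # seed the container with some data
mux scratch                     # hypothetical: set the multiplex flag in its inode
cd scratch
rm notes.txt                    # gone in this session only; others still see the original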

Remember how directories can perform arbitrary actions on files copied into them? We've already seen how this is useful for graphics, audio, and CPU information. Here are a few more ideas on how this can be used: Imagine a directory which will send email. Naturally, the email files would have to have full headers so the mailer would know where to send them, but it could be done. Want to send a message? Just compose it in your favorite text editor and copy it into the mailer directory. Printing can work in the same way. Why have separate commands for printing when you can just copy your files into a printer directory? Tired of using clumsy FTP clients to transfer files? Just make a directory that understands the FTP protocol and map it into a filesystem. Need to transfer a file? Just copy it into (or out of) the ftp directory for your remote host (Plan 9 does this too, by the way). If you wanted to, you could even make a compiler directory. Copy your source files in, then copy the final executable out after the compilation is finished.
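As hypothetical one-liners (none of these container names are specified here):

cp message.txt /localhost/mailer             # headers in the file say where it goes
cp thesis.ps /localhost/lp                   # printing is just a copy
cp paper.pdf /localhost/ftp/example.org/pub  # an FTP-speaking container mapped into the tree
cp main.c /localhost/cc                      # compiler container: source in...
cp /localhost/cc/a.out .                     # ...binary out when the build finishes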

Unix introduced the "everything is a file" philosophy more than 40 years ago. This philosophy has served it well over that time, allowing its core functionality to survive the years. However, as new technologies came into existence, many were added without regard for the philosophy which gave Unix one of its greatest strengths. Plan 9 was later introduced as a successor, pulling many of these technologies back into the realm of files and directories. Plan 9 allows applications and drivers to export their own filesystems, letting external programs interact with them through a consistent API. OOS hopes to push this paradigm further by extending the abilities of a file to include event-driven behavior and multiplexing, and by delivering varied services through a consistent file-centric interface.

John Doe
