Logo ©1994-2007 Kevin Boone

File handling in the Linux kernel: application layer

Last modified: Fri Aug 3 08:31:20 2007

A layered architecture

Like most complex software systems, the Linux kernel, the applications it supports, and the hardware it runs on can be viewed as a `layered' system. At the top of the stack of layers we have a high degree of abstraction, and minimal control over the detailed operation of the system. At the bottom of the stack we have the real hardware -- disk, IO, and DMA controllers, and so on. Each layer is an abstraction, or simplification, of the one below it. In an ideal world, each layer would invoke the services of the layer directly below it, and would itself be invoked by the layer directly above it. If this ideal structure is enforced, the whole system is loosely coupled: dependencies between layers are minimized, and it is possible to modify one layer without having too severe an impact on other parts of the system.

The Linux kernel is not ideal in this sense. There are, perhaps, two main reasons for this. First, the kernel has grown organically, building on the contributions of a large number of people over many years. Whole chunks of the kernel have been pulled out and replaced, with the new pieces not fitting into exactly the same places. Secondly, it is important to consider the `horizontal' partitioning of the kernel at the same time as its vertical layering. By `horizontal partitioning' I mean the division into subsystems of code that is notionally at the same level of abstraction. Different subsystems often make use of each other but, unless they are at identical levels of abstraction, the interaction between subsystems inevitably leads to some blurring of the layer distinctions. In order to make these articles easier to understand, I have imposed my own layer structure, but my layers are conceptual, and are not generally so clearly defined in the real kernel.

In this first article I will describe each of the layers in outline, and then go on to a more detailed description of the highest layers (because these are relatively easy to describe). Each of the following articles will deal with one layer, working from top to bottom. So, in this article we will see application code, while in the final article we will see voltages changing on pins.

In outline, the layers which can be identified are the following.

  • The application layer. Here we find the application code: C, C++, Java, etc. Application coding will be familiar to most developers, so we won't have much to say about it here.
  • The library layer. It is unusual for an application program to interact directly with kernel services. Apart from the additional complexity such an interaction would introduce, such a practice would introduce needless platform-dependencies into the application code. In practice most, perhaps all, Linux applications will have their interface with the kernel in the GNU standard C library (`glibc'). This includes not only applications written in C or C++, but applications that run on runtime environments written in C or C++ (tcl, java, etc).
  • The VFS layer. VFS is the highest, most abstract part of the kernel's file handling infrastructure. VFS provides a set of API calls for standard file handling functionality (open, read, write, etc.) that are independent of the actual implementation of the file. VFS calls work not only on files, but also on entities that have pathnames but are, nonetheless, not true files (pipes, sockets, character devices, etc). At this level of abstraction, the implementation details are unimportant. Details are just delegated to the lower layers. The purpose of VFS is to provide a unified, file-like interface between these various entities and the application. The VFS code extends into the filesystem layer as well, as we shall see later. Most of the VFS code is in the directory fs/ in the kernel source.
  • The filesystem layer. The filesystem layer converts the high-level operations understood by VFS -- reading, writing, etc. -- into low-level operations on disk blocks (or whatever the storage medium happens to be). However, because most disk filesystems are essentially similar, VFS also provides generic filesystem handling code. Specific filesystems are free to make use of this generic code (most do), or do all the work themselves. Most of this code is in the fs/ directory, along with the other VFS code, but some is in mm/ with the virtual memory management infrastructure.
  • The generic block device layer. A filesystem does not have to have a block device underneath it. For example, the /proc filesystem has no permanent storage at all. VFS does not care how a filesystem is implemented, so long as it implements the correct API. Most disk filesystems are, however, implemented on top of a block device. A block device models a data storage device as a set of contiguous data blocks of a fixed size. The block device does not know or care what goes in the data blocks -- that is the job of the filesystem handler. In most cases, real filesystems do not make calls on block device drivers, even high-level calls. They are free to do so, but it is often easier for the developer to use the generic block device support provided by the code in drivers/block. This generic code provides a lot of the functionality that all block devices will need, in particular request queue and buffer management.
  • The device driver. The device driver is the lowest-level, least abstract piece of software, and typically interacts directly with the hardware devices. It usually does this by means of port IO, memory-mapped IO, DMA, and interrupts, perhaps in combination. It is an impressive feature of the Linux kernel that most device drivers are almost entirely platform-independent in their source code. Although device drivers can be implemented in assembler, most Linux drivers are written in C. In the diagram above, I have shown the device driver incorporating an interrupt handler. Not all drivers service interrupts, but many do. Many Linux interrupt handlers are divided into `top half' and `bottom half' (of which, more later). In my diagram the `top half' is at the bottom, because it is conceptually closer to the hardware.
  • The hardware. At the bottom of the stack we have the real hardware - disk controllers, SCSI controllers, and so on.

The layered architecture is highly flexible. For example, at any particular layer there can be more than one subsystem, each of which operates in somewhat different ways. In the filesystem layer we have handlers for the ext2 filesystem, ISO9660 filesystem, UDF, and so on. These handlers know nothing about device drivers, and can create and manage a filesystem on almost any block device. At the device driver layer we have support for SCSI, IDE, MFM, and other controllers. These drivers can be used with any filesystem.

What's more, sub-stacks of the layered architecture can be stacked on top of other sub-stacks. Consider, for example, the use of USB hard disks. USB is normally a serial interface, but to drive a hard disk from the USB bus we need a protocol for doing block operations through this serial link. We could implement a separate `USB filesystem' and plug it into the VFS layer, but in practice we don't have to. What we could do, for example, is to write a block device driver that responds to requests from any filesystem, and converts them into USB requests. This driver could then be stacked on top of the whole USB protocol stack, which is itself layered. Alternatively -- and what is, in fact, done in the stock kernel -- we could write a driver that converts SCSI disk controller requests into USB requests, and then insert that driver into the stack between the generic SCSI block device driver and the USB stack.

It should be obvious that the use of the layered architecture leads to a high degree of flexibility. It has the disadvantage, however, of making it very difficult to build up a mental picture of the entire system.
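One way to picture how an upper layer can stay ignorant of the layer below it is a table of function pointers that each lower-layer implementation fills in. The following is only a user-space sketch of that idea, not kernel code: every name in it (toy_file, toy_file_ops, toy_vfs_read, and the two toy `filesystems') is hypothetical and chosen purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_file;

/* Hypothetical operations table: each "filesystem" supplies its own read. */
struct toy_file_ops {
    long (*read)(struct toy_file *f, char *buf, size_t len);
};

struct toy_file {
    const struct toy_file_ops *ops;  /* filled in by the filesystem */
    const char *data;                /* backing data (toy "disk" filesystem only) */
    size_t pos;                      /* current read offset */
};

/* Toy "disk" filesystem: copies bytes out of a backing string. */
static long toy_disk_read(struct toy_file *f, char *buf, size_t len)
{
    size_t total = strlen(f->data);
    size_t avail = f->pos < total ? total - f->pos : 0;
    if (len > avail)
        len = avail;
    memcpy(buf, f->data + f->pos, len);
    f->pos += len;
    return (long)len;
}

/* Toy "synthetic" filesystem with no storage at all, in the spirit of
   /proc: the content is produced on demand. */
static long toy_synth_read(struct toy_file *f, char *buf, size_t len)
{
    static const char text[] = "generated\n";
    size_t total = sizeof(text) - 1;
    size_t avail = f->pos < total ? total - f->pos : 0;
    if (len > avail)
        len = avail;
    memcpy(buf, text + f->pos, len);
    f->pos += len;
    return (long)len;
}

static const struct toy_file_ops toy_disk_ops  = { toy_disk_read };
static const struct toy_file_ops toy_synth_ops = { toy_synth_read };

/* The "VFS" entry point: it knows nothing about the implementation and
   simply delegates through the table. */
long toy_vfs_read(struct toy_file *f, char *buf, size_t len)
{
    assert(f != NULL && f->ops != NULL && f->ops->read != NULL);
    return f->ops->read(f, buf, len);
}
```

A caller would set up a toy_file with either &toy_disk_ops or &toy_synth_ops and then call toy_vfs_read() in exactly the same way for both. Adding a third `filesystem' means writing one more read function and one more table, with no change to the caller.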

Because we can't describe every possible combination of the various layers available to the kernel, in these articles I have selected some specific examples for the purposes of illustration. In particular, the application is written in C, and linked against the GNU standard C library. The filesystem is ext2, and hosted on a hard disk attached to an IDE bus.

Application layer

In these articles, we will assume that the application is written in C, and uses the well-known C low-level file-handling functions. To illustrate the concepts to be discussed, we will use the following, very simple application code. It opens a file, reads a few kilobytes of data, and closes it. Here is the relevant code fragment:
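#include <fcntl.h>    /* open(), O_RDONLY */
#include <unistd.h>   /* read(), close() */

char buff[5000];
int f = open ("/foo/bar", O_RDONLY);
read (f, buff, sizeof(buff));   /* buff decays to a pointer to its first byte */
close (f);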
We will consider the flow of execution resulting from the open() and read() calls, all the way to the disk controller hardware. In all modern operating systems, application code is fundamentally different from kernel code. Application code is much more limited in what it can do, and subject to more rigorous controls. However, ultimately a thread of execution has to be able to pass from application code, through the kernel, and back to the application. The applications and the kernel are not separate processes, or even separate threads. This implies that we need a way to change the mode of execution from `application mode' to `kernel mode' and back again in a single thread. In all architectures there will be some instruction or set of instructions that have this effect. To avoid platform-dependencies, as well as to simplify coding, these instructions are typically encapsulated within standard libraries linked into the application.
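The fragment above ignores return values for the sake of brevity. For reference, here is a sketch of how the same open/read/close sequence might look with the usual checks in place. The helper name read_prefix is my own invention for illustration; it is not a library function.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: read up to len bytes from the start of the file
   at path. Returns the number of bytes read (possibly fewer than len if
   the file is short), or -1 on error with errno set. */
ssize_t read_prefix(const char *path, char *buf, size_t len)
{
    assert(path != NULL && buf != NULL);

    int f = open(path, O_RDONLY);
    if (f < 0)
        return -1;

    size_t total = 0;
    while (total < len) {
        ssize_t n = read(f, buf + total, len - total);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            int saved = errno;      /* close() may overwrite errno */
            close(f);
            errno = saved;
            return -1;
        }
        if (n == 0)
            break;                  /* end of file */
        total += (size_t)n;
    }

    close(f);
    return (ssize_t)total;
}
```

The loop is there because read() is allowed to return fewer bytes than were asked for. Each pass through the loop is a separate trip into the kernel and back.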

The library layer

On Linux systems, the C open() and read() functions, as well as all the other standard C/C++ file handling functions, are usually implemented in the GNU standard C library (`glibc'). The executable code for this library is typically found in the shared library file /lib/libc-XXX.so, where `XXX' is a version number. When you compile a C program with gcc, it is automatically linked against glibc -- no developer intervention is required. The library functions in glibc are made available to the application by the magic of dynamic linking, so calls are not direct, but that need not concern us here. The way that glibc handles the open() operation, for example, will be architecture-dependent, but in most (all?) cases it will issue a system call, that is, a trap into the kernel. On x86 Linux, what the open() function in glibc does is to load the number for the open() system call (number `5' in this case) into the eax register, then execute the instruction
int 0x80
This software interrupt enters the Linux kernel, through the x86 interrupt vector table, at code defined in the (assembly language) file arch/i386/kernel/entry.S. After some jiggling around of the stack, the system call number is used as an offset into the system call table, which is also defined in entry.S under the name sys_call_table. Call number 5, the `open' call, is defined to point to the address of a function that will handle the open operation. Unless some other piece of kernel-level code has changed it, this function will be sys_open, which is defined in fs/open.c. Although the mechanism of trapping into the kernel varies between architectures, in most (all?) cases the glibc function open() ends up as a call to the kernel function sys_open(), unless the system call table has been changed.
      Incidentally, you won't necessarily find sys_call_table in the list of symbols exported by the kernel. After some initial uncertainty, it is now generally agreed that changing the behaviour of the kernel by modifying the contents of sys_call_table is a Bad Thing, and there are less intrusive ways to achieve the same effect.
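The table lookup described above can be pictured as an array of function pointers indexed by call number. What follows is a user-space toy, not the kernel's code: the names toy_call_table, toy_dispatch, toy_install and toy_sys_open are all hypothetical, and the handler body is a stand-in that just returns a recognisable value.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical handler type. Real handlers take their arguments from
   registers; one long is enough to show the dispatch. */
typedef long (*toy_syscall_fn)(long arg);

#define TOY_NR_OPEN 5   /* the number the text gives for open on x86 */
#define TOY_NR_MAX  8   /* size of this toy table */

/* Stand-in for sys_open: returns a value we can recognise. */
static long toy_sys_open(long arg)
{
    return 100 + arg;
}

/* The table itself: slots that are not listed are NULL. */
static toy_syscall_fn toy_call_table[TOY_NR_MAX] = {
    [TOY_NR_OPEN] = toy_sys_open,
};

/* Use the call number as an index into the table and call the handler
   found there. Unknown numbers get -ENOSYS. */
long toy_dispatch(long nr, long arg)
{
    if (nr < 0 || nr >= TOY_NR_MAX || toy_call_table[nr] == NULL)
        return -ENOSYS;
    return toy_call_table[nr](arg);
}

/* Overwrite a slot. This is the kind of modification the note above
   describes as a Bad Thing when done to the real table. */
void toy_install(long nr, toy_syscall_fn fn)
{
    assert(nr >= 0 && nr < TOY_NR_MAX);
    toy_call_table[nr] = fn;
}
```

With this in place, toy_dispatch(TOY_NR_OPEN, 1) reaches toy_sys_open, while a number with no handler installed yields -ENOSYS.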

All the foregoing architecture-dependent magic is conceptually not very significant. It is not far wrong to think of the application's open() call as being implemented, logically, by the sys_open() function in the kernel, with a corresponding change of execution mode from `application mode' to `kernel mode'.
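To see how thin the library wrapper is, it is possible to skip glibc's open() and ask for the system call by number, using the generic syscall() function that glibc also provides. This is a sketch for experimentation only, and the function name raw_open_rdonly is my own. The SYS_open and SYS_openat constants come from <sys/syscall.h> and hide the call number, which differs between architectures. Some architectures provide only openat, hence the conditional.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* makes sure syscall() is declared by glibc */
#endif
#include <assert.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Open a file read-only without going through glibc's open() wrapper.
   Returns a file descriptor, or -1 with errno set, just as open() does. */
int raw_open_rdonly(const char *path)
{
    assert(path != NULL);
#ifdef SYS_open
    return (int)syscall(SYS_open, path, O_RDONLY);
#else
    /* No plain open call on this architecture: use openat relative to
       the current directory, which has the same effect. */
    return (int)syscall(SYS_openat, AT_FDCWD, path, O_RDONLY);
#endif
}
```

The descriptor returned can be passed to the ordinary read() and close() functions, since it is the same kind of descriptor that open() would have returned.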
      You should also notice that no magic happens with threads when system calls are made. If applications make multiple, concurrent system calls, then multiple, concurrent threads of execution will enter the kernel. However, although there may be many distinct files on a particular physical disk, there will only be one disk controller per physical disk. As a result, at some point the kernel will have to implement some locking, to prevent multiple threads attempting to interact with the same hardware at the same time. Typically locking only occurs in the lowest levels of the kernel, so we don't need to worry too much about it just yet.

Next: the VFS layer

   