File handling in the Linux kernel: application layer
Last modified: Fri Aug 3 08:31:20 2007
A layered architecture
Like most complex software systems, the Linux kernel, the applications
it is supporting,
and the hardware it runs on can be viewed as a `layered' system.
At the top of
the stack of layers we have a high degree of abstraction, and minimal control
over the detailed operation of the system. At the bottom layer
of the stack we have the real hardware -- disk, IO, and DMA controllers,
and so on.
Each layer is an abstraction, or simplification, of the one below it.
In an ideal world, each layer would invoke the services of the layer
directly below it, and would itself be invoked by the layer directly
above it. If this ideal structure is enforced, the whole system is
loosely-coupled: dependencies between layers are minimized, and it
is possible to modify one layer without having too severe an impact
on other parts of the system.
The Linux kernel is not ideal in this sense. There are, perhaps, two
main reasons for this. First, the kernel has grown organically, building
on
the contributions of a large number of people over many years.
Whole chunks of the kernel
have been pulled out and replaced, with the new pieces
not fitting into exactly the same places. Secondly, it is important to
consider the `horizontal' partitioning of the kernel at the same
time as its vertical layering.
By `horizontal partitioning' I mean the division into subsystems of
code that is notionally at the same level of abstraction.
Different subsystems often make use of
each other but, unless they are at identical levels of abstraction,
the interaction between subsystems inevitably leads to some blurring of
the layer distinctions. In order to make these articles easier to
understand, I have imposed my own layer structure, but my
layers are conceptual, and are not generally so clearly defined in
the real kernel.
In this first article I will describe each of the layers in outline,
and then go on to a more detailed description of the highest layers (because these are relatively easy to describe). Each of the following articles will
deal with one layer, working from top to bottom. So, in this article we
will see application code, while in the final article we will see voltages
changing on pins.
In outline, the layers which can be identified are the following.
- The application layer. Here we find the application code: C,
C++, Java, etc. Application coding will be familiar to most
developers, so we won't have much to say about it here.
- The library layer. It is unusual for an application program to
interact directly with kernel services. Apart from the additional complexity
such an interaction would introduce, such a practice would introduce needless
platform-dependencies into the application code. In practice most, perhaps
all, Linux applications will have their interface with the kernel
in the GNU standard C library (`glibc'). This includes not only
applications written in C or C++, but applications that run on
runtime environments written in C or C++ (Tcl, Java, etc).
- The VFS layer. VFS is the highest, most abstract part
of the kernel's file handling infrastructure. VFS provides a set of
API calls for standard file handling functionality (open, read,
write, etc.) that are independent of the actual implementation of
the file. VFS calls work not only on files, but also on entities that
have pathnames but are,
nonetheless, not true files (pipes, sockets, character devices, etc).
At this level of abstraction, the implementation details are
unimportant.
Details are just delegated to the lower layers.
The purpose of VFS is to provide a unified, file-like interface
between these various entities and the application. The VFS
code extends into the filesystem layer as well, as we shall
see later. Most of the VFS code is in the directory
fs/
in the kernel source.
- The Filesystem layer. The filesystem layer converts the high-level
operations understood by VFS -- reading, writing, etc. -- into low-level
operations on disk blocks (or whatever the storage medium happens to be).
However, because most disk filesystems are essentially similar, VFS also
provides generic filesystem handling code. Specific filesystems are free
to make use of this generic code (most do), or do all the work themselves.
Most of this code is in the
fs/ directory, along with the
other VFS stuff, but some is in mm/ with the virtual memory
management infrastructure.
- The generic block device layer. A filesystem does not have
to have a block device underneath it. For example, the
/proc
filesystem has no permanent storage at all. VFS does not care how a filesystem
is implemented, so long as it implements the correct API. Most disk file
systems are, however, implemented on top of a block device. A block device
models a data storage device as a set of contiguous data blocks of a
fixed size. The block device does not know or care what goes in the
data blocks -- that is the job of the filesystem handler.
In most cases, real file
systems do not make calls directly on block device drivers, even high-level calls.
They are free to do so, but it is often easier for the developer
to use the generic block device support provided by the
code in drivers/block. This generic code provides a
lot of the functionality that all block devices will need, in particular
request queue and buffer management.
- The device driver. The device driver is the lowest-level,
least abstract piece of software, and typically interacts directly
with the hardware devices. It usually does this by means of port IO,
memory-mapped IO, DMA, and interrupts, perhaps in combination. It is
an impressive feature of the Linux kernel that most device drivers are
almost entirely platform-independent in their source code. Although device
drivers can be implemented in assembler, most Linux drivers are written
in C. In the diagram above, I have shown the device driver incorporating
an interrupt handler. Not all drivers service interrupts, but
many do. Many Linux interrupt handlers are divided into `top half'
and `bottom half' (of which, more later). In my diagram the `top half'
is at the bottom, because it is conceptually closer to the hardware.
- The hardware. At the bottom of the stack we have the real
hardware - disk controllers, SCSI controllers, and so on.
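The VFS item in the list above says that the same calls work on entities that are not true files. As a rough user-space illustration of that point, the sketch below uses one helper to count the bytes available from any file descriptor, and then applies it to a pipe. The names count_bytes and count_via_pipe are made up for this example; only read(), write(), pipe() and close() are real library calls.

```c
#include <unistd.h>     /* read(), write(), pipe(), close() */
#include <sys/types.h>  /* ssize_t */

/* Hypothetical helper: read from any descriptor until end-of-file and
   return the total number of bytes seen, or -1 on error. It neither
   knows nor cares whether fd refers to a file, a pipe or a device. */
ssize_t count_bytes (int fd)
{
  char buff[256];
  ssize_t n, total = 0;
  while ((n = read (fd, buff, sizeof (buff))) > 0)
    total += n;
  return n < 0 ? -1 : total;
}

/* Hypothetical helper: push a short message into a pipe, then count it
   using exactly the same code that would be used for a regular file.
   The message is assumed to be small enough to fit in the pipe buffer. */
ssize_t count_via_pipe (const char *msg, size_t len)
{
  int ends[2];
  if (pipe (ends) < 0)
    return -1;
  ssize_t w = write (ends[1], msg, len);
  close (ends[1]);                 /* so the reader sees end-of-file */
  ssize_t total = (w < 0) ? -1 : count_bytes (ends[0]);
  close (ends[0]);
  return total;
}
```

Passing count_bytes() a descriptor obtained from open() on a regular file would work in just the same way; the differences are all handled below the VFS layer.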
The layered architecture is highly flexible. At any
particular layer there can be more than one subsystem, each of which
operates in somewhat different ways. For example, in the filesystem
layer we have handlers for the ext2 filesystem, ISO9660 filesystem,
UDF, and so on. These handlers know nothing about device drivers, and
can create and manage a filesystem on almost any block device. At the device driver layer we have support for
SCSI, IDE, MFM, and other controllers. These drivers can be used with
any filesystem. What's more, sub-stacks
of the layered architecture can be stacked on top of other
sub-stacks. Consider, for example, the use of USB hard disks.
USB is normally a serial interface, but to drive a hard disk from
the USB bus we need a protocol for doing block operations through
this serial link. We could implement a separate `USB filesystem' and
plug it into the VFS layer, but in practice we don't have to. What
we could do, for example, is to write a block device driver that
responds to requests from any filesystem, and converts them into
USB requests. This driver could then be stacked on
top of the whole USB protocol
stack, which is itself layered. Alternatively -- and what is, in fact,
done in the stock kernel
-- we could write a driver that converts SCSI disk controller
requests into USB requests, and then insert that
driver into the stack between the generic SCSI block device driver
and the USB stack.
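The separation between filesystem code and block device code can be modelled in a few lines of ordinary C. The following is only a sketch of the idea, not kernel code: struct blockdev, ramdisk_init and fs_read_bytes are invented names, and the 512-byte block size is an assumption. The `filesystem' function sees nothing but the interface, so any `driver' that fills in the function pointer could sit underneath it.

```c
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 512   /* assumed block size for this sketch */

/* Hypothetical model of a block device: a run of fixed-size blocks,
   reached through a function pointer supplied by the "driver". */
struct blockdev {
  size_t nblocks;
  int (*read_block) (struct blockdev *dev, size_t index, unsigned char *out);
  void *private_data;              /* driver-specific state */
};

/* One possible "driver": the blocks live in ordinary memory. */
static int ram_read_block (struct blockdev *dev, size_t index,
                           unsigned char *out)
{
  if (index >= dev->nblocks)
    return -1;
  memcpy (out, (unsigned char *) dev->private_data + index * BLOCK_SIZE,
          BLOCK_SIZE);
  return 0;
}

void ramdisk_init (struct blockdev *dev, unsigned char *storage,
                   size_t nblocks)
{
  dev->nblocks = nblocks;
  dev->read_block = ram_read_block;
  dev->private_data = storage;
}

/* The "filesystem" side: fetch len bytes starting at byte offset,
   knowing only the blockdev interface and not the driver behind it. */
int fs_read_bytes (struct blockdev *dev, size_t offset,
                   unsigned char *out, size_t len)
{
  unsigned char block[BLOCK_SIZE];
  while (len > 0) {
    size_t index = offset / BLOCK_SIZE;   /* which block */
    size_t within = offset % BLOCK_SIZE;  /* where in that block */
    size_t chunk = BLOCK_SIZE - within;
    if (chunk > len)
      chunk = len;
    if (dev->read_block (dev, index, block) < 0)
      return -1;
    memcpy (out, block + within, chunk);
    out += chunk;
    offset += chunk;
    len -= chunk;
  }
  return 0;
}
```

A driver that translated each read_block() call into a request on some other stack could be dropped in without touching fs_read_bytes() at all, which is the property the USB example relies on.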
It should be obvious that the use of the layered architecture leads to
a high degree of flexibility. It has the disadvantage, however, of
making it very difficult to build up a mental picture of the entire
system.
Because we can't describe every possible combination of the various
layers available to the kernel, in these articles I have selected some
specific examples for the purposes of illustration. In particular, the
application is written in C, and linked against the GNU standard
C library. The filesystem is ext2, and hosted on a hard disk
attached to an IDE bus.
Application layer
In these articles, we will assume that the application is written in
C, and uses the well-known C low-level file-handling functions.
To illustrate the concepts to be discussed,
we will use the following, very simple application code.
It opens a file, reads a few kilobytes of data, and closes it.
Here is the relevant
code fragment:
char buff[5000];
int f = open ("/foo/bar", O_RDONLY);  /* f is a file descriptor, or -1 on failure */
read (f, buff, sizeof(buff));         /* read up to 5000 bytes into buff */
close (f);
We will consider the flow of execution resulting from the open()
and read() calls, all the way to the disk controller hardware.
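For readers who want to compile the fragment, a minimal complete version might look like the following. The helper name read_start and the error checks are additions made for illustration; they are not part of the original fragment, which ignores errors for brevity.

```c
#include <fcntl.h>      /* open(), O_RDONLY */
#include <unistd.h>     /* read(), close() */
#include <sys/types.h>  /* ssize_t */

/* Hypothetical helper: read up to len bytes from the start of path.
   Returns the number of bytes read, or -1 if open() or read() fails. */
ssize_t read_start (const char *path, char *buff, size_t len)
{
  int f = open (path, O_RDONLY);
  if (f < 0)
    return -1;                      /* e.g. the file does not exist */
  ssize_t n = read (f, buff, len);  /* may return fewer than len bytes */
  close (f);
  return n;
}
```

The fragment in the text then corresponds to declaring char buff[5000] and calling read_start ("/foo/bar", buff, sizeof (buff)).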
In all modern operating systems, application code is fundamentally different
from kernel code. Application code is much more limited in what it can
do, and subject to more rigorous controls. However, ultimately
a thread of execution has to be able to pass from application code,
through the kernel, and back to the application. The applications and the
kernel are not separate processes, or even separate threads. This implies
that we need a way to change the mode of execution from `application
mode' to `kernel mode' and back again in a single thread.
In all architectures there will
be some instruction or set of instructions that have this effect.
To avoid platform-dependencies, as well as to simplify coding, these
instructions are typically encapsulated within standard libraries
linked into the application.
The library layer
On Linux systems, the C open() and read() functions,
as well as all the other
standard C/C++ file handling functions, are usually implemented in the GNU
standard C library (`glibc'). The executable code for this library is
typically found in the shared object file
/lib/libc-XXX.so, where `XXX' is a version number. When
you compile a C program with gcc,
it is automatically linked against glibc
-- no developer intervention is required.
The library functions in glibc are made
available to the application by the magic of dynamic linking, so
calls are not direct, but that need not concern us here. The way
that glibc handles the open() operation, for example, will be
architecture-dependent, but in most (all?) cases it will issue
a system call, that is, a trap into the kernel.
On x86 Linux, what the open() function in glibc
does is to load the number for the open()
system call (number `5' in
this case) into the
eax register, then execute the instruction
int 0x80
This software interrupt enters the Linux kernel, through the x86
interrupt vector table, at code defined
in the (assembly language)
file arch/i386/kernel/entry.S. After some jiggling
around of the stack, the system call number is used as an
offset into the system call table, which is
also defined in
entry.S under the name sys_call_table.
Call number 5, the `open' call,
is defined to point to the address of a function
that will handle the open operation. Unless some
other piece of kernel-level code has changed it, this function
will be sys_open, which is defined in
fs/open.c. Although the mechanism of trapping into the
kernel varies between architectures, in most (all?) cases the
glibc function open() ends up as a call
to the kernel function sys_open(), unless the system
call table has been changed.
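The idea of using the call number as an offset into a table of handlers can be modelled in user space with an array of function pointers. This is a toy model only: the table size, the three-argument handler signature, the returned descriptor value and all the names (call_table, model_sys_open, dispatch) are assumptions made for the sketch, and the real table in entry.S is built quite differently.

```c
#include <stddef.h>

#define NR_CALLS 8   /* arbitrary size for this model */
#define NR_OPEN  5   /* the call number the text gives for open on x86 */

/* Every handler in this model takes three word-sized arguments. */
typedef long (*syscall_fn) (long arg0, long arg1, long arg2);

/* Stand-in for sys_open(): pretends to succeed and hands back a
   made-up file descriptor. */
static long model_sys_open (long path, long flags, long mode)
{
  (void) path; (void) flags; (void) mode;
  return 3;
}

/* The table itself: entry 5 points at the open handler, and every
   other entry is left empty (NULL). */
static syscall_fn call_table[NR_CALLS] = {
  [NR_OPEN] = model_sys_open
};

/* Use the call number as an index into the table and jump to the
   handler found there. Returning -1 stands in for "no such call". */
long dispatch (long number, long arg0, long arg1, long arg2)
{
  if (number < 0 || number >= NR_CALLS || call_table[number] == NULL)
    return -1;
  return call_table[number] (arg0, arg1, arg2);
}
```

In this model, replacing the pointer stored in call_table[NR_OPEN] would silently redirect every open request, which is why the text qualifies its claim with `unless the system call table has been changed'.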
Incidentally, you won't necessarily find sys_call_table
in the list of symbols exported by the kernel. After some initial
uncertainty, it is now generally agreed that changing the behaviour
of the kernel by modifying the contents of sys_call_table
is a Bad Thing, and there are less intrusive ways to achieve the same effect.
All the foregoing architecture-dependent magic is conceptually
not very significant. It is not far wrong to think of the application's
open() call logically being implemented by the
sys_open()
function in the kernel, with a corresponding change of execution mode
from `application mode' to `kernel mode'.
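One way to see how thin the library layer is at this point is to bypass the named wrappers and ask for system calls by number, using the generic syscall() function that glibc provides. The sketch below is an illustration under some assumptions: raw_read_start is a made-up name, and it uses the openat call rather than open, because a plain open call number is not defined on every architecture.

```c
#define _GNU_SOURCE
#include <unistd.h>       /* syscall() */
#include <sys/syscall.h>  /* SYS_openat, SYS_read, SYS_close */
#include <fcntl.h>        /* O_RDONLY, AT_FDCWD */

/* Hypothetical helper: open, read and close a file by system call
   number instead of through open(), read() and close(). Returns the
   number of bytes read, or -1 on failure. */
long raw_read_start (const char *path, char *buff, unsigned long len)
{
  /* AT_FDCWD makes openat resolve path the way open() would. */
  long f = syscall (SYS_openat, AT_FDCWD, path, O_RDONLY);
  if (f < 0)
    return -1;
  long n = syscall (SYS_read, f, buff, len);
  syscall (SYS_close, f);
  return n;
}
```

The trap instruction itself is still hidden inside syscall(), so the code remains portable; only the choice of call numbers has moved from the library into the application.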
You should also notice that no magic happens with threads when system
calls are made. If applications make multiple, concurrent system calls,
then multiple, concurrent threads of execution
will enter the kernel. However, although there may be many distinct files
on a particular physical disk, there will only be one disk
controller per physical disk. As a result, at some point the kernel
will have to implement some locking, to prevent multiple threads
from attempting to interact with the same hardware at the same time.
Typically locking only
occurs in the lowest levels of the kernel, so we don't need to worry
too much about it just yet.
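The need for locking can be pictured with a user-space analogy. In the sketch below, several threads submit `requests' to a single shared counter that stands in for the one disk controller, and a mutex ensures that only one thread touches it at a time. This is an analogy using POSIX threads, not a description of the kernel's own locking primitives, and all the names are invented for the example. It may need to be compiled with the -pthread option.

```c
#include <pthread.h>
#include <stddef.h>

#define NTHREADS 4
#define REQUESTS_PER_THREAD 10000

/* Hypothetical stand-in for a single disk controller: a counter of
   requests handled, protected by one lock. */
static long controller_requests = 0;
static pthread_mutex_t controller_lock = PTHREAD_MUTEX_INITIALIZER;

/* Only one thread at a time may be inside the locked region. */
static void submit_request (void)
{
  pthread_mutex_lock (&controller_lock);
  controller_requests++;
  pthread_mutex_unlock (&controller_lock);
}

static void *worker (void *unused)
{
  (void) unused;
  for (int i = 0; i < REQUESTS_PER_THREAD; i++)
    submit_request ();
  return NULL;
}

/* Start the workers, wait for them all to finish, and return the
   number of requests the "controller" saw. */
long run_workers (void)
{
  pthread_t threads[NTHREADS];
  int started = 0;
  controller_requests = 0;
  while (started < NTHREADS
         && pthread_create (&threads[started], NULL, worker, NULL) == 0)
    started++;
  for (int i = 0; i < started; i++)
    pthread_join (threads[i], NULL);
  return controller_requests;
}
```

Without the lock, the increments from different threads could interleave and some would be lost, which is the user-space equivalent of two threads driving the same controller at once.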
Next: the VFS layer