Virtual File System in Linux:
1. Goals:
Linux is designed to support many different physical devices. Even for one
specific type of device, such as hard drives, there are many interface
differences between different hardware vendors. Linux also supports a number of
logical file systems, which lets it interoperate easily with other operating
systems. The Linux file system supports the following goals:
- Multiple hardware devices: provide access to many different hardware devices.
- Multiple logical file systems: support many different logical file systems.
- Multiple executable formats: support several different executable file
formats (such as a.out, ELF, Java).
- Homogeneity: present a common interface to all of the logical file systems and all hardware devices.
- Performance: provide high-speed access to files.
- Safety: do not lose or corrupt data.
- Security: restrict user access to files; restrict each user's total file size
with quotas.
2. External Interface:
The file system provides two levels of interface: a system-call interface that
is available to user processes, and an internal interface that is used by other
kernel subsystems. The system-call interface deals with files and directories.
Operations on files include the usual open/close/read/write/seek/tell that are
provided by POSIX-compliant systems; operations on directories include
readdir/creat/unlink/chmod/stat, as usual for POSIX systems.
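The system-call interface described above can be exercised from user space. The following is a minimal sketch that uses Python's os module as a thin wrapper over the underlying calls; the scratch file and its contents are assumptions made for illustration.

```python
import os
import tempfile

# Create a scratch file so the example does not touch real data.
fd, path = tempfile.mkstemp()
try:
    os.write(fd, b"hello, vfs")               # write
    os.lseek(fd, 7, os.SEEK_SET)              # seek to byte offset 7
    tail = os.read(fd, 3)                     # read three bytes from there
    position = os.lseek(fd, 0, os.SEEK_CUR)   # "tell": report current offset
    size = os.stat(path).st_size              # stat by path name
finally:
    os.close(fd)                              # close the descriptor
    os.unlink(path)                           # unlink removes the name
```

The same calls work regardless of which logical file system or device holds the file, which is the homogeneity goal listed earlier.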
The interface that the file subsystem supports for other kernel subsystems is
much richer. The file subsystem exposes data structures and implementation
functions for direct manipulation by other kernel subsystems. In particular, two
interfaces are exposed to the rest of the kernel: inodes and files. Other
implementation details of the file subsystem are also used by other kernel
subsystems, but this use is less common.
Inode Interface:
- create(): create a file in a directory
- lookup(): find a file by name within a directory
- link() / symlink() / unlink() / readlink() / follow_link(): manage file system
links
- mkdir() / rmdir(): create or remove sub-directories
- mknod(): create a special file (such as a device node or FIFO) or a regular file
- readpage() / writepage(): read or write a page of physical memory to a backing
store
- truncate(): shorten a file to a new length (for example, zero)
- permission(): check to see if a user process has permission to execute an
operation
- smap(): map a logical file block to a physical device sector
- bmap(): map a logical file block to a physical device block
- rename(): rename a file or directory
In addition to the methods you can call with an inode, the namei() function is
provided to allow other kernel subsystems to find the inode associated with a
file or directory.
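To make the relationship between lookup() and namei() concrete, here is a toy user-space model. It is only a sketch: the Inode class and its fields are hypothetical and do not mirror the kernel's real structures; only the method names come from the list above.

```python
class Inode:
    """Toy in-memory inode; a directory holds a name -> Inode map."""

    def __init__(self, is_dir=False):
        self.is_dir = is_dir
        self.entries = {} if is_dir else None

    def _require_dir(self, name):
        if not self.is_dir:
            raise NotADirectoryError(name)

    def lookup(self, name):
        # Find a file by name within this directory.
        self._require_dir(name)
        if name not in self.entries:
            raise FileNotFoundError(name)
        return self.entries[name]

    def create(self, name):
        # Create a regular file in this directory.
        self._require_dir(name)
        self.entries[name] = Inode()
        return self.entries[name]

    def mkdir(self, name):
        # Create a sub-directory in this directory.
        self._require_dir(name)
        self.entries[name] = Inode(is_dir=True)
        return self.entries[name]


def namei(root, path):
    """Walk an absolute path one component at a time using lookup()."""
    inode = root
    for component in path.split("/"):
        if component:                 # skip empty parts such as the leading "/"
            inode = inode.lookup(component)
    return inode


root = Inode(is_dir=True)
etc = root.mkdir("etc")
passwd = etc.create("passwd")
```

In this model, namei(root, "/etc/passwd") returns the same object that create() returned, because the path walk is just a sequence of lookup() calls.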
File Interface:
- open() / release(): open or close the file
- read() / write(): read or write to the file
- select(): wait until the file is in a particular state (readable or writeable)
- lseek(): if supported, move to a particular offset in the file
- mmap(): map a region of the file into the virtual memory of a user process
- fsync(): synchronize any memory buffers with the physical device
- fasync(): enable or disable asynchronous (signal-driven) notification for the file
- readdir(): read the files that are pointed to by a directory file
- ioctl(): set file attributes
- check_media_change(): check to see if a removable medium has been removed
(such as a floppy)
- revalidate(): verify that all cached information is valid
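The phrase "if supported" in the lseek() entry hints at how the file interface works: each open file carries a table of operations, and an implementation may leave entries out. The sketch below models that idea; the File class, the f_op dictionary, and the two operation tables are hypothetical names chosen for illustration.

```python
class File:
    """Toy open-file object: a current offset plus a table of operations."""

    def __init__(self, ops, data):
        self.f_pos = 0
        self.f_op = ops        # hypothetical per-implementation operations table
        self.data = data

    def read(self, count):
        return self.f_op["read"](self, count)

    def lseek(self, offset):
        handler = self.f_op.get("lseek")
        if handler is None:    # this implementation does not support seeking
            raise OSError("lseek not supported")
        return handler(self, offset)


def plain_read(f, count):
    chunk = f.data[f.f_pos:f.f_pos + count]
    f.f_pos += len(chunk)
    return chunk


def plain_lseek(f, offset):
    f.f_pos = offset
    return f.f_pos


disk_ops = {"read": plain_read, "lseek": plain_lseek}
stream_ops = {"read": plain_read}      # a stream-like file: no lseek entry

a = File(disk_ops, b"abcdef")
b = File(stream_ops, b"abcdef")
```

Callers use a.read() and b.read() identically; only the table behind each object differs.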
3. Subsystem Description:
The file subsystem needs to support many different logical file systems and many
different hardware devices. It does this by having two conceptual layers that
are easily extended. The device driver layer represents all physical devices
with a common interface. The virtual file system layer (VFS) represents all
logical file systems with a common interface. The conceptual architecture of
the Linux kernel shows how this decomposition is arranged.
Device Drivers:
The device driver layer is responsible for presenting a common interface to all
physical devices. The Linux kernel has three types of device driver: character,
block, and network. The two types relevant to the file subsystem are character
and block devices. Character devices, such as tape drives, modems, and mice,
must be accessed sequentially. Block devices can be accessed in any order but
can only be read and written in multiples of the block size.
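The difference between the two access patterns can be sketched with two toy classes. Everything here is an assumption made for illustration, including the 512-byte block size and the class names; real drivers are far more involved.

```python
from collections import deque

BLOCK_SIZE = 512  # assumed block size for this sketch


class CharDevice:
    """Sequential stream: bytes come out in the order they went in."""

    def __init__(self):
        self.fifo = deque()

    def write(self, data):
        self.fifo.extend(data)

    def read(self, count):
        n = min(count, len(self.fifo))
        return bytes(self.fifo.popleft() for _ in range(n))


class BlockDevice:
    """Random access, but only in whole blocks."""

    def __init__(self, num_blocks):
        self.storage = bytearray(num_blocks * BLOCK_SIZE)

    def read_block(self, n):
        start = n * BLOCK_SIZE
        return bytes(self.storage[start:start + BLOCK_SIZE])

    def write_block(self, n, data):
        if len(data) != BLOCK_SIZE:
            raise ValueError("block devices transfer whole blocks only")
        start = n * BLOCK_SIZE
        self.storage[start:start + BLOCK_SIZE] = data


def write_bytes(dev, block_no, offset, payload):
    """Change a few bytes by reading, patching, and rewriting a whole block."""
    block = bytearray(dev.read_block(block_no))
    block[offset:offset + len(payload)] = payload
    dev.write_block(block_no, bytes(block))
```

The write_bytes() helper shows the consequence of the block-size rule: a small update still costs a full block read and a full block write.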
Each device can be accessed as though it were a file in the file system. This
file is referred to as a device special file. Because most of the kernel deals
with devices through this file interface, it is easy to add a new device driver
by implementing the hardware-specific code that supports the interface.
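A device special file can be inspected like any other file. The sketch below assumes a Linux system on which /dev/null exists; the device_kind() helper is a hypothetical name introduced for this example.

```python
import os
import stat


def device_kind(path):
    """Classify a path as a character device, block device, or neither."""
    mode = os.stat(path).st_mode
    if stat.S_ISCHR(mode):
        return "character"
    if stat.S_ISBLK(mode):
        return "block"
    return "not a device"


kind = device_kind("/dev/null")      # assumed to exist on the host
with open("/dev/null", "rb") as f:   # opened with the ordinary file interface
    data = f.read(16)                # /dev/null reports end-of-file at once
```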
The Linux kernel uses a buffer cache to improve performance when accessing block
devices. All access to block devices occurs through a buffer cache subsystem.
The buffer cache greatly increases system performance by minimizing reads and
writes to hardware devices. Each hardware device has a request queue; when the
buffer cache cannot fulfill a request from in-memory buffers, it adds a request
to the device's request queue and sleeps until this request has been satisfied.
The buffer cache uses a separate kernel thread, kflushd, to write buffer pages
out to the devices and remove them from the cache.
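The behaviour described above can be sketched as a small write-back cache. This is a simplified, synchronous model: a dictionary stands in for the hardware, the request queue is only a log, and nothing sleeps. All names are assumptions made for illustration.

```python
from collections import deque


class BufferCache:
    def __init__(self, device):
        self.device = device           # dict: block number -> bytes (fake hardware)
        self.buffers = {}              # block number -> cached contents
        self.dirty = set()             # blocks modified in memory only
        self.request_queue = deque()   # log of requests sent to the device

    def read(self, block_no):
        if block_no not in self.buffers:            # miss: go to the device
            self.request_queue.append(("read", block_no))
            self.buffers[block_no] = self.device[block_no]
        return self.buffers[block_no]

    def write(self, block_no, data):
        self.buffers[block_no] = data               # update memory only
        self.dirty.add(block_no)

    def flush(self):
        """Roughly the job of a write-back thread such as kflushd."""
        for block_no in sorted(self.dirty):
            self.request_queue.append(("write", block_no))
            self.device[block_no] = self.buffers[block_no]
        self.dirty.clear()


device = {0: b"boot", 1: b"data"}
cache = BufferCache(device)
cache.read(0)
cache.read(0)                   # second read is served from memory
cache.write(1, b"DATA")         # the device still holds the old contents
on_device_before_flush = device[1]
cache.flush()
```

Two reads and one write produce only two device requests in this model, which is the saving the buffer cache is meant to provide.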
When a device driver needs to satisfy a request, it begins by initiating the
operation on the hardware device by manipulating the device's control and status
registers (CSRs). There are three general mechanisms for moving data between the
main computer and a peripheral device: polling, direct memory access (DMA), and
interrupts.
When a hardware device wants to report a change in condition (mouse button
pushed, key pressed) or to report the completion of an operation, it sends an
interrupt to the CPU. If interrupts are enabled, the CPU stops executing the
current instruction stream and begins executing the Linux kernel's interrupt
handling code. The kernel finds the appropriate interrupt handler to invoke.
While an interrupt is being handled, the CPU executes in a special context;
other interrupts may be delayed until the interrupt is handled. Because of this
restriction, interrupt handlers need to be quite efficient so that other
interrupts are not lost.
Sometimes an interrupt handler cannot complete all of the required work within
the time constraints. In this case, the interrupt handler schedules the work in
a bottom-half handler. A bottom-half handler is code that is executed by the
scheduler the next time a system call is completed.
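The split between a fast interrupt handler and a deferred bottom half can be modelled with a simple queue. This is a sketch only: the function names and the "upper-case the byte" work are invented for illustration, and the deferred work here is triggered by an explicit call rather than by a real scheduler.

```python
from collections import deque

pending_bottom_halves = deque()   # work deferred by interrupt handlers
received = []                     # raw data captured immediately
processed = []                    # results of the deferred work


def interrupt_handler(raw_byte):
    """Top half: capture the data quickly, then defer the slow part."""
    received.append(raw_byte)
    pending_bottom_halves.append(
        lambda: processed.append(chr(raw_byte).upper())
    )


def run_bottom_halves():
    """Run deferred work later, outside the interrupt context."""
    while pending_bottom_halves:
        work = pending_bottom_halves.popleft()
        work()


interrupt_handler(ord("a"))
interrupt_handler(ord("b"))
state_before = list(processed)    # nothing has been processed yet
run_bottom_halves()
```

The handler returns after two cheap appends, so further interrupts are delayed as little as possible; the expensive step runs only when run_bottom_halves() is called.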
Logical File Systems:
Physical devices can be accessed through a device special file, but it is more
common to access block devices through a logical file system. A logical file
system can be mounted at a mount point in the virtual file system. This means
that the associated block device contains files and structure information that
allow the logical file system to access the device. At any one time, a physical
device can support only one logical file system. However, the device can be
reformatted to support a different logical file system.
When a file system is mounted as a subdirectory, all directories and files
available on the device are made visible as subdirectories of the mount point.
Users of the virtual file system do not need to be aware of which logical file
system implements which part of the directory tree. This abstraction provides a
great deal of flexibility in the choice of physical devices and logical file
systems. It is one of the essential factors in the success of the Linux
operating system.
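One way to picture mounting is as a table that maps mount points to logical file systems, with each path handed to the file system whose mount point is its longest matching prefix. The table contents and the resolve() helper below are hypothetical; this is a sketch of the idea, not of the kernel's actual mount handling.

```python
# Hypothetical mount table: mount point -> description of the file system.
mounts = {
    "/": "ext2 on /dev/hda1",
    "/mnt/floppy": "msdos on /dev/fd0",
}


def resolve(path):
    """Return (file system, path relative to its mount point)."""
    best = "/"
    for mount_point in mounts:
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            if len(mount_point) > len(best):
                best = mount_point      # prefer the most specific mount
    relative = path[len(best):].lstrip("/")
    return mounts[best], relative
```

A caller passes "/mnt/floppy/a.txt" exactly as it would pass "/etc/passwd"; which logical file system answers is decided inside resolve().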
Linux uses the concept of inodes to support the virtual file system. It uses an
inode to represent a file on a block device. The inode is virtual in the sense
that it contains operations that are implemented differently depending on the
logical file system and the physical device. The inode interface makes all files
appear the same to other Linux subsystems. The inode is used as a storage
location for all of the information related to an open file on disk. It stores
the associated buffers, the total length of the file in blocks, and the mapping
between file offsets and device blocks.
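The mapping between file offsets and device blocks comes down to a division and a table lookup. The sketch below assumes a 1024-byte block size and a flat list as the block map; both are simplifications, and the locate() method is a hypothetical helper built on top of bmap().

```python
BLOCK_SIZE = 1024  # assumed block size for this sketch


class Inode:
    def __init__(self, size, block_map):
        self.size = size              # file length in bytes
        self.block_map = block_map    # logical block index -> device block number

    def bmap(self, logical_block):
        """Map a logical file block to a physical device block."""
        return self.block_map[logical_block]

    def locate(self, offset):
        """Translate a byte offset into (device block, offset within block)."""
        if not 0 <= offset < self.size:
            raise ValueError("offset outside file")
        logical_block, within = divmod(offset, BLOCK_SIZE)
        return self.bmap(logical_block), within


# A 2500-byte file whose three blocks are scattered across the device.
inode = Inode(size=2500, block_map=[87, 12, 305])
```

For example, byte 2499 falls in logical block 2 (2499 // 1024) at offset 451 within it, so locate(2499) yields device block 305 and offset 451.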
Modules:
Most of the functionality of the virtual file system is available in the form of
dynamically loaded modules. This dynamic configuration allows Linux users to
compile a kernel that is as small as possible, while still allowing it to load
the required device driver and file system modules when they are needed during a
session. For example, a Linux system might optionally have a printer attached to
its parallel port. If the printer driver were always linked into the kernel,
memory would be wasted whenever the printer is not available. By making the
printer driver a loadable module, Linux allows the user to load the driver only
when the hardware is available.
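Load-on-demand can be modelled as a registry that stays empty until a driver is first requested. The names below (available_modules, load_module, get_fs_type) are hypothetical and chosen for this sketch; the strings stand in for real driver code.

```python
registered = {}     # drivers currently resident in the toy "kernel"
load_log = []       # order in which modules were loaded

# Hypothetical modules available on disk: name -> initialisation function.
available_modules = {
    "msdos": lambda: "msdos driver",
    "ext2": lambda: "ext2 driver",
}


def load_module(name):
    """Initialise a module and register the driver it provides."""
    load_log.append(name)
    registered[name] = available_modules[name]()


def get_fs_type(name):
    """Return the driver for a file system, loading its module if needed."""
    if name not in registered:
        load_module(name)
    return registered[name]


first = get_fs_type("msdos")
second = get_fs_type("msdos")   # already resident: no second load
```

The ext2 entry is never loaded in this run, so it costs nothing, which mirrors the printer driver example above.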
4. Data Structures:
The following data structures are architecturally relevant to the file
subsystem:
- super_block: Each logical file system has an associated superblock that
represents it to the rest of the Linux kernel. It contains information about the
entire mounted file system.
- inode: An inode is an in-memory data structure that holds all of the
information that the kernel needs to associate with a single file on disk.
Accounting, buffering, and memory mapping information are all stored in the
inode.
- file: The file structure represents a file that is opened by a particular
process. All open files are stored in a doubly-linked list. The file
descriptor used in POSIX-style routines (open, read, write) is the index of
a particular open file in this linked list.
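The relationship between the three structures can be sketched with plain data classes. The field names here are assumptions chosen for illustration and are much simpler than the kernel's real definitions; the point is only that several open files can refer to one inode, and every inode belongs to one superblock.

```python
from dataclasses import dataclass


@dataclass
class SuperBlock:
    fs_type: str        # which logical file system this is
    device: str         # the device it is mounted from
    block_size: int


@dataclass
class Inode:
    number: int
    sb: SuperBlock      # the file system this inode belongs to
    size: int = 0
    use_count: int = 0  # how many open files refer to this inode


@dataclass
class File:
    inode: Inode
    f_pos: int = 0      # each open file keeps its own offset


def open_file(inode):
    inode.use_count += 1
    return File(inode)


sb = SuperBlock("ext2", "/dev/hda1", 1024)
ino = Inode(number=42, sb=sb, size=4096)
f1 = open_file(ino)
f2 = open_file(ino)
f1.f_pos = 100          # moving one offset does not affect the other
```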
5. Subsystem Structure:
The file system depends on all other kernel subsystems, and all other kernel
subsystems depend on the file subsystem. In particular, the network subsystem
depends on the file system because network sockets are presented to user
processes as file descriptors. The memory manager depends on the file system to
support swapping. The IPC subsystem depends on the file system to implement pipes
and FIFOs. The process scheduler depends on the file system to load modules.
The file system uses the network interface to support NFS. It uses the memory
manager to implement the buffer cache and a ramdisk device. It uses the IPC
subsystem to help support modules, and it uses the process scheduler to put user
processes to sleep while hardware requests are completed.