Inter-Process Communication in Linux:
The Linux IPC mechanisms are provided so that concurrently executing processes
have a means to share resources, synchronize, and exchange data with one
another. Linux
implements all forms of IPC between processes executing on the same system
through shared resources, kernel data structures, and wait queues.
Linux provides the following forms of IPC:
- Signals: perhaps the oldest form of Unix IPC, signals are asynchronous
notifications sent to a process.
- Wait queues: provide a mechanism to put processes to sleep while they
are waiting for an operation to complete.
- File locks: provide a mechanism to allow processes to declare either regions
of a file, or the entire file itself, as read-only to all processes except the
one which holds the file lock.
- Pipes and Named Pipes: allow connection-oriented, unidirectional data
transfer between two processes, either by explicitly setting up the pipe
connection or by communicating through a named pipe residing in the file system.
- System V IPC:
- Semaphores: an implementation of a classical semaphore model. The model also
allows for the creation of arrays of semaphores.
- Message queues: a connectionless data-transfer model. A message is a sequence
of bytes, with an associated type. Messages are written to message queues, and
messages can be obtained by reading from the message queue, possibly restricting
which messages are read in by type.
- Shared memory: a mechanism by which several processes have access to the same
region of physical memory.
- Unix Domain sockets: another connection-oriented data-transfer mechanism that
provides the same communication model as the INET sockets, discussed in the next
2. External Interface:
A signal is a notification sent to a process by the kernel or by another process.
Signals are sent with the send_sig() function. The signal number is provided as
a parameter, as well as the destination process. Processes may register to
handle signals by using the signal() function.
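
As a minimal user-space sketch of the interface described above, a process can
register a handler and then signal itself. The helper name signal_roundtrip()
is hypothetical. The sketch assumes a single-threaded POSIX process, uses
sigaction() as the more portable relative of signal(), and uses raise() as the
user-level counterpart of the in-kernel send_sig():

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Written by the handler; sig_atomic_t is safe to assign in a handler. */
static volatile sig_atomic_t last_signal = 0;

static void on_signal(int signo)
{
    last_signal = signo;
}

/* Hypothetical helper: installs a SIGUSR1 handler, sends SIGUSR1 to the
 * calling process, and returns the signal number the handler observed
 * (or -1 if a system call failed). */
int signal_roundtrip(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    last_signal = 0;
    if (raise(SIGUSR1) != 0)
        return -1;

    /* In a single-threaded process the handler has run before raise()
     * returns, so the flag is already set here. */
    assert(last_signal == SIGUSR1);
    return (int)last_signal;
}
```

Calling signal_roundtrip() from main() should return SIGUSR1 when the signal is
not blocked by the caller.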
File locks are supported directly by the Linux file system. To lock an entire
file, either the open() or the sys_fcntl() system call can be used. Locking
areas within a file is done through the sys_fcntl() system call.
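
The region-locking case can be sketched from user space with fcntl(), the
library entry point for sys_fcntl(). The helper name region_lock_conflicts()
and the temporary file path are assumptions made for illustration. Note that
fcntl() record locks are advisory, so this sketch shows only that a second
lock request on an already locked region is refused:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: the parent write-locks bytes [0, 10) of a temporary
 * file, then a child tries to lock the same region without blocking.
 * Returns 1 if the child's request was refused, 0 if it was granted,
 * and -1 on setup errors. */
int region_lock_conflicts(void)
{
    char path[] = "/tmp/ipc_lock_demo_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                 /* the file lives on until fd is closed */

    struct flock fl = {0};
    fl.l_type = F_WRLCK;          /* exclusive (write) lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 10;                /* l_len == 0 would mean "to end of file" */
    if (fcntl(fd, F_SETLK, &fl) != 0) {
        close(fd);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        /* Record locks belong to a process and are not inherited across
         * fork(), so the child competes with the parent's lock. */
        struct flock cfl = {0};
        cfl.l_type = F_WRLCK;
        cfl.l_whence = SEEK_SET;
        cfl.l_start = 0;
        cfl.l_len = 10;
        _exit(fcntl(fd, F_SETLK, &cfl) == -1 ? 1 : 0);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) != pid) {
        close(fd);
        return -1;
    }
    close(fd);                    /* closing the descriptor drops the lock */
    assert(WIFEXITED(status));
    return WEXITSTATUS(status);
}
```

Using F_SETLKW instead of F_SETLK in the child would make it sleep until the
parent releases the region rather than fail immediately.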
Pipes are created by using the pipe() system call. The file system's read() and
write() calls are then used to transfer data on the pipe. Named pipes are opened
using the open() system call. The System V IPC mechanisms have a common
interface, which is the ipc() system call. The various IPC operations are
specified using parameters to the system call.
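
The pipe() interface described above can be sketched as follows. The helper
name pipe_transfer() is hypothetical, and the use of fork() to obtain a second
process is an assumption of the example, not something the text prescribes:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: a child writes msg into a pipe and the parent reads
 * it into out (capacity cap). Returns the number of bytes read, or -1. */
ssize_t pipe_transfer(const char *msg, char *out, size_t cap)
{
    assert(msg != NULL && out != NULL);

    int fds[2];                   /* fds[0] = read end, fds[1] = write end */
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {               /* child: writer */
        close(fds[0]);
        size_t len = strlen(msg);
        ssize_t n = write(fds[1], msg, len);
        close(fds[1]);
        _exit(n == (ssize_t)len ? 0 : 1);
    }

    /* Parent: reader. Closing the unused write end lets read() report
     * end-of-file once the child has closed its copy. */
    close(fds[1]);
    size_t total = 0;
    ssize_t n = 0;
    while (total < cap && (n = read(fds[0], out + total, cap - total)) > 0)
        total += (size_t)n;
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return n < 0 ? -1 : (ssize_t)total;
}
```

A named pipe would be used the same way once both sides have obtained their
descriptors with open() instead of pipe().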
The Unix domain socket functionality is also encapsulated by a single system
call, socketcall(). Each of the system calls mentioned above is well documented,
and the reader is encouraged to consult the corresponding man page.
The IPC subsystem exposes wait calls to other kernel subsystems. Since wait
queues are not used by user processes, they do not have a system-call interface.
Wait queues are used in implementing semaphores, pipes, and bottom-half
handlers. The procedure add_wait_queue() inserts a task into a wait queue. The
procedure remove_wait_queue() removes a task from the wait queue.
3. Subsystem Description:
Signals are used to notify a process of an event. A signal has the effect of
altering the state of the recipient process, depending on the semantics of the
particular signal. The kernel can send signals to any executing process. A user
process may only send a signal to a process or process group if it possesses
the associated PID or GID. Signals are not handled immediately for dormant
processes. Rather, before the scheduler sets a process running in user mode
again, it checks whether a signal was sent to the process. If so, the scheduler
calls the do_signal() function, which handles the signal appropriately.
Wait queues are simply linked lists of pointers to task structures that
correspond to processes that are waiting for a kernel event, such as the
conclusion of a DMA transfer. A process can enter itself on the wait queue by
calling either the sleep_on() or the interruptible_sleep_on() function. The
functions wake_up() and wake_up_interruptible() remove the process from the
wait queue. Interrupt routines also use wait queues to avoid
Linux allows a user process to prevent other processes from accessing a file.
This exclusion can be based on a whole file or a region of a file. File locks
are used to implement this exclusion. The file system implementation contains
appropriate data fields in its data structures to allow the kernel to determine
if a lock has been placed on a file or a region inside a file. In the former
case, a lock attempt on a locked file will fail. In the latter case, an attempt to
lock a region already locked will fail. In either case, the requesting process
is not permitted to access the file since the lock has not been granted by the
Pipes and named pipes have a similar implementation, as their functionality is
almost the same. The creation process is different. However, in either case a
file descriptor is returned which refers to the pipe. Upon creation, one page of
memory is associated with the opened pipe. This memory is treated like a
circular buffer to which write operations are done atomically. When the buffer
is full,
the writing processes block. If a read request is made for more data than
available, the reading processes block. Each pipe has a wait queue associated
with it. Processes are added and removed from the queue during the read and
Semaphores are implemented with wait queues and follow the classical semaphore
model. Each semaphore has an associated value. Two operations, up() and down(),
are implemented on the semaphore. When the value of the semaphore is zero, the
process performing the decrement on the semaphore is blocked on the wait queue.
Semaphore arrays are simply a contiguous set of semaphores. Each process also
maintains a list of semaphore operations it has performed, so that if the
process exits prematurely, these operations can be undone.
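
From user space, the same ideas surface through the System V semaphore calls.
The following is a sketch only: the helper name sem_down_up() is hypothetical,
union semun has to be declared by the caller on Linux with glibc, and the
SEM_UNDO flag is what asks the kernel to keep the per-process undo record
mentioned above:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>

/* On Linux with glibc the caller must declare this union for semctl(). */
union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Hypothetical helper: creates a private one-element semaphore array with
 * the given initial value, performs a non-blocking decrement ("down") and
 * an increment ("up"), and returns the final value, or -1 on failure. */
int sem_down_up(int initial)
{
    assert(initial >= 0);

    int id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (id < 0)
        return -1;

    union semun arg;
    arg.val = initial;
    int result = -1;
    if (semctl(id, 0, SETVAL, arg) == 0) {
        /* SEM_UNDO records the adjustment so it can be reverted when the
         * process exits; IPC_NOWAIT fails instead of sleeping at zero. */
        struct sembuf down = { .sem_num = 0, .sem_op = -1,
                               .sem_flg = SEM_UNDO | IPC_NOWAIT };
        struct sembuf up   = { .sem_num = 0, .sem_op = 1,
                               .sem_flg = SEM_UNDO };
        if (semop(id, &down, 1) == 0 && semop(id, &up, 1) == 0)
            result = semctl(id, 0, GETVAL);
    }
    semctl(id, 0, IPC_RMID);      /* remove the array from the kernel */
    return result;
}
```

With an initial value of zero the decrement cannot proceed. Without IPC_NOWAIT
the caller would sleep on the semaphore's wait queue; with it, semop() fails and
the helper returns -1.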
The message queue is a linear linked list, to which processes write, and from
which they read, a sequence of bytes. Messages are received in the same order
that they are
written. Two wait queues are associated with the message queues, one for
processes that are writing to a full message queue, and another for serializing
the message writes. The actual size of the message is set when the message queue
Shared memory is the fastest form of IPC. This mechanism allows processes to
share a region of their memory. Creation of shared memory areas is handled by
the memory management system. Shared pages are attached to the user process's
virtual memory space by the system call sys_shmat(). A shared page can be
removed from the user segment of a process by calling sys_shmdt().
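
The attach and detach steps can be sketched with the library wrappers shmat()
and shmdt(). The helper name shm_child_writes() is hypothetical, and the use of
fork() and waitpid() for ordering is an assumption of this example; real users
of shared memory normally add a semaphore or similar for synchronization:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: the parent attaches a private shared segment, a
 * child stores value in it, and the parent copies what it sees into *seen
 * after the child has exited. Returns 0 on success, -1 on error. */
int shm_child_writes(int value, int *seen)
{
    assert(seen != NULL);

    int id = shmget(IPC_PRIVATE, sizeof(int), IPC_CREAT | 0600);
    if (id < 0)
        return -1;

    int *shared = shmat(id, NULL, 0);   /* attach; kernel picks the address */
    if (shared == (void *)-1) {
        shmctl(id, IPC_RMID, NULL);
        return -1;
    }
    *shared = 0;

    int rc = -1;
    pid_t pid = fork();                 /* the attachment is inherited */
    if (pid == 0) {
        *shared = value;                /* same physical page as the parent */
        shmdt(shared);
        _exit(0);
    }
    if (pid > 0 && waitpid(pid, NULL, 0) == pid) {
        *seen = *shared;
        rc = 0;
    }

    shmdt(shared);                      /* detach from this address space */
    shmctl(id, IPC_RMID, NULL);         /* mark the segment for removal */
    return rc;
}
```

No data is copied between the two processes here, which is why the text calls
shared memory the fastest form of IPC.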
The Unix domain sockets are implemented in a similar fashion to pipes, in the
sense that both are based on a circular buffer backed by a page of memory.
Sockets, however, provide a separate buffer for each communication direction.
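
Because each direction has its own buffer, a single socket can carry a request
and its reply. A minimal sketch, assuming socketpair() as a convenient way to
obtain two connected Unix domain sockets (the helper name
socketpair_echo_plus_one() is hypothetical):

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: sends one byte to a child over a Unix domain socket
 * pair; the child answers on the same socket with that byte plus one.
 * Returns the reply (0..255), or -1 on error. */
int socketpair_echo_plus_one(unsigned char value)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {                      /* child: uses sv[1] both ways */
        close(sv[0]);
        unsigned char b;
        if (read(sv[1], &b, 1) != 1)
            _exit(1);
        b = (unsigned char)(b + 1);
        _exit(write(sv[1], &b, 1) == 1 ? 0 : 1);
    }

    close(sv[1]);                        /* parent: uses sv[0] both ways */
    unsigned char reply = 0;
    int ok = write(sv[0], &value, 1) == 1 && read(sv[0], &reply, 1) == 1;
    close(sv[0]);
    waitpid(pid, NULL, 0);
    return ok ? (int)reply : -1;
}
```

The same exchange over pipes would need two of them, one per direction.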
4. Data Structures:
In this section, the important data structures needed to implement the above IPC mechanisms are described.
Signals are implemented through the signal field in the task_struct structure.
Each signal is represented by a bit in this field. Thus, the number of signals a
version of Linux can support is limited to the number of bits in a word. The
field blocked holds the signals that are being blocked by a process.
There is only one data structure associated with wait queues, the wait_queue
structure. These structures contain a pointer to the associated task_struct, and
are linked into a list.
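
A simplified user-space model of such a list is sketched below. It is not the
kernel's code: struct task stands in for task_struct, and wq_add(), wq_remove()
and wq_length() are hypothetical analogues of the add and remove procedures
named earlier:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's task_struct; only a pid is modelled. */
struct task {
    int pid;
};

/* One wait queue entry: a pointer to the waiting task plus a list link. */
struct wait_entry {
    struct task *task;
    struct wait_entry *next;
};

/* Analogue of add_wait_queue(): push the entry onto the head of the list. */
void wq_add(struct wait_entry **head, struct wait_entry *e)
{
    assert(head != NULL && e != NULL);
    e->next = *head;
    *head = e;
}

/* Analogue of remove_wait_queue(): unlink the entry if it is queued. */
void wq_remove(struct wait_entry **head, struct wait_entry *e)
{
    assert(head != NULL && e != NULL);
    for (struct wait_entry **p = head; *p != NULL; p = &(*p)->next) {
        if (*p == e) {
            *p = e->next;
            e->next = NULL;
            return;
        }
    }
}

/* Number of tasks currently waiting on the queue. */
size_t wq_length(const struct wait_entry *head)
{
    size_t n = 0;
    for (; head != NULL; head = head->next)
        n++;
    return n;
}
```

A wake-up routine in this model would walk the list, mark each task runnable,
and unlink its entry.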
File locks have an associated file_lock structure. This structure contains a
pointer to a task_struct for the owning process, the file descriptor of the
locked file, a wait queue for processes which are waiting for the cancellation
of the file lock, and which region of the file is locked. The file_lock
structures are linked into a list for each open file.
Pipes, both nameless and named, are represented by a file system inode. This
inode stores extra pipe-specific information in the pipe_inode_info structure.
This structure contains a wait queue for processes which are blocking on a read
or write, a pointer to the page of memory used as the circular buffer for the
pipe, the amount of data in the pipe, and the number of processes which are
currently reading from and writing to the pipe.
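
The circular-buffer bookkeeping can be modelled in a few lines. This is an
illustrative sketch, not the layout of pipe_inode_info: the names pipe_model,
pipe_model_write() and pipe_model_read() are hypothetical, a 4096-byte page is
assumed, and short counts are returned where the kernel would put the caller to
sleep on the wait queue:

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096      /* assumption: one 4 KiB page */

/* Simplified pipe state: the buffer, the read position, and the fill level. */
struct pipe_model {
    unsigned char data[MODEL_PAGE_SIZE];
    size_t start;                 /* index of the oldest unread byte */
    size_t len;                   /* number of bytes currently stored */
};

/* Copies up to n bytes into the buffer and returns how many fitted. */
size_t pipe_model_write(struct pipe_model *p, const void *src, size_t n)
{
    assert(p->start < MODEL_PAGE_SIZE && p->len <= MODEL_PAGE_SIZE);
    const unsigned char *s = src;
    size_t room = MODEL_PAGE_SIZE - p->len;
    if (n > room)
        n = room;
    for (size_t i = 0; i < n; i++)
        p->data[(p->start + p->len + i) % MODEL_PAGE_SIZE] = s[i];
    p->len += n;
    return n;
}

/* Copies up to n bytes out of the buffer and returns how many were read. */
size_t pipe_model_read(struct pipe_model *p, void *dst, size_t n)
{
    assert(p->start < MODEL_PAGE_SIZE && p->len <= MODEL_PAGE_SIZE);
    unsigned char *d = dst;
    if (n > p->len)
        n = p->len;
    for (size_t i = 0; i < n; i++)
        d[i] = p->data[(p->start + i) % MODEL_PAGE_SIZE];
    p->start = (p->start + n) % MODEL_PAGE_SIZE;
    p->len -= n;
    return n;
}
```

The modulo arithmetic is what lets data wrap from the end of the page back to
its beginning without any copying.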
All System V IPC objects are created in the kernel, and each has associated
access permissions. These access permissions are held in the ipc_perm structure.
Semaphores are represented with the sem structure, which holds the value of the
semaphore and the pid of the process that performed the last operation on the
semaphore. Semaphore arrays are represented by the semid_ds structure, which
holds the access permissions, the time of the last semaphore operation, a
pointer to the first semaphore in the array, and queues on which processes block
when performing semaphore operations. The structure sem_undo is used to create a
list of semaphore operations performed by a process, so that they can all be
undone when the process is killed.
Message queues are based on the msqid_ds structure, which holds management and
control information. This structure stores the following fields:
- Access permissions
- Link fields to implement the message queue (i.e. pointers to msqid_ds)
- Times for the last send, receipt and change
- Queues on which processes block, as described in the previous section
- The current number of bytes in the queue
- The number of messages
- The size of the queue (in bytes)
- The process number of the last sender
- The process number of the last receiver
A message itself is stored in the kernel with a msg structure. This structure
holds a link field, to implement a linked list of messages, the type of message,
the address of the message data, and the length of the message.
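
The type and data fields are visible to user programs through msgsnd() and
msgrcv(). The sketch below is illustrative: struct demo_msg and the helper name
msg_receive_by_type() are hypothetical, and the 16-byte payload size is an
arbitrary choice:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/types.h>

/* Caller-defined layout: a positive type followed by the payload bytes. */
struct demo_msg {
    long mtype;
    char mtext[16];
};

/* Hypothetical helper: queues a type-1 and then a type-2 message, then
 * asks for type 2 only. Copies the received text into out (capacity cap)
 * and returns the received type, or -1 on error. */
long msg_receive_by_type(char *out, size_t cap)
{
    assert(out != NULL && cap > 0);

    int id = msgget(IPC_PRIVATE, IPC_CREAT | 0600);
    if (id < 0)
        return -1;

    struct demo_msg first = { 1, "first" };
    struct demo_msg second = { 2, "second" };
    struct demo_msg got;
    long type = -1;

    /* The size argument counts the payload only, not the mtype field.
     * Passing 2 as the type to msgrcv() skips the older type-1 message. */
    if (msgsnd(id, &first, sizeof first.mtext, 0) == 0 &&
        msgsnd(id, &second, sizeof second.mtext, 0) == 0 &&
        msgrcv(id, &got, sizeof got.mtext, 2, IPC_NOWAIT) >= 0) {
        type = got.mtype;
        strncpy(out, got.mtext, cap - 1);
        out[cap - 1] = '\0';
    }
    msgctl(id, IPC_RMID, NULL);   /* remove the queue from the kernel */
    return type;
}
```

Passing 0 as the type argument instead would return the oldest message
regardless of its type.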
The shared memory implementation is based on the shmid_ds structure, which, like
the msqid_ds structure, holds management and control information. The structure
contains access control permissions, last attach, detach and change times, pids
of the creator and last process to call an operation for the shared segment,
the number of processes to which the shared memory region is attached, the number
of pages which make up the shared memory region, and a field for page table
The Unix domain sockets are based on the socket data structure, described in the
Network Interface section.
5. Subsystem Structure:
Control flows from the system call layer down into each module. The System V IPC
facilities are implemented in the ipc directory of the kernel source. The kernel
IPC module refers to IPC facilities implemented within the kernel directory.
Similar conventions hold for the File and Net IPC facilities.
The System V IPC module is dependent on the Kernel IPC mechanism. In particular,
semaphores are implemented with wait queues. All other IPC facilities are
implemented independently of each other.
6. Subsystem Dependencies:
The IPC subsystem depends on the file system for sockets. Sockets use file
descriptors, and once they are opened, they are assigned to an inode. Memory
management depends on IPC, as the page-swapping routine calls the IPC subsystem
to perform swapping of shared memory. IPC depends on memory management
primarily for the allocation of buffers and the implementation of shared memory.
Some IPC mechanisms use timers, which are implemented in the process scheduler
subsystem. Process scheduling relies on signals. For these two reasons, the IPC
and Process Scheduler modules depend on each other.