
Commit

Resources pages update
Nandana S Nair committed Nov 4, 2024
1 parent f87e0a5 commit 43e9282
Showing 3 changed files with 36 additions and 17 deletions.
2 changes: 1 addition & 1 deletion docs/guides/index.md
@@ -16,7 +16,7 @@ The guides feature supplementary documentation intended for your reference as yo
- [TCP/IP Model](/guides/resources/tcp-ip-model)
- [TCP Socket Programming](/guides/resources/tcp-socket-programming)
- 🟡 [UDP Socket Programming](/guides/resources/udp-socket-programming)
- [Process and Threads](/guides/resources/process-and-threads)
- [Process and Threads](/guides/resources/process-and-threads)
- [System Calls](/guides/resources/system-calls)
- [Linux epoll](/guides/resources/introduction-to-linux-epoll)
- 🟡 [Linux epoll tutorial](/guides/resources/linux-epoll-tutorial)
47 changes: 33 additions & 14 deletions docs/guides/resources/process-and-threads.md
@@ -2,6 +2,8 @@

An executable program stored on a system’s hard disk typically contains several components, including the text (the executable code), statically defined data, a header, and other auxiliary information such as the [symbol table](https://en.wikipedia.org/wiki/Symbol_table) and string table. When the [operating system's loader](https://en.wikipedia.org/wiki/Loader_(computing)) (typically the `exec` system call in Unix/Linux systems) loads the program into memory for execution, a region of memory (logically divided into segments in architectures that support segmentation, such as [x86 systems](https://en.wikipedia.org/wiki/X86)) is allocated to the program. Each of the program parts, such as the program text and static data, is loaded into a separate segment of this allocated memory, called the text segment and data segment respectively.

More information about segmentation in [x86 architectures](https://en.wikipedia.org/wiki/X86_memory_segmentation) is available at this link.

In addition to the data loaded from the executable file, additional segments are allocated, the most important of which maintains the run-time data of the stack and the heap. Unix/Linux systems use the same memory segment for both the stack and the heap, with the stack allocated in the higher memory and the heap allocated in the lower memory of the same segment.
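
To make this layout concrete, the short C program below prints the addresses of a function, a global variable, a `malloc`-ed block, and a local variable, which typically fall in the text, data, heap, and stack regions respectively. This is only an illustrative sketch: the exact addresses and their ordering vary across systems and are randomized by mechanisms such as address space layout randomization.

```c
#include <stdio.h>
#include <stdlib.h>

int initialized_global = 42;                /* placed in the data segment */

void where_things_live(void) {
    int local = 0;                          /* placed on the stack */
    int *dynamic = malloc(sizeof(int));     /* placed on the heap */

    printf("text  (function) : %p\n", (void *)where_things_live);
    printf("data  (global)   : %p\n", (void *)&initialized_global);
    printf("heap  (malloc)   : %p\n", (void *)dynamic);
    printf("stack (local)    : %p\n", (void *)&local);

    free(dynamic);
}

int main(void) {
    where_things_live();
    return 0;
}
```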

A program loaded into memory for execution is called a “[process](https://en.wikipedia.org/wiki/Process_(computing))” and the memory allocated to the process is called the “[address space](https://en.wikipedia.org/wiki/Address_space)” of the process in OS jargon.
@@ -44,7 +46,7 @@ The structure and contents of PCB is given below
- **I/O Status Information**: Information on I/O devices allocated to the process, such as open file descriptors, I/O buffers, and pending I/O requests.
- **Memory Management Information**: Information about the memory allocated to the process (e.g., page tables, base and limit registers).

When the process terminates the PCB entry associated with it is typically removed by the operating system. Which includes the de allocation of resources(such as memory, file descriptors, and I/O devices) that were assigned to the process and deletion of the PCB block which contains all the execution details of the process from the process table. This allows the OS to free up the memory and resources occupied by the PCB.
When the process terminates, the PCB entry associated with it is typically removed by the operating system. This includes the de-allocation of resources (such as memory, file descriptors, and I/O devices) that were assigned to the process and the deletion of the PCB entry, which contains all the execution details of the process, from the process table. This allows the OS to free up the memory and resources occupied by the PCB.

### **Process Creation**

@@ -62,37 +64,52 @@ Processes are created through different system calls. The two most popular ones

In the eXpServer project, we want to create server programs that connect to several clients concurrently. One way to handle this is for the server to create (an identical) child process using the `fork()` system call to handle each client request. However, this is very inefficient because creating a new address space and PCB entry, and then copying all the text, data, stack and heap to the new address space for each client connection, would slow down the server considerably. Observe that the code for each child process is nearly identical, and these concurrent processes can actually share most of the text and data regions as well as files and other resources.
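
As a rough illustration of this fork-per-client approach, the sketch below accepts TCP connections in a loop and forks a child process to echo data back to each client. It is a minimal example under simplifying assumptions (most error handling is omitted and the port number 8080 is arbitrary), not eXpServer code; it merely shows how every connection ends up paying the cost of a full `fork()`.

```c
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int main(void) {
    signal(SIGCHLD, SIG_IGN);                 /* reap terminated children automatically */

    int listen_fd = socket(AF_INET, SOCK_STREAM, 0);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(8080);              /* arbitrary port for this sketch */

    bind(listen_fd, (struct sockaddr *)&addr, sizeof(addr));
    listen(listen_fd, 16);

    while (1) {
        int conn_fd = accept(listen_fd, NULL, NULL);
        if (conn_fd < 0)
            continue;

        if (fork() == 0) {                    /* child: gets a copy of the address space */
            close(listen_fd);                 /* child does not accept new clients */
            char buf[1024];
            ssize_t n = read(conn_fd, buf, sizeof(buf));
            if (n > 0)
                write(conn_fd, buf, n);       /* echo the data back */
            close(conn_fd);
            exit(0);
        }
        close(conn_fd);                       /* parent: the child owns this connection */
    }
}
```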

What is needed here is to create a version of `fork()` where the child process shares the PCB entry as well as the address space. While the heap memory is shared by all **threads** within a process, each thread needs a separate stack to be allocated within the same stack/heap segment. The bare minimum “separate data” to be maintained separately for each thread includes the thread’s execution context (values of registers, stack pointer, instruction pointer), local variables, function call stack etc. Each thread is allocated a separate stack within the same stack/heap segment.
What is needed here is to create a version of `fork()` where the child process shares the PCB entry as well as the address space.

## Introduction to Threads

Such a simplified version of `fork()` is provided by threads. Linux/Unix systems allow a process to concurrently execute routines within its code segment as separate threads. To support this kind of “light weight” forking of processes, the notion of “[threads](https://en.wikipedia.org/wiki/Thread_(computing))” was introduced in Unix/Linux systems in 1996, and since then most servers implement concurrent code using threads. A collection of system calls is also provided in Linux/Unix systems to support creation, destruction and synchronization of threads.
:::info
A discussion of the system calls that handle threads is given in the [system calls](/guides/resources/system-calls) page.
:::
While the heap memory is shared by all **threads** created by a process, each thread needs a separate stack, allocated within the same stack/heap segment. The bare minimum “separate data” to be maintained individually for each thread includes the thread’s execution context (values of registers, stack pointer, instruction pointer), local variables, the function call stack, etc.

To support this kind of “light weight” forking of processes, the notion of “[**threads**](https://en.wikipedia.org/wiki/Thread_(computing))” was introduced in Unix/Linux systems in 1996, and since then most servers implement concurrent code using threads. A few years later, it was realized that an even more efficient way to handle concurrent connection requirements in a single server was to avoid creating new processes or threads, but to make a single process capable of handling I/O events happening concurrently. The advantage of this approach is that there is no need to maintain separate stack/heap segments or execution contexts for each concurrent connection. [**Linux epoll**](https://en.wikipedia.org/wiki/Epoll) is a mechanism introduced in 2001 to support creation of such server programs, and it will be used extensively in this project.
Note that while the address space and resources such as files used by a process are protected by the operating system from illegal access by other processes that do not have the necessary access permissions, threads within a process share the same address space and open files, and the OS does not provide any protection between threads of the same process. Thus, the efficiency provided by threads should be used only when trusted code is running concurrently. For our project, we spawn threads to handle client connections to a server; in this case the threads contain (our own) trusted code and hence need no protection from each other.
:::info
Read more from the Wikipedia page on [threads](https://en.wikipedia.org/wiki/Thread_(computing)).
:::

A few years later, it was realized that an even more efficient way to handle concurrent connection requirements in a single server was to avoid creating new processes or threads, but to make a single process capable of handling I/O events happening concurrently. The advantage of this approach is that there is no need to maintain separate stack/heap segments or execution contexts for each concurrent connection. [**Linux epoll**](https://en.wikipedia.org/wiki/Epoll) is a mechanism introduced in 2001 to support creation of such server programs, and it will be used extensively in this project.

The [Apache server](https://en.wikipedia.org/wiki/Apache_HTTP_Server) (1995) uses threads for handling concurrent client connections whereas [Nginx](https://en.wikipedia.org/wiki/Nginx) (2004) uses epoll.

In the current page, we will describe threads. For a discussion on epoll, see the [epoll documentation](/guides/resources/introduction-to-linux-epoll).
For a discussion on epoll, see the [epoll documentation](/guides/resources/introduction-to-linux-epoll).

On the current page, we will describe threads.

## Threads

As noted previously, each thread within a process shares common text, data and stack/heap segments with the other threads of the same process, but has separate copies of:

1. **Stack**: It holds the thread's local variables, activation records of function calls, and return addresses and dynamic memory. The stack of all threads of a process are maintained within the same segment of the underlying process.
2. **Register Set**: The register set contains the thread's execution context, including the values of CPU registers. These registers include the program counter of each thread, stack pointer of each thread.
3. **Thread-Specific Data**: Thread-specific data allows each thread to maintain its unique state. It can include variables specific to the thread, such as thread ID, state (running/ready/blocked) etc.
1. **Stack**: It holds the thread's local variables, activation records of function calls, return addresses, and dynamic memory. The stacks of all threads of a process are maintained within the same stack/heap segment of the underlying process.
2. **Register Set**: The register set contains the thread's execution context, including the values of CPU registers such as each thread's program counter and stack pointer. Since the execution context of each thread within a process needs to be maintained separately, a separate OS data structure called the [thread control block (TCB)](https://en.wikipedia.org/wiki/Thread_control_block) is maintained for each process to keep metadata on each thread created by the process. The volume of data maintained in the TCB entry of a thread is much smaller than that in the PCB entry of the process, as data on shared resources such as file pointers is stored only in the PCB entry.
3. **Thread-Specific Data**: Thread-specific data allows each thread to maintain its unique state. It can include variables specific to the thread, such as the thread ID, state (running/ready/blocked), etc. Such data is also stored in the TCB entry of the thread.

Multi-threading is the ability of a CPU to allow multiple threads of execution to run concurrently. Multi-threading is different from multiprocessing as multi-threading aims to increase the utilization of a single core. Modern system architectures support both multi-threading and multiprocessing.

![thread.png](/assets/resources/thread.png)

The main improvements achieved through multi-threading are listed below:

**Fast Concurrent Execution** : OS schedules multiple threads in the same process concurrently. Since Switching between threads within a process is much faster than context switch between multiple processes, concurrent servers uses threads for handling multiple connections for faster response to each client.
**Fast Concurrent Execution**: The OS schedules multiple threads of the same process concurrently. Since switching between threads within a process is much faster than a context switch between processes, concurrent servers that use threads to handle multiple connections respond to each client faster than servers that spawn a separate process for each client.

**Non-blocking Operations**: One thread can handle blocking input/output operations while other threads can continue executing without blocking.
**Non-blocking Operations**: One thread can handle blocking input/output operations while other threads continue executing without blocking. This allows a process to make effective use of the CPU.

The major thread operations which we use in this project are thread creation and termination using `pthread_create()`, `pthread_exit()` and `pthread_detach()`. Refer to the [system calls](/guides/resources/system-calls) page for more details.
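
The following is a minimal sketch of these calls: the main thread creates a thread, detaches it so that its resources are reclaimed automatically when it finishes, and the thread terminates itself with `pthread_exit()`. The `handle_client` function and the `sleep()`-based wait are purely illustrative, not eXpServer code; compile with `-pthread`.

```c
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

/* Work done by each thread; the argument is a hypothetical client id. */
static void *handle_client(void *arg) {
    int client_id = *(int *)arg;
    printf("thread %lu handling client %d\n",
           (unsigned long)pthread_self(), client_id);
    /* ... the thread would read from / write to the client here ... */
    pthread_exit(NULL);                 /* terminate only this thread */
}

int main(void) {
    pthread_t tid;
    int client_id = 1;

    if (pthread_create(&tid, NULL, handle_client, &client_id) != 0) {
        perror("pthread_create");
        return 1;
    }

    /* Detach the thread: its TCB, stack and other resources are freed
       automatically when it terminates, without a pthread_join(). */
    pthread_detach(tid);

    sleep(1);                           /* crude wait so the thread gets to run */
    return 0;
}
```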

### **Thread Control Block**

**Thread Control Block** (**TCB**) is a data structure in an operating system kernel that contains thread-specific information needed to manage the thread. Each thread has a thread control block which consists of:
**Thread Control Block** (**TCB**) is a data structure in an operating system kernel that contains thread-specific information needed to manage the thread. Unix/Linux systems maintain a separate TCB for each process to store information about the threads created by that process. Each thread has a TCB entry, which consists of:

![tcb.png](/assets/resources/tcb.png)

@@ -102,8 +119,10 @@ The major thread operations which we use in this project are thread creation and
- **Registers**: The values of CPU registers when the thread is not running. These need to be stored when a thread is preempted or switched out by the scheduler.
- **Stack Pointer**: A pointer to the thread’s stack in memory. Each thread has its own stack to store local variables, return addresses, and function call data.
- **Priority**: It indicates the weight (or priority) of the thread relative to other threads, which helps the thread scheduler determine which thread should be selected next from the READY queue.
- **Pointer to the Process Control Block (PCB)**: Since multiple threads may belong to the same process, the TCB often contains a reference to the PCB of the process the thread belongs to, allowing shared access to process-wide resources like memory and file descriptors.
- A **pointer** which points to the thread(s) created by this thread.
- **Thread-Specific Data**: Some operating systems allow threads to have local storage that is unique to each thread.
- **Pointer to the Process Control Block (PCB)**: Since multiple threads may belong to the same process, the TCB often contains a reference to the PCB of the process the thread belongs to. Threads access metadata relating to process-wide resources like memory and file descriptors from the corresponding PCB entry.
- A **pointer** to the TCB entries of all threads spawned by the current thread. (This enables the thread to access data pertaining to those threads.)
- **Thread-Specific Data**: Any other metadata specific to the thread that the process maintains.
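
The structure below is a purely illustrative C rendering of the fields listed above. It is not the actual data structure of any real kernel, and every field name is invented for the sake of the example.

```c
/* Illustrative TCB layout only -- NOT an actual kernel definition. */
struct pcb;                                  /* PCB of the owning process */

struct tcb {
    int            thread_id;                /* thread identifier */
    int            state;                    /* running / ready / blocked */
    unsigned long  program_counter;          /* saved when the thread is switched out */
    unsigned long  registers[16];            /* saved general-purpose registers */
    unsigned long  stack_pointer;            /* top of this thread's private stack */
    int            priority;                 /* used by the scheduler's READY queue */
    struct pcb    *owner;                    /* process this thread belongs to */
    struct tcb   **children;                 /* TCB entries of threads spawned by this thread */
    void          *thread_specific_data;     /* any other per-thread metadata */
};
```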

Once a **detached** thread terminates, the TCB and all other resources related to the thread (such as its stack and registers) are **immediately de-allocated** by the operating system.

Once a **detached** thread terminates, the TCB and all other resources related to the thread (such as its stack and registers) are **immediately de allocated** by the operating system.
For the programming interface to threads in Unix/Linux environments, see the [system calls](/guides/resources/system-calls) page.
4 changes: 2 additions & 2 deletions docs/roadmap/index.md
@@ -27,8 +27,8 @@ The eXpServer project comprises 24 stages, organized into 5 phases. Prior to the
- [Stage 0: Setup](phase-0/stage-0)
- [Stage 1: TCP Server](phase-0/stage-1)
- [Stage 2: TCP Client](phase-0/stage-2)
- [Stage 3: Linux Epoll](phase-0/stage-3)
- 🟡 [Stage 4: UDP with Multi-threading](phase-0/stage-4)
- 🟡 [Stage 3: UDP with Multi-threading](phase-0/stage-3)
- [Stage 4: Linux Epoll](phase-0/stage-4)
- [Stage 5: TCP Proxy](phase-0/stage-5)

### Phase 1: Building the core of eXpServer by creating reusable modules
