Hongyu Hè

Queuing Theory: Understanding Waiting Lines

2023-02-05T00:00:00+00:00

Many people in academia prefer the term “queueing theory” over “queuing theory.” Here, I use the latter for simplicity.

Queuing theory is a useful concept that deals with the study of queues or waiting lines. Queuing theory has found significant applications in various fields, including computer systems. In computer systems, queuing theory helps in analyzing the performance, predicting the system’s behavior under different load conditions, and optimizing system parameters for better performance. In this blog, we will discuss queuing theory in computer systems and explain some of the essential formulas used in queuing theory.

Basics of Queuing Theory

Queuing theory is a branch of applied mathematics that deals with the study of queues or waiting lines. Queues are a common phenomenon in everyday life, and we encounter them in various situations, such as waiting for a bus, standing in a line at a grocery store, or waiting for a web page to load. In computer systems, queuing theory helps in analyzing the behavior of computer systems under different load conditions.

A queuing system consists of three basic components:

The population of customers who need service
The service facility or server that provides the service
The waiting line or queue where customers wait for service

In a queuing system, customers arrive randomly, and they join the queue if the service facility is busy. When a customer arrives at the service facility, the service facility serves the customer, and the customer leaves the system. The queuing system’s performance can be measured using various metrics, such as the average waiting time, the average queue length, the utilization of the service facility, and the throughput.

Kendall’s Notation

Kendall’s notation is a standard notation used to describe queuing systems. Kendall’s notation uses four letters to describe a queuing system. The first letter represents the arrival process, the second letter represents the service process, the third letter represents the number of servers, and the fourth letter represents the queue discipline.

The following table summarizes the meaning of each letter in Kendall’s notation:

Letter	Meaning
$A$	Arrival process
$M$	Markovian (exponential) service process
$D$	Deterministic service process
$G$	General service process
$M/D/C$	Number of servers ($C$)
$F$	Queue discipline

Formulas in Queuing Theory

The following are some essential formulas used in queuing theory:

Little’s Law

Little’s law states that the average number of customers in a queuing system is equal to the product of the average arrival rate and the average time a customer spends in the system. Mathematically, Little’s law can be expressed as follows:

\[L = \lambda W\]

where $L$ is the average number of customers in the system, $λ$ is the arrival rate, and $W$ is the average time a customer spends in the system.

Erlang’s Formula

Erlang’s formula is used to calculate the probability of a customer having to wait in the queue before receiving service. Erlang’s formula assumes that the arrival process is Poisson, the service process is Markovian, and there is only one server. Mathematically, Erlang’s formula can be expressed as follows:

\[P_n = \frac{\frac{(\lambda/\mu)^n}{n!}}{\sum_{i=0}^{C-1} \frac{(\lambda/\mu)^i}{i!} + \frac{(\lambda/\mu)^C}{C!(1-\rho)}}\]

where $P_n$ is the probability of $n$ customers in the system, $λ$ is the arrival rate, $μ$ is the service rate, $C$ is the number of servers, and $ρ$ is the utilization of the service facility.

Kendall’s Formulas

Kendall’s formulas are used to calculate the performance measures of queuing systems. The following are the most commonly used Kendall’s formulas:

Average queue length:

\[L_q = \frac{\rho^2}{1-\rho} \cdot \frac{1}{1-C\rho^{-C}(1-\rho)^{-1}}\]

where $L_q$ is the average queue length, $ρ$ is the utilization of the service facility, and $C$ is the number of servers.

Average waiting time:

\[W_q = \frac{L_q}{\lambda}\]

where $W_q$ is the average waiting time and $λ$ is the arrival rate.

Utilization:

\[\rho = \frac{\lambda}{C\mu}\]

where $ρ$ is the utilization of the service facility, $λ$ is the arrival rate, $μ$ is the service rate, and $C$ is the number of servers.

Throughput:

\[X = \lambda(1-P_n)\]

where $X$ is the throughput, $λ$ is the arrival rate, and $P_n$ is the probability of $n$ customers in the system.

Applications of Queuing Theory

Queuing theory has many practical applications in various fields. In telecommunications, queuing theory is used to optimize the performance of call centers and customer service systems. By using queuing theory, call centers can determine the optimal number of agents needed to handle customer demand and minimize waiting times. In the healthcare sector, queuing theory is used to manage patient flows and optimize hospital resources. By understanding the behavior of queues, hospitals can reduce waiting times, improve patient satisfaction, and increase the efficiency of their operations.

In manufacturing, queuing theory is used to optimize production lines by minimizing queue lengths and waiting times. By analyzing the arrival and service rates of a production line, queuing theory can help manufacturers determine the optimal number of servers (e.g., machines) needed to meet customer demand and reduce waiting times. Queuing theory is also used in inventory management to determine the optimal inventory level that minimizes costs while meeting customer demand.

Queuing theory is also useful in traffic engineering, where it is used to optimize the performance of traffic systems. By understanding the behavior of queues, traffic engineers can design traffic systems that minimize congestion and waiting times. For example, queuing theory can help traffic engineers determine the optimal number of traffic lanes needed to handle traffic flows during peak hours and minimize waiting times.

Queuing theory has been used in the entertainment industry to predict demand for rides and attractions at theme parks. By analyzing the arrival rate and service rate, queuing theory can help theme parks determine the optimal number of employees needed to reduce queue lengths and waiting times. Queuing theory has also been applied in the aviation industry to optimize the allocation of gates and reduce waiting times at airports.

Limitations of Queuing Theory

While queuing theory is a powerful tool for understanding waiting lines, it has some limitations. To make theoretical analysis feasible, queuing theory sometimes relies on strong assumptions, for example, customers arrive randomly and independently of each other (i.e., input distributions), which may not always be the case in real-world scenarios. For instance, customers may arrive in groups or bunches, and their arrival may be dependent on external factors like weather, time of day, or season. Additionally, queuing theory assumes that the service process is independent of the arrival process, which may not always hold true. In some cases, the arrival of customers may be dependent on the state of the queue or the number of customers present. Finally, queuing theory assumes that the queue discipline is fixed, which may not be the case in real-world scenarios where priorities may change. For example, in a hospital, the priority of patients may change based on their medical condition.

Outlook

Queuing theory is a dynamic field that continues to evolve as new applications and challenges arise. With the advent of big data and machine learning, queuing theory is now being used to optimize complex systems that were previously difficult to model. For example, queuing theory is being used to optimize cloud computing systems, where the arrival rate and service rate can vary significantly depending on the workload. Queuing theory is also being used to optimize supply chain management, where the arrival rate and service rate can vary depending on the demand for goods and services.

Another area of interest for future research in queuing theory is the study of the impact of social distancing measures on queues. The COVID-19 pandemic has brought about significant changes in the way we wait in lines. Queuing theory can be used to study the effectiveness of social distancing measures in reducing queue lengths and waiting times.

Furthermore, queuing theory is being applied to study the impact of customer behavior on queuing systems. It is being used to analyze the effect of customer impatience, customer balk, and jockeying behavior on queueing systems. By understanding the behavior of customers, queuing theory can help organizations design better queue management systems that cater to the needs and preferences of their customers.

Summary

This post gives a brief overview of queuing theory, a methematical concept for understanding waiting lines and improving their performance. By applying queuing theory to real-world problems, we can optimize the performance of queues in various applications, from customer service systems to traffic systems. Queuing theory provides a quantitative framework for evaluating the performance of queues and improving their efficiency, enhancing the experience of customers and users. While queuing theory has its limitations, it remains an essential tool for analyzing and optimizing waiting lines in a variety of fields. As the world becomes more complex, queuing theory will continue to play a critical role in optimizing systems and improving the efficiency of operations.

In addition to the above, queuing theory is also being increasingly applied in the field of e-commerce, where it is used to optimize online shopping experiences. With the exponential growth of online shopping, queuing theory is being used to reduce the waiting times for customers during peak periods, such as Black Friday and Cyber Monday. By analyzing the behavior of online shoppers, queuing theory can help retailers to design better queuing systems that can accommodate the surge in demand during such peak periods.

Another area in which queuing theory is being applied is in the field of public transportation. Queuing theory is used to optimize public transportation systems by minimizing waiting times at bus stops and train stations. By analyzing the arrival and service rates of public transportation systems, queuing theory can help transit agencies to determine the optimal number of buses or trains needed to meet demand and reduce waiting times for passengers.

Queuing theory is also being used to improve the performance of online streaming services. By analyzing the arrival and service rates of streaming services, queuing theory can help streaming providers to determine the optimal number of servers needed to handle the demand for streaming content and reduce buffering times for users.

In conclusion, queuing theory is a powerful tool that has numerous applications in various fields. As the world becomes more complex, the need for efficient queue management systems becomes even more paramount. By understanding the behavior of queues, organizations can optimize their operations, reduce waiting times, and enhance the experience of their customers and users. Queuing theory will continue to play a critical role in optimizing systems and improving the efficiency of operations in various fields.

Moreover, the application of queuing theory has expanded to the field of public health. During the COVID-19 pandemic, queuing theory has been used to model the spread of the virus and to predict the impact of social distancing measures on the spread of the virus. Queuing theory has been used to model the behavior of the virus in different populations and to predict the effectiveness of various interventions, such as lockdowns and vaccination programs. By understanding the behavior of the virus and the effectiveness of interventions, queuing theory can help public health officials to design better policies and strategies to combat the spread of the virus.

In addition to the above-mentioned applications, queuing theory is also being used in the field of finance. Queuing theory is used to optimize the performance of financial markets by minimizing waiting times and reducing transaction costs. By analyzing the arrival and service rates of financial markets, queuing theory can help investors to determine the optimal timing and size of their trades and to minimize their losses due to transaction costs.

Another area of interest for future research in queuing theory is the study of the impact of emerging technologies on queue management systems. With the rapid pace of technological innovation, queuing theory can help organizations to design better queue management systems that can accommodate new technologies such as artificial intelligence, robotics, and the internet of things. By understanding the behavior of queues in the context of emerging technologies, queuing theory can help organizations to optimize their operations and improve the efficiency of their systems.

Reference

Harchol-Balter, M. (2013). Performance modeling and design of computer systems: queueing theory in action. Cambridge University Press.
Gross, D. and Harris, C. M. (1998) Fundamentals of Queuing Theory. John Wiley & Sons, Inc.
Hillier, F. S. and Lieberman, G. J. (2014) Introduction to Operations Research, 10th edition. McGraw-Hill Education.
Kleinrock, L. (1975) Queueing Systems, Volume 1: Theory. John Wiley & Sons, Inc.
Takács, L. (1962) Introduction to the Theory of Queues. Oxford University Press.

A Glimpse of Survival Analysis

2023-01-17T00:00:00+00:00

Survival analysis is a statistical method used to analyze the time until an event of interest occurs. It is used in various fields, including medical research, engineering, and social sciences. In this blog, we will explore the basics of survival analysis, including the definition of survival function, hazard function, and Kaplan-Meier estimator.

Definition of Survival Function

The survival function is a fundamental concept in survival analysis. It describes the probability that an individual survives beyond a certain time. Mathematically, the survival function is defined as:

\[S(t) = P(T > t)\]

where $T$ is the random variable representing the time until the event of interest occurs, and $t$ is a specific time point. The survival function can also be interpreted as the proportion of individuals who survive beyond time $t$.

Hazard Function

The hazard function is another important concept in survival analysis. It describes the instantaneous rate of occurrence of the event of interest at time $t$, given that the individual has survived up to time $t$. The hazard function is defined as:

\[h(t) = \lim_{\Delta t \rightarrow 0} \frac{P(t \leq T < t + \Delta t | T \geq t)}{\Delta t}\]

The hazard function can also be interpreted as the probability of the event of interest occurring in the next infinitesimal time interval, given that the individual has survived up to time $t$.

Kaplan-Meier Estimator

The Kaplan-Meier estimator is a non-parametric method used to estimate the survival function. It is particularly useful when the distribution of survival times is unknown or non-normal. The estimator is based on the observed survival times of a sample of individuals. Let $t_1, t_2, …, t_n$ be the observed survival times, and let $d_1, d_2, …, d_n$ be the corresponding number of events (i.e., deaths) at each time point. The Kaplan-Meier estimator is defined as:

\[\hat{S}(t) = \prod_{i:t_i \leq t} \left( 1 - \frac{d_i}{n_i} \right)\]

where $n_i$ is the number of individuals at risk at time $t_i$. The estimator can be interpreted as the product of the probabilities of survival up to each time point. The denominator in the product is the number of individuals at risk at each time point, which is equal to the total sample size minus the number of events up to that time point.

Log-Rank Test

The log-rank test is a statistical test used to compare the survival distributions of two or more groups. The null hypothesis is that the survival distributions are the same across all groups. The test is based on the difference between the observed and expected number of events in each group, assuming that the null hypothesis is true. The test statistic is given by:

\[Z = \frac{(O_1 - E_1)^2}{V_1} + \frac{(O_2 - E_2)^2}{V_2} + ... + \frac{(O_k - E_k)^2}{V_k}\]

where $O_i$ is the observed number of events in group $i$, $E_i$ is the expected number of events in group $i$ under the null hypothesis, and $V_i$ is the variance of the number of events in group $i$ under the null hypothesis. The test statistic follows a chi-squared distribution with $k-1$ degrees of freedom, where $k$ is the number of groups being compared.

Cox Proportional Hazards Model

The Cox proportional hazards model is a widely used semi-parametric method in survival analysis. It is used to model the relationship between covariates (i.e., explanatory variables) and the hazard function, while assuming that the hazard ratios are constant over time. The hazard ratio represents the relative risk of the event of interest for individuals with a certain value of the covariate, compared to individuals with a reference value of the covariate. The Cox model assumes that the hazard function is given by:

\[h(t, \boldsymbol{X}) = h_0(t) \exp(\beta_1 X_1 + \beta_2 X_2 + ... + \beta_p X_p)\]

where $h_0(t)$ is the baseline hazard function (i.e., the hazard function for individuals with a reference value of all covariates), $\boldsymbol{X}$ is a vector of $p$ covariates, and $\beta_1, \beta_2, …, \beta_p$ are the corresponding regression coefficients. The Cox model does not require any assumptions about the shape of the baseline hazard function, making it more flexible than parametric models.

The likelihood function for the Cox model can be written as:

\[L(\beta_1, \beta_2, ..., \beta_p) = \prod_{i=1}^n \left( \frac{\exp(\beta_1 X_{i1} + \beta_2 X_{i2} + ... + \beta_p X_{ip})}{\sum_{j \in R_i} \exp(\beta_1 X_{j1} + \beta_2 X_{j2} + ... + \beta_p X_{jp})} \right)^{\delta_i}\]

where $X_{i1}, X_{i2}, …, X_{ip}$ are the covariate values for individual $i$, $\delta_i$ is the event indicator for individual $i$ (i.e., $\delta_i=1$ if the event of interest occurs for individual $i$, and $\delta_i=0$ otherwise), and $R_i$ is the set of individuals at risk at the time of event for individual $i$. The Cox model estimates the regression coefficients that maximize the likelihood function.

Summary

Survival analysis is a powerful tool for analyzing time-to-event data. The survival function and hazard function are important concepts in survival analysis, and the Kaplan-Meier estimator is a useful non-parametric method for estimating the survival function. The log-rank test and Cox proportional hazards model are commonly used statistical methods for comparing survival distributions and modeling the relationship between covariates and the hazard function, respectively. By understanding these concepts and methods, researchers can gain valuable insights into the time-to-event outcomes in their data.

Are you Sure your Linux PID is the Process ID?

2023-01-01T00:00:00+00:00

IDs for Processes and Threads in Linux

In Linux, all processes and threads get a unique identifier (ID) and are all listed as directories of the same level under the /proc pseudo file system.
These processes and threads are represented as subdirectories in the form of /proc/[pid] , where the pid is the unique numerical ID for each.
So, technically, the “pid” is not just a process ID but could be an ID for either a thread or a process.
htop shows both processes and threads, and doesn’t distinguish them by default.
- Consequently, its “PID” column contains IDs for both processes and threads.
This dreadful terminology overloading has a historical reason:
- Linux originally didn’t have the notion of threads.
- It only had separate processes, each of which has a unique pid, potentially sharing some resources like virtual memory and file descriptors.
- In 2001, Linux 2.4 introduced “Thread groups”, which gave rise to threads within a process. From the clone(2) man page:
  
  Thread groups were a feature added in Linux 2.4 to support the POSIX threads notion of a set of threads that share a single PID. Internally, this shared PID is the so-called thread group identifier (TGID) for the thread group. Since Linux 2.4, calls to getpid(2) return the TGID of the caller.
  - So, the PID of the parent process is overloaded with another meaning, the TGID :)
- That’s said, the kernel still doesn’t have a separate implementation for processes and threads.
In summary, threads within a process will get the same PID , while each of which has a unique thread ID (TID).
- The values returned by the getpid() are the same for all of them.
- But the TID from the gettid() are always unique.

How to Distinguish between a Thread and a real “Process”?

As mentioned above, threads and process are similar in nature (kernel implementation), and both are “schedulable” entities to the kernel.
The main difference between them is the scope of their namespaces (address space, resources, etc.).
To distinguish between threads and processes, we need to look into the /proc/[pid]/task/[tid] subdirectories where tid is the kernel thread ID.
Kernel threads are the ones registered in and scheduled by the kernel.
- There are user-level threads (e.g., Java threads) invisible to the kernel and managed by the application layer.
- See here for more.
The threads within parent process are under the same thread group (TG) umbrella, having the same TGID as the main thread but different tids.
- The main thread is the “process” that spawned all the children threads and has the same TGID as its PID.
/proc/[pid]/task/[tid] shares the same content as /proc/[pid]/ if pid==tid , i.e., it contains the same information describing the same process/thread.
Therefore, when you look into the /proc/[pid] directory of a multithreaded process:
- The task/[tid] subdirectories are all the threads within the same thread group.
- They have the same TGID being the pid.
- The so-called process is the main thread that spawned all other threads.
- The main thread has the same TGID as its PID (usually the smallest one under the task/ directory).

Last Note: Multiprocessing vs. Multithreading

To avoid confusion, apart from the thread group ID (TGID), there is also a process group ID (PGID).
The former is for multithreading, and the later is used in multiprocessing.
A process is spawned by invoking the fork() syscall, a thread is created by e.g., pthread_create() in C. (Under the hood, they all use the syscall clone() but with different parameters)
- The process that spawns other subprocesses is the parent.
  - When spawning a new process, the parent can either create a new process group (and puts itself and its children into it) or inherits that of an existing process (usually it’s the grandparent process).
  - If a new process group is created, the PGID will be the PID of the process that creates it.
  - If it’s inherited, the PGID is the same as that of the grandparent.
- The process (the main thread) and all the threads it creates are siblings.
  - They share the same TGID.
  - They also have the same PGID as that of the main thread.
  - NB: The threads created by a process are NOT visible to its parent process.

Example

We easily can launch a multithreaded program with stress-ng on Linux.
A “stressor/hog” is a process.
We can run the stress test for memory accesses with the following command, which will spawn a stressor that runs with 5 threads reading and writing to two different mappings of the same underlying physical page.
```
 hy@node-0:~$ stress-ng --mcontend 1 -t 10h
 stress-ng: info:  [56472] dispatching hogs: 1 mcontend
```
With htop, we can see the process and threads therein in hierarchy (the PGRP is the GPID, and the PID is the ID for threads/processes).
- We can see that all the threads and processes have a unique ID, PID (process/thread ID).
  - 56472: The single-threaded parent process spawned from the bash command.
  - 56473: The multithreaded child process (and the main thread) spawned from the parent 56472.
  - 56473-56477: The 5 sibling threads created by the main thread 56473.
Then, by using pidof, we get the pid of the main thread.
```
 hy@node-0:~$ pidof stress-ng-mcontend
 56473
```
Since the bash command 56472 is the parent process that spawned the child process 56473, we can examine this relationship by checking:
```
 hy@node-0:~$ cat /proc/56472/task/56472/children
 56473
```

Then, navigate to the /proc directory and check the /proc/[pid]/task/ subdirectories. We get the 5 threads within this process:

 hy@node-0:~$ ll /proc/56473/task/
 total 0
 dr-xr-xr-x 7 hy hy 0 Dec 31 22:21 ./
 dr-xr-xr-x 9 hy hy 0 Dec 31 22:21 ../
 dr-xr-xr-x 7 hy hy 0 Dec 31 22:21 56473/
 dr-xr-xr-x 7 hy hy 0 Dec 31 22:21 56474/
 dr-xr-xr-x 7 hy hy 0 Dec 31 22:21 56475/
 dr-xr-xr-x 7 hy hy 0 Dec 31 22:21 56476/
 dr-xr-xr-x 7 hy hy 0 Dec 31 22:21 56477/

To examine directory of a child thread, we get the same output as above since they are siblings.

 hy@node-0:~$ ll /proc/56476/task/
 total 0
 dr-xr-xr-x 7 hy hy 0 Dec 31 22:21 ./
 dr-xr-xr-x 9 hy hy 0 Dec 31 22:21 ../
 dr-xr-xr-x 7 hy hy 0 Dec 31 22:21 56473/
 dr-xr-xr-x 7 hy hy 0 Dec 31 22:21 56474/
 dr-xr-xr-x 7 hy hy 0 Dec 31 22:21 56475/
 dr-xr-xr-x 7 hy hy 0 Dec 31 22:21 56476/
 dr-xr-xr-x 7 hy hy 0 Dec 31 22:21 56477/

Note that the parent process 56472 is a single-threaded process, so its /task directory contains only itself.

 hy@node-0:~$ ll /proc/56472/task/
 total 0
 dr-xr-xr-x 7 hy hy 0 Dec 31 22:21 ./
 dr-xr-xr-x 9 hy hy 0 Dec 31 22:21 ../
 dr-xr-xr-x 7 hy hy 0 Dec 31 22:21 56472/

Resource Accounting

The resource usage of a process/thread can be obtained in various ways, and the information is not consistent.
From above, htop aggregate all the resource usages in the main thread 56473 by default.

We can obtain per-process information from ps as well:

Like htop, It aggregates usage of all threads to the main threads by default.

  hy@node-0:~$ ps -p 56473 -o %cpu,%mem,cmd
  %CPU %MEM CMD
   473  0.0 stress-ng-mcontend

It only works on the main thread but not with the siblings:

  hy@node-0:~$ ps -p 56476 -o %cpu,%mem,cmd
  %CPU %MEM CMD

To see detailed thread-level information, we can use the -L flag on the main thread:

  hy@node-0:~$ ps -L 56473 -o %cpu,%mem,cmd
  %CPU %MEM CMD
3  0.0 stress-ng-mcontend
0  0.0 stress-ng-mcontend
0  0.0 stress-ng-mcontend
0  0.0 stress-ng-mcontend
0  0.0 stress-ng-mcontend

With -F option, we can obtain the full glory:

  hy@node-0:~$ ps -L 56473 -F
  UID          PID    PPID     LWP  C NLWP    SZ   RSS PSR STIME TTY      STAT   TIME CMD
  hy         56473   56472   56473 97    5 22792  2604  13 08:10 pts/2    RLl+ 302:30 stress-ng-mcontend
  hy         56473   56472   56474 94    5 22792  2604   7 08:10 pts/2    RLl+ 292:03 stress-ng-mcontend
  hy         56473   56472   56475 94    5 22792  2604  31 08:10 pts/2    RLl+ 292:00 stress-ng-mcontend
  hy         56473   56472   56476 94    5 22792  2604  15 08:10 pts/2    RLl+ 291:59 stress-ng-mcontend
  hy         56473   56472   56477 94    5 22792  2604   0 08:10 pts/2    RLl+ 292:05 stress-ng-mcontend

Note that ALL threads share the same PID but each of them has a unique TID (LWP).

We can monitor those threads using top with -H:

 hy@node-0:/proc$ top -H -p 56476
 ....
     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   56473 hy        20   0   91168   2708   2272 R  97.3   0.0 127:24.91 stress-ng-mcont
   56474 hy        20   0   91168   2708   2272 R  94.0   0.0 122:55.16 stress-ng-mcont
   56475 hy        20   0   91168   2708   2272 R  94.0   0.0 122:54.44 stress-ng-mcont
   56476 hy        20   0   91168   2708   2272 R  93.7   0.0 122:55.33 stress-ng-mcont
   56477 hy        20   0   91168   2708   2272 R  92.3   0.0 122:56.57 stress-ng-mcont

Without the -H flag, however, it aggregates all usages to any of the sibling threads:

  hy@node-0:~$ top -p 56476
  ....
      PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
    56473 hy        20   0   91168   2708   2272 R 476.3   0.0 621:39.36 stress-ng-mcont

Also, we can get a single number out:

  # Get the CPU utilization of thread 56476.
  # * Aggregated.
  hy@node-0:~$ top -b -n 2 -d 0.2 -p 56476 | tail -1 | awk '{print $9}'
  465.0
  # * Per-thread with -H.
  hy@node-0:~$ top -b -H -n 2 -d 0.2 -p 56476 | tail -1 | awk '{print $9}'
  75.0

How to get per-thread resource usages? We need to again look into the /proc sysfs.

The file /proc/pid/stat of a thread (whose TID==pid) contains the information aggregated from all threads.

The file /proc/pid/task/pid/stat contains per-thread information:

  # * Get total cpu time (user and kernel) of all threads belonging to the same TG as that of 56476.
  hy@node-0:~$ cat /proc/56476/stat | awk '{print $14, $15}'
  9460932 12361
        
  # * Get the cpu time for only thread 56476.
  hy@node-0:~$ cat /proc/56476/task/56476/stat | awk '{print $14, $15}'
  1879429 3032

Alternatively, we can use the mighty python with psutil.

 >>> import psutil
    
 # The great grandparent process.
 >>> tmux_session = psutil.Process(54711)
 # * It's spawned from the mother of all processes of PID=1 -- the init(old distros)/systemd(new distros).
 >>> tmux_session.ppid()
 1
 >>> [(child.name(), child.pid) for child in tmux_session.children(recursive=True)]
 [('bash', 54712), ('bash', 56236), ('python', 56613), ('stress-ng', 56472), ('stress-ng-mcontend', 56473)]
    
 # The parent process.
 >>> parent = psutil.Process(56472)
 # * The parent was spawned from one of the above bash sessions (grandparent)
 >>> parent.ppid()
 54712
 # * The sibling threads art NOT children, and invisable to the parent process.
 >>> parent.children(recursive=True)
 [psutil.Process(pid=56473, name='stress-ng-mcontend', status='running', started='11:21:57')]
 # * The parent is single-threaded.
 >>> parent.num_threads()
 1
    
 # The child process spawned from the parent process.
 >>> child = psutil.Process(56473)
 # * The child created 5 threads (including itself).
 >>> child.num_threads()
 5
 >>> [thread.id for thread in child.threads()]
 [56473, 56474, 56475, 56476, 56477]
    
 # One of the sibling threads.
 >>> sibling = psutil.Process(56476)
 # * The sibling thread inherits the parent process of the main thread.
 >>> child.ppid()
 56472
 >>> sibling.ppid()
 56472
    
 # Accounting resouces.
 # * The parent process is in sleep state (S), so it doesn't take any CPU time.
 >>> parent.cpu_percent(interval=1)
 0.0
 # ! psutil **aggregates** all sibling resources to **any** of the siblings.
 >>> sibling.cpu_percent(interval=1)
 471.5
 >>> child.cpu_percent(interval=1)
 472.4
 # ! Also, its cpu time accounting for children processes is broken somehow ...
 >>> tmux_session.cpu_times()
 pcputimes(user=7.46, system=3.19, children_user=102.18, children_system=153.15, iowait=0.0)
 >>> parent.cpu_times()
 pcputimes(user=0.0, system=0.0, children_user=0.0, children_system=0.0, iowait=0.0)        
 >>> child.cpu_times()                                                                      
 pcputimes(user=45250.11, system=57.79, children_user=0.0, children_system=0.0, iowait=0.0) 
 >>> sibling.cpu_times()                                                                    
 pcputimes(user=45255.42, system=57.79, children_user=0.0, children_system=0.0, iowait=0.0) 
    
 # * Memory usages are accounted the same as it does for CPU.
 >>> parent.memory_full_info()
 pfullmem(rss=6475776, vms=59777024, shared=6078464, text=1728512, lib=0, data=32018432, dirty=0, uss=3051520, pss=3749888, swap=0)
 >>> child.memory_full_info()
 pfullmem(rss=2772992, vms=93356032, shared=2326528, text=1728512, lib=0, data=65581056, dirty=0, uss=126976, pss=735232, swap=0)
 >>> sibling.memory_full_info()
 pfullmem(rss=2772992, vms=93356032, shared=2326528, text=1728512, lib=0, data=65581056, dirty=0, uss=126976, pss=735232, swap=0)
 >>> tmux_session.memory_percent()
 0.007239506814671662
 >>> parent.memory_percent()
 0.009602063988251591
 >>> child.memory_percent()
 0.004111699759675096
 >>> sibling.memory_percent()
 0.004111699759675096

In summary, there are various ways of accounting resources for processes and threads on Linux.
- However, there are nuances that must be taken into account.
- psutil is a convenient tool for sys admins when scripting in python.
  - However, the resource usage of any threads is the aggregated values of all threads!
  - It doesn’t explicitly distinguish between PIDs and TIDs either.
- And it’s time to end our running example:
```
  # ! "Note this will return True also if the process is a zombie (p.status() == psutil.STATUS_ZOMBIE)"
  >>> parent.is_running() == child.is_running() == sibling.is_running() == True
  True
        
  >>> import signal
  >>> sibling.send_signal(signal.SIGINT)
        
  >>> parent.is_running() == child.is_running() == sibling.is_running() == False
  True
```
  (It seems that interrupting one thread has a bottom-up cascading effect in stress-ng 🥴 )

Happy New Year 🎆 ~

Reference

Connecting Bayesian with Regularization

2022-01-03T00:00:00+00:00

Bias-Variance Trade-off

In many cases, it’s desired to trade off a bit larger bias for a much smaller variance for better estimations of the generalisation error. One way to do so is by adding a regularizer. In the case of linear regression, L1 (Lasso) and L2 (Ridge) regularization are two of the most common ones. Lasso suppresses weights of small magnitudes to zero, making the feature space sparse, whilst Ridge “condenses” all weights to smaller values. Both can restrict the norm of the weights and therefore, mitigate overfitting.

However, regularization is only one way to strike the balance. Another way is to introduce more bias into the equation through the Bayesian lens. Specifically, we can impose prior knowledge by adding a prior distribution to constrain the norm of the leant parameters. For example, if we know the weights are small and centred, then we can set our prior to be $\vec{w} \sim \mathcal{N}(0, \beta\textbf{I})$. Then, by Bayes rule, we have:

\[\mathbb{P}(\vec{w} | X, \vec{y}) = \cfrac{\mathbb{P}(\vec{w}, X, \vec{y})}{\mathbb{P}(X, \vec{y})} = \cfrac{\mathbb{P}(\vec{w}, \vec{y} | X) {\mathbb{P}(X)}}{\mathbb{P}(\vec{y} | X) {\mathbb{P}(X)}} = \cfrac{\mathbb{P}(\vec{w}, \vec{y} | X)}{\mathbb{P}(\vec{y} | X)}\]

Thus, both regularization and Bayesian modelling can achieve the same goal, which begs the question: are they connected?

Laplace is to Lasso as Gaussian is to Ridge

The answer to the above question turns out to be yes! To illustrate this further, let’s use two common prior distributions, Lapace and Gaussian, as our running examples.

Firstly, we assume the following general setting for regression: ${y} = X{\theta}$ and $f_X = y + \epsilon$ where $\theta \sim \text{Laplace}(0, s) = 1/2s \cdot\exp(-\mid\theta\mid / s)$ and $\epsilon \sim \mathcal{N}(0, \delta^2_\epsilon)$.

Then, we obtain the maximum a posteriori (MAP) esitmation as:

\[\begin{align} \arg\max_\theta\mathbb{P}({\theta} | X, {y}) &= \arg\max_\theta\cfrac{\mathbb{P}(y | X, \theta)\mathbb{P}(\theta)} {\mathbb{P}(y)} \nonumber\\ &\propto \arg\max_\theta\mathbb{P}(y |X, \theta)\mathbb{P}(X| \theta)\mathbb{P}(\theta) \nonumber\\ &\propto \arg\max_\theta\mathbb{P}(y | X, \theta)\mathbb{P}(\theta) \nonumber\\ &\propto \arg\max_\theta\mathbb{P}(\theta) \prod^n_i \mathbb{P}(y_i | X_i, \theta) \nonumber\\ &\propto \arg\min_\theta -\log \mathbb{P}(\theta) - \sum_i^n \log \mathbb{P}_\theta(y_i | X_i) \end{align}\]

Next, we can substitute both the likelihood and prior into Eq. (1).

\[\begin{align} \arg\min_\theta -\log\cfrac{1}{2s} \exp\left\{-\cfrac{|\theta|}{s}\right\} - \sum^n_i \log \cfrac{1}{Z} \exp\left\{-\cfrac{1}{2}\left(\cfrac{y_i - f_i}{\delta_\epsilon}\right)^2\right\} \end{align}\]

where $Z$ is the Gaussian normalising constant. By simplifying Eq. (2), we obtain the following form:

\[\begin{align} & \arg\min_\theta - \cfrac{|\theta|}{s} + \cfrac{1}{2\delta^2_\epsilon} \sum^n_i(y_i - f_i)^2 \\ =& \arg\min_\theta - \sum^n_i(y_i - f_i)^2 - \cfrac{2\delta^2_\epsilon}{s}||\theta||_1 \end{align}\]

Now, we have recovered the exact form of Lasso, where $\cfrac{2\delta^2_\epsilon}{s}$ is the coefficient of the L1 regularizor $\lambda$ that controls the strength of the constraint.

Next, let’s play the same trick in the same setting but with a Gaussian prior instead, i.e., $\theta \sim \mathcal{N}(0, \delta_\theta^2)$.

Starting from Eq. (1), we subsitute in the likelihood and prior as above:

\[\begin{align} \arg\max_\theta\mathbb{P}({\theta} | X, {y}) &\propto \arg\min_\theta -\log \mathbb{P}(\theta) - \sum_i^n \log \mathbb{P}_\theta(y_i | X_i) \nonumber \\ &\propto \arg\min_\theta -\log \cfrac{1}{Z'} \exp\left\{-\cfrac{1}{2}\left(\cfrac{\theta-0}{\delta_\theta}\right)^2 \right\} \\ & \quad - \sum^n_i \log \cfrac{1}{Z} \exp\left\{-\cfrac{1}{2}\left(\cfrac{y_i - f_i}{\delta_\epsilon}\right)^2\right\} \nonumber \end{align}\]

Finally, letting go all the fluff in Eq. (5), we have:

\[\begin{align} & \arg\min_\theta - \cfrac{||\theta||_2^2}{2\delta^2_\theta} + \cfrac{1}{2\delta^2_\epsilon} \sum^n_i(y_i - f_i)^2 \nonumber\\ =& \arg\min_\theta - \sum^n_i(y_i - f_i)^2 - \cfrac{\delta^2_\epsilon}{\delta^2_\theta}||\theta||_2^2 \end{align}\]

By Eq. (6), we have recovered Ridge regression where the fraction $\cfrac{\delta_{\epsilon}^2}{\delta_{\theta}^2}$ denotes regularization constant $\lambda$.

Summary

By working out the above two examples, we found that regularised regression is nothing but Bayesian modelling in disguise. In fact, imposing various priors has the same effect as using corresponding regularizers. By the same token, choosing different likelihoods gives us different loss functions. In this post, we used Gaussian likelihood in both examples, and, in turn, recovered the square loss.

There are many other options for prior and likelihood functions. For instance, one can use a student-t as opposed to Gaussian. Lastly, a family of conjugate prior can drastically reduce the cost of Bayesian inference.

Good Reads

2021-10-08T00:00:00+00:00

This’s also a rolling log in which I list some books recently read (2021) that are right up to my alley.

Bennett, Arnold. Self and self-management: Essays about existing. George H. Doran Company, 1918.

The true stories behind the stories of success shall be the same and might not be as glorious.

Hawking, Stephen. The theory of everything. Jaico Publishing House, 2006.

Well, I’m now under the impression that Computer Science is a pseudoscience :}

Hall, Herbert James. The untroubled mind. Houghton Mifflin, 1915.

How can I live out a life so fully that worries couldn’t sneak in? Perhaps most importantly, what’s my deeper justification and higher pursuit thereof?

Jones, Diana Wynne, John Sessions, and Stella Paskins. Howl’s moving castle. Recorded Books, 2008.

I finally understand why I didn’t quite understand the movie 😅

My Take on TED Talks

2021-08-19T00:00:00+00:00

For the past several months, I’ve picked up the habit of watching TED talks. However, I found that, even if it’s an impressive speech with resonating ideas, it won’t take long before it vanishes from my memory. This fretted me, so I decided to start briefly jotting down my main takeaways in this rolling log.

Some of my friends from China saw a bunch of blanks here. This is because YouTube videos can’t pass the firewall 🧱.

10 ways to have a better conversation

If you want to pontificate, go write a blog.
Ask open-ended questions.
“No man ever listened his way out of a job” — Calvin Coolidge
Most of us don’t listen with the intent to understand. We listen with the intent to reply.
Listen, and be prepared to be amazed.
If you don’t know, say that you don’t know.
Don’t equate your experience with others.

Self-control

This Ted talk conveys very similar messages as that of the below one. Their main ideas are covered in many other talks as well.

To reach beyond your limits by training your mind

Self-control is the problem where we have all these desires from ourselves for the long-term, but then in the short-term, we do rather different things (that prevent ourselves from achieving long-term those goals).
Our will power is weak, and therefore, it’s not something upon which we should rely during decision-making.
If we are faced with temptation whilst having no tool at hand to overcome it, we’re almost certainly going to fail.
We should collaborate with our brains with constructive messages.
- Change the pictures and the words. Using very detailed words.
- Tell your mind exactly what you want.
We ought to create tools that will control our future selves to do what our current selves want them to do.
- It’s a situation where we know we will be tempted, and we do something to make ourselves not be able to be tempted.
Make the familiar unfamiliar and the unfamiliar familiar.
Reward-substitution: connecting pain to pleasure
- Link massive pleasure to going there and pain to not going there.
- Do the right thing for the wrong reason.
Make Self Belief so normal to you that everyone believes in you too.

The last point ties in with another Ted talk, which will be introduced later.

Body language, the power is in the palm of your hands

The palm up position is friendly and inviting, whilst the palm down position exerts power and control over others.
- Finger pointing is the worst; it is directive and rude.

Imposters: The psychology of pretending to be someone you’re not

Some imposters are escapologists — they’re running away from a flawed past and trying to rehabilitate their image.
We spend a hell of a lot of time trying to impress other people.
Ultimately, life is a performance with all the sense of drama, anarchy, and possibility.
Who ever controls the past controls the future; who ever controls the present controls the past. — Orwell

How to draw to remember more

Thinking in pictures and then drawing them down is a great way to learn and memorize new things.
There are a myriad of ways to represent an abstract concept using drawings.
The quality of the drawing does not matter at all. In other words, good and bad pictures have rather similar, if not the same, effect on the learning process.
It is the process of doing the drawing that actually makes a difference.

To be continued 👨‍💻 …

SSH Proxy Jump

2020-08-11T00:00:00+00:00

This post is concerned with the basics of SSH authentication, as well as its indirect login via a proxy server.

Step 1: key pair generation

Use the following command to generate a public(silver)/private(black) RSA key pair under the ~/.ssh/id_rsa directory. The .ssh/id_rsa is the private key that you keep in your machine, and the .ssh/id_rsa.pub is the public key that you distribute to other machines/platforms in order to achieve automatic login via key-pair verification.

ssh-keygen -t rsa

Note that git uses a different type of cryptosystem, namely, the Ed25519 system, which can be generated using the following command.

ssh-keygen -t ed25519 -C "youremail@yourdomain"

Compared to RSA, it is considered to be faster, safer and more compact (Ed25519: 8chars, RSA: 544chars) although RSA is more commonly used.

Step 2: distribute key(s)

If Alice wants to log in Server1 shown in the figure, she can run the following command to forward her SSH public key(s) to it. The keys sent will be recorded in .ssh/authorized_keys of the host.

ssh-copy-id alice-username@server1.domain-or-ip

Afterwards, Alice should be able to log in Server1 without being asked for her password. I.e.,

ssh alice-username@server1.domain-or-ip

Welcome to XXX ...

Note that for Mac users that do not have ssh-copy-id, you can either intall it via brew or mannually copy these ssh files through scp, rsync or whatnot. If you go for the latter, one thing you should keep in mind is to set the permission bits correctly as shown below.

chmod 700 ~/.ssh
chmod 600 ~/.ssh/*

In the case that Server1 has a firewall or what have you, Alice has to connect it via a proxy, say Server2; therein lies the question: how to access Server2 directly using key-pair authentication?

To tackle this, Alice can first forward her/his keys to Server2, the proxy, through Step 2. Next, Alice should log in Server2 to generate a key pair (Step 1), and then, send keys to Server1.

Last, but certainly not least, Alice should set up her ssh on her local machine. The configuration (~/.ssh/config) is along the lines of the following.

Host server2
    HostName          server1-domain
    User              alice-username
    IdentityFile      ~/.ssh/id_rsa

Host server1
    HostName          server1-domain
    User              alice-username
    ForwardX11Trusted yes
    ForwardAgent      yes
    IdentityFile      ~/.ssh/id_rsa
    ProxyCommand ssh server2 -W %h:%p

Now, everything should be in place. Alice should be able to do login, port forwarding, or whatnot, with automatic key-pair authentication (without having to type her password every single time for every single server along the way!).

🛑 NB: Data loads between Server1 and Server2 is not encrypted.

# Log in *Server1* via *Server2*.
ssh server1 

# Tunnelling ports to *Server1* via *Server2*.
ssh -NL {listening_port}:{hostmachine}:{host_port} server1 

p.s. Wrestling with company proxies during these COVID times we live in at the moment can be devastating 😷

Autoencoders and VAEs

2020-02-15T00:00:00+00:00

1 Autoencoders

The main idea of autoencoders is to extract latent features that are not easily observable yet play an important role in one or several aspects of the data (e.g., images).

Embedding of faces [Saul & Roweis]

1.1 Compression (by the “encoder”)

The first step of the process is to compress the observed data vector $\vec x$ into the latent feature vector $\vec z$.

There are two obvious benefits yielded from such compression process:

The latent feature vector $\vec z$ is much smaller in size, which makes them much easier to process compared to the original (potentially high dimensional) data.
As its name suggests, the latent feature vector $\vec z$ may capture important hidden features.

1.2 Reconstruction (by the “decoder”)

The second phase is to try to reproduce the data (the image) from the latent feature vector $\vec z$.

Apparently, since the first step is a “lossy compression”, the data reconstructed $\vec{\hat{x}}$ will not be exactly the same as the original observation. Here is where the third phase comes about.

1.3 Backpropagation

As mentioned above, there is a difference between the observation $\vec{x}$ and the reconstruction $\vec{\hat{x}}$.

From the above picture, we can see clearly that the higher the dimension of the latent feature vector $\vec{z}$, the higher the quality of the reconstruction.

Therefore, constraining the size of the latent space will enforce the “importance” of the extracted features.

Further, we can use a loss function to measure such “importance” of the extracted hidden variables. In this case, we use a simple square loss:

\[\mathcal{L}(x, \hat{x})=\|x-\hat{x}\|^{2}\]

Thus, the key power of autoencoders is that

Autoencoder allows us to quantify the latent variables without labels (gold-standard data)!

To summarize,

Autoencoding == Automatically encoding data
Bottleneck hidden layer forces the network to learn a compressed latent representation.
Reconstruction loss forces the latent representation be as “paramount” and “informative” as possible.

2 Variational Autoencoders (VAEs)

2.1 Stochastical variation

In a nutshell, variational autoencoders are a probabilistic twist on autoencoders, i.e. (stochastically) sample from the mean and standard deviation to compute the latent sample as supposed to deterministically take the entire latent vector $\vec{z}$. That been said, the main idea of the forward propagation does not change compared to traditional autoencoders.

In the compression process, the encoder computes $p_{\phi}(\mathrm{z} \mid x)$.
In the reconstruction phase, the decoder computes $q_{\theta}(\mathrm{x} \mid z)$.

Then, we could compute the loss as follows

\[\mathcal{L}(\phi, \theta, x)=(\text { reconstruction loss })+(\text { regularization term }),\]

which is exactly the same as before. It captures the pixel-wise difference between the input and the reconstructed output. This is a metrics of how well the network is doing at generating the distribution that akin to that of the observation.

As to the “regularization term”, since the VAE is producing these probability distributions, we want to place some constraints on how they are computed as well as what that probability distribution resembles as a part of regularizing and training the network.

Hence, we place a prior $p(z)$ on the latent distribution as follows

\[D(p_{\phi}(z|x)\ ||\ p(z)),\]

which captures the KL divergence between the inferred latent distribution and this fixed prior for which a common choice is a normal Gaussian, i.e. we centre it around with a mean of 0 and a standard deviation 1: $\ p(z)=\mathcal{N}\left(\mu=0, \sigma^{2}=1\right)$.

In this way, the network will learn to penalise itself when it tries to cheat and cluster points outside sort of this smooth Gaussian distribution as it would be the case if it was overfitting or trying to memorize particular instances of the input.

Thus, this will enforce the extracted $\vec z$ follows the shape of our initial hypothesis about the distribution, smoothing out the latent space and, in turn, helping the network not over-fit on certain parts of the latent space.

2.2 Backpropagation? Reparametrization

Original form

Unfortunately, due to the stochastic nature, the backpropagation cannot pass the sampling layer as backpropagation requires deterministic nodes to be able to iteratively pass gradients and apply the chain rule through.

Reparametrized form

Instead, we consider the sampled latent vector $\vec z$ as a sum of a fixed vector $\vec \mu$ a fixed variance vector $\vec \sigma$ and then scaled this variance vector by a random constant that is drawn from a prior distribution, for example from a normal Gaussian. The key idea here is that we still have a stochastic node but since we have done this reparametrization with the factor $\epsilon$ that is drawn from a normal distribution, this stochastic sampling does not occur directly in the bottleneck layer of $\vec z$. This way, we can reparametrize where that sampling is occurring.

Note that this is a really powerful trick as such reparametrization is what allows for VAEs to be trained end-to-end.

3 Code example

The following is a vanila implementation of a VAE model in Tensorflow.

class Sampling(keras.layers.Layer):
  def call(self, inputs) :
  z_mean, z_log_var = inputs
  batch = tf.shape(z_mean)[0]
  dim = tf.shape(z_mean)[1]
  epsilon = tf.keras.backend.random_normal_(shape=(batch, dim))
  return z_mean + tf.exp(0.5 * z_log_var)

latenet_dim = 2
encoder_inputs = Input(shape=(6), name="input_layer")

X = Dense (5, activation="relu", name="h1")(encoder_inputs)
X = Dense (5, activation="relu", name="h2")(x)
X = Dense (4, activation="relu", name="h3")(x)
z_mean = Dense(latent_dim, name="z_mean")(x)
z_log_var = Dense(latent_dim, name="z_log_var")(x)
z = Sampling()([z_mean, z_log_var])

encoder = keras.Model(encoder_inputs, [z_mean, z_log_var, z], name="encoder")

keras.utils.plot_model(encoder, show_shapes=True)

Hey system call! Where on earth are you?

2019-09-05T00:00:00+00:00

We have been talking about system call for a while, e.g. applications request services from the operating system via syscall, but we have never ever seen a system call!

1. Where are they? — Layering

First, let’s recall the UNIX system structure:

System call is an interface between the user and kernel mode which is not necessarily the interface that you want to give a programmer for security concerns.

Consequently, system call is buried in the programming language run time library (e.g. libc.a) so it is the C library that actually makes the system calls to the operating system for us.

2. The charming “narrow waist” — The power of uniformed API

System calls show their power when we are dealing with multiple different devices. When their syscall are similar enough, mounting them become easy.

// It is said that the way Linux deals with it is to encompass every system call under the sun from all kinds of different operating systems. Terrific ~

Normally, we will get a different chunk of data reading from different devices but, due to the virtue of uniformity, we are able to read from and write to disk driver in exactly the same way as we read from and write to a flash memory. This is because the interface of the kernel is byte-oriented, which means it is reading and writing bytes so it doesn’t care the size of the data blocks.

3. Final notes: kernel-buffered `read` && `write`

Recall the six key design concepts of UNIX I/O

Uniformity
- File operations, device I/O, and interprocess communication through open, read/write and close.
- Allow simple composition of programs, for example, find | grep | wc ….
Open before use
- Provides opportunity for access control and arbitration.
- Sets up the underlying machinery, i.e., data structures.
Byte-oriented
- Even if data blocks are transferred, addressing is in bytes.
Kernel buffered reads
- Streaming and block devices look the same.
- Read blocks processes, yielding processor to other task.
Kernel buffered writes
- Completion of out-going transfer decoupled from the application, allowing it to continue.
Explicit close

As we discussed in the training session, reading something off a disk is time-consuming and costly, often up to several milli-seconds, which is roughly equal to a million instruction times. Thus, in order not to lose a million instructions, we better put the corresponding processes into sleep yielding processors to other tasks.

Same for writing, when the system call write returns, the data is not necessarily on the disk but buffered in the memory (the kernel), allowing the applications to keep going.

In a nutshell, the kernel is doing tons of buffering and visualization behind the scenes.

(In other words, if your machine crashes at a wrong point in time, you will lose your data permanently …)

p.s. I’ll (hopefully) get to the user-buffered I/O later.

fork( ) yourself!

2019-09-03T00:00:00+00:00

1. Recall

As we mentioned before, a process is an instance of a program executing. Its state and information are all monitored by the OS. Processes can do their work by invoking system calls.

But is there any operations that are on their own?

Yes, process can create a new process (sometimes called sub-process) by copying itself!

2. `fork()`

In typical UNIX systems (exclude Linux in a sense that it may somehow augment the child process in the first place), the fork() system call (or library precisely) creates a copy of the callee process.
When I say copy, I mean all of the states of the original process duplicated in both the parent and the child! (Memory, File Descriptors, etc…)

fork() returns 3 kinds of values:
- ` 0 ` represents that this process is the parent;
- ` 1 ` represents that this is process is the child;
- -1 (as UNIX convention) represents an error message.
Why we need those return values?
- when we call fork(), the original process will be trapped into the kernel mode and halt until it returns.
- when it returns, the kernel will actually return twice, once in the parent process and once in the child.
- the parent and the child starts from the instruction after the fork() together.

An example:

#include 
#include 
#include 
#include 
#include 
#define BUFSIZE 1024
int main(int argc, char *argv[])
{
    char buf[BUFSIZE];
    size_t readlen, writelen, slen;
    pid_t cpid, mypid;
    pid_t pid = getpid(); /* get current processes PID */
    printf("Parent pid: %d\n", pid);
    cpid = fork();
    if (cpid > 0) { 
    /* Parent Process */
        mypid = getpid();
        printf("[%d] parent of [%d]\n", mypid, cpid);
    } 
    else if (cpid == 0) { 
    /* Child Process */
        mypid = getpid();
        printf("[%d] child\n", mypid);
    }
    else {
        perror("Fork failed");
        exit(1);
    }
    exit(0);
}

3. The relationship between parent and child processes

fork() is executed in an unblocked manner, which means the parent process will not naturally sit there and wait for their child processes to return.
Furthermore, they are actually running in parallel that both of them are exchanging time on the scheduler queue and the run queue.
The parent processes are able to control their children processes directly.
What if the child gets “killed” right away before its parent does anything?
- Basically, when this happens, the PCB of the child will not be reallocated by the OS;
- Instead, these dead children will stay in a state called the Zombie State. (When you mess up the so-called interprocess communication between parent and children process, there will be a bunch of “Zombie” appear)
What if the parent gets “killed” before the child process return?
- Generally, the child will get inherited by a grandparent.

// Didn’t expect it’s going to be such a creepy parenting blog …

4. Shell

A shell is a job control system which allows programmers to create and manage a set of programs to do some task —— Berkeley CS162

A shell is a command interpreter which makes key process-management system calls that are dealing with the creation and termination of processes. ——　Prof. Andy Tanenbaum

Okay~ how come we wind up with Shell anyway?

Every process has a parent and the parent also has its parent and so forth. We can go on and on and on until we hit something called init in UNIX;

The init is the first process which calls all other children processes and one of which is the shell.

The shell, whose job is to create and manipulate processes us, turns out to be a special process that fork()s itself and immediately calls exec() to load a new program into its memory address. (This often followed with wait() that blocks itself until gets the return value from the child, which releases the process from being a zombie.)

cc –c sourcefile1.c
cc –c sourcefile2.c
ln –o program sourcefile1.o sourcefile2.o
./program

The way that the Shell works is also how we start up a parallel program.

5. UNIX Process Management

UNIX fork – system call to create a copy of the current process, and start it running.
- No arguments!
UNIX exec – system call to change the program being run by the current process (replace the current running process with a brand new process).
UNIX wait – system call to wait for a process to finish.
UNIX signal – system call to send a notification to another process.

( To me, the signal service is just a sort of user-level interruption that works in any process regardless it’s a parent or child. )

An example of using the signal():

#include 
#include 
#include 
#include 
#include 

// this is just a way of changing the original `SIGINT` handler to our self-defined hander

void signal_callback_handler(int signum) {
  printf("Caught signal %d - phew!\n", signum);
  exit(1);
}

int main() {
  signal(SIGINT, signal_callback_handler);
  while (1) { }
}

Hongyu Hè

Queuing Theory: Understanding Waiting Lines

Basics of Queuing Theory

Kendall’s Notation

Formulas in Queuing Theory

Little’s Law

Erlang’s Formula

Kendall’s Formulas

Applications of Queuing Theory

Limitations of Queuing Theory

Outlook

Summary

Reference

A Glimpse of Survival Analysis

Definition of Survival Function

Hazard Function

Kaplan-Meier Estimator

Log-Rank Test

Cox Proportional Hazards Model

Summary

Are you Sure your Linux PID is the Process ID?

IDs for Processes and Threads in Linux

How to Distinguish between a Thread and a real “Process”?

Last Note: Multiprocessing vs. Multithreading

Example

Resource Accounting

Reference

Connecting Bayesian with Regularization

Bias-Variance Trade-off

Laplace is to Lasso as Gaussian is to Ridge

Summary

Good Reads

Bennett, Arnold. Self and self-management: Essays about existing. George H. Doran Company, 1918.

Hawking, Stephen. The theory of everything. Jaico Publishing House, 2006.

Hall, Herbert James. The untroubled mind. Houghton Mifflin, 1915.

Jones, Diana Wynne, John Sessions, and Stella Paskins. Howl’s moving castle. Recorded Books, 2008.

My Take on TED Talks

10 ways to have a better conversation

Self-control

To reach beyond your limits by training your mind

Body language, the power is in the palm of your hands

Imposters: The psychology of pretending to be someone you’re not

How to draw to remember more

SSH Proxy Jump

Step 1: key pair generation

Step 2: distribute key(s)

Step 3: indirect login

Autoencoders and VAEs

1 Autoencoders

1.1 Compression (by the “encoder”)

1.2 Reconstruction (by the “decoder”)

1.3 Backpropagation

2 Variational Autoencoders (VAEs)

2.1 Stochastical variation

2.2 Backpropagation? Reparametrization

3 Code example

Hey **system call**! Where on earth are you?

1. Where are they? — Layering

2. The charming “narrow waist” — The power of uniformed API

3. Final notes: kernel-buffered read && write

Recall the six key design concepts of UNIX I/O

fork( ) yourself!

1. Recall

2. fork()

3. The relationship between parent and child processes

4. Shell

5. UNIX Process Management

Hey system call! Where on earth are you?

3. Final notes: kernel-buffered `read` && `write`

2. `fork()`