<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://hongyuhe.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://hongyuhe.github.io/" rel="alternate" type="text/html" /><updated>2026-04-30T13:54:23+00:00</updated><id>https://hongyuhe.github.io/feed.xml</id><title type="html">Hongyu Hè</title><subtitle>Blog</subtitle><author><name>Hongyu Hè</name></author><entry><title type="html">PhD Year 1: Joy and Rejection</title><link href="https://hongyuhe.github.io/yr1phd/" rel="alternate" type="text/html" title="PhD Year 1: Joy and Rejection" /><published>2026-02-15T00:00:00+00:00</published><updated>2026-02-15T00:00:00+00:00</updated><id>https://hongyuhe.github.io/yr1phd</id><content type="html" xml:base="https://hongyuhe.github.io/yr1phd/"><![CDATA[<h2 id="foreword">Foreword</h2>

<p>Time flies, especially when you’re enjoying it. It’s already been two years since I <a href="/odyssey/">decided to join Princeton</a>. Now that I’m in the second half of my second year, I want to look back on how I’ve been “philosophizing” this Doctor of Philosophy journey.</p>

<p>My first year was full of pure joy and harsh rejections. Those two words feel diametrically opposed at first glance, but my experience doesn’t make sense without both.</p>

<h2 id="what-i-wanted-from-year-one">What I wanted from year one</h2>

<p>I started my PhD with two goals: (1) publish a paper in my first year, and (2) fully experience American college life.</p>

<p>I set the first goal because I wanted to get my research going and integrate into the group as soon as possible. But it was much, much harder than I expected, which I’ll get into later. The second goal felt uniquely possible at Princeton, even as a PhD student.</p>

<p>In most European countries, doctoral students have formal contracts with the university. That means you’re technically an employee, and you’re treated like one. Most things feel transactional. Universities typically don’t actively organize social events or build community, so many PhD students mostly interact within their research groups.</p>

<p>Princeton feels very different. It builds a strong community around grad students. The graduate school is small, around 3,000 students in total (even smaller than Stanford’s med school). We’re all part of the <a href="https://en.wikipedia.org/wiki/Princeton_University_Graduate_College">Grad College</a>, which organizes a steady stream of social and academic events. Most campus events and clubs are also open to both undergrads and grads. For example, I’ve been hiking and skiing with the outdoor club, which is funded by the university but primarily organized by undergrads.</p>

<p>Most importantly, virtually everything is free for students. Princeton is very rich, and it treats its students very well. Events like these would be hard to imagine at most European universities (including top-ranked ones).</p>

<figure class="align-center">
  <div class="always-two-portrait-landscape">
    <img src="../_resources/image1.jpeg" />
    <img src="../_resources/image5.jpeg" />
  </div>
</figure>
<figure class="align-center">
  <div class="always-two">
    <img src="../_resources/image.jpeg" />
    <img src="../_resources/image3.jpeg" />
  </div>
  <figcaption>Hiking with <a href="https://outdooraction.princeton.edu/">Princeton OA</a>.</figcaption>
</figure>

<figure class="align-center">
  <div class="always-two">
    <img src="../_resources/image2.jpeg" />
    <img src="../_resources/image4.jpeg" />
  </div>
  <figcaption>Ski trip in NY and snowshoeing at <a href="https://en.wikipedia.org/wiki/Institute_for_Advanced_Study">IAS</a>.</figcaption>
</figure>

<h2 id="year-one-in-numbers">Year one in numbers</h2>

<ul>
  <li>Number of paper submissions: 8 (4 as first author)</li>
  <li>Number of paper rejections: 6</li>
  <li>Number of grant proposals I helped with: 2</li>
  <li>Number of undergrads mentored: 4</li>
  <li>Number of <a href="https://goodreads.com/hongyu">books read</a>: 9</li>
  <li>Number of courses taken: 6 (required)</li>
  <li>Number of <a href="https://www.princeton.edu/~gradcol/perm/hightable.htm">high tables</a> attended: 5</li>
  <li>Number of <a href="https://gchc.princeton.edu/formals/">grad formals</a>: 1</li>
  <li>Number of hikes: 17</li>
  <li>Number of skiing/skating trips: 2</li>
  <li>Number of concerts on campus: 2</li>
  <li>Number of free meals on campus: +∞</li>
</ul>

<figure class="align-center">
  <div class="always-two">
    <img src="../_resources/image6.jpeg" />
    <img src="../_resources/image7.jpeg" />
  </div>
  <figcaption><a href="https://www.princeton.edu/~gradcol/perm/hightable.htm">High table</a> with <a href="https://forms.music.princeton.edu/people/elizabeth-hellmuth-margulis/">Prof. Elizabeth Hellmuth Margulis</a>, where I asked her about music perception.</figcaption>
</figure>

<figure class="align-center">
  <div class="always-two">
    <img src="../_resources/image.png" />
    <img src="../_resources/IMG_6017.jpeg" />
  </div>
  <figcaption>Left: Visiting <a href="https://en.wikipedia.org/wiki/Penguin_Random_House">Penguin Random House (NYC HQ)</a> on a <a href="https://careerdevelopment.princeton.edu/">Princeton CCD</a> trip. <br />Right: A Broadway show outing organized by the Princeton grad school.</figcaption>
</figure>
<figure class="align-center">
  <div class="always-two-landscape-portrait">
    <img src="../_resources/IMG_9978.jpeg" />
    <img src="../_resources/image8.jpeg" />
  </div>
  <figcaption>Brat-themed <a href="https://gchc.princeton.edu/formals/">grad school formal</a> with my friends.</figcaption>
</figure>

<h2 id="a-typical-day">A typical day</h2>

<ul>
  <li><strong>8:00 AM</strong> — Play tennis with <a href="https://jinminhao.github.io/">Minhao</a> at the <a href="https://www.sasaki.com/projects/princeton-racquet-and-recreation-center/">Princeton Racquet Center</a></li>
  <li><strong>10:00 AM</strong> — Take the <a href="https://transportation.princeton.edu/getting-around/tigertransit">Tiger shuttle</a> to my office</li>
  <li><strong>Morning</strong> — Work</li>
  <li><strong>12:00 PM</strong> — Attend various seminars or talks on campus</li>
  <li><strong>1:00 PM</strong> — Get <a href="https://princetonfreefood.com/">free food</a> :)</li>
  <li><strong>Afternoon</strong> — Work</li>
  <li><strong>5:00 PM</strong> — Take the shuttle back to my apartment on the <a href="https://facilities.princeton.edu/projects/meadows-neighborhood">Meadows campus</a></li>
  <li><strong>6:00 PM</strong> — Hit the <a href="https://campusrec.princeton.edu/news/introducing-new-fitness-rec-spaces-meadows-campus">Meadows gym</a> or play squash</li>
  <li><strong>8:00 PM</strong> — Cook dinner or get <a href="https://princetonfreefood.com/">free food</a> again :)</li>
  <li><strong>Evening</strong> — Work</li>
</ul>

<figure class="align-center">
  <div class="always-two-landscape-portrait">
    <img src="../_resources/racquet_center.png" />
    <img src="../_resources/image9.jpeg" />
  </div>
  <figcaption>Princeton Racquet Center indoor and outdoor courts.</figcaption>
</figure>

<figure style="width: 60%" class="align-center">
  <img src="../_resources/image10.jpeg" alt="" />
  <figcaption>My little cubicle 🖼️ :)</figcaption>
</figure>

<h2 id="rejections">Rejections</h2>

<p>Publishing a paper in my first year turned out to be much harder than I expected, especially because I was working in a research area that was still relatively new to me. I used to think computer networking was basically just “systems” with different domain applications. Oh boy, I was wrong. The way people in this field approach problems and assign value is different (e.g., empirical performance optimization vs. formal guarantees), and it took me quite a while to adapt.</p>

<p>To this day, I feel extremely fortunate to be my advisor’s student. She is one of the smartest and hardest-working people I’ve ever met. More importantly, she’s a kind mentor. She stays patient when I get stuck, and she’s willing to learn together with me, especially because what I’m working on spans several fairly different subjects.</p>

<p>Early on, I was also a difficult student to work with. I tend to have strong opinions about research direction and experimental methodology. At the time, I didn’t fully understand the difference between the community I came from and the computer networking community. I didn’t know how to present my ideas as a research paper, instead of a technical report packed with engineering details.</p>

<p>I still remember my first submission. Right before the deadline, my advisor had to rewrite most of what I had written, because my writing was so off track. She walked me through the process and even stayed up with me until 4am. I sincerely appreciate her patience and how she kept pushing me to think differently.
I also vividly remember the moment I got my first rejection. Two seconds later, I received a message from her: “I’m sorry about the paper.” In that moment, something clicked. She wasn’t my boss; she was my mentor.</p>

<blockquote>
  <p>“Your ideas are like your children, and you don’t want them to go into the world in rags.”
— Patrick Winston</p>
</blockquote>

<p>Rejection is always hard, because our ideas are our children. But over time I realized the sting wasn’t just “they said no.” It was being forced to look at my own work with fewer illusions: What is the core claim, really? What is the clean story? What would a skeptical reviewer misunderstand, and why would that be my fault? Each rejection pushed me to tighten the logic, trim the noise, and crystallize the contribution in writing. In that sense, the reviews were less a verdict and more a mirror.</p>

<p>So when the paper was finally accepted, I felt genuinely happy, but not surprised. By then, the work had been stress-tested enough that acceptance felt like the outcome finally caught up with the version of the paper we had fought to make real.</p>

<p>I also learned not to over-interpret any single decision. There’s randomness in the review process, and with LLMs entering the workflow, that noise can feel even louder. A rejection doesn’t automatically mean the work is bad. It might mean the fit was off, the framing didn’t land, or the message got lost. The only part you can control is the slope: keep learning, keep revising, and keep making the work clearer and stronger.</p>

<h2 id="still-enjoying-my-solitude">Still enjoying my solitude</h2>

<p>My life has become much more colorful since I joined Princeton. I’ve made a lot of new friends and picked up new hobbies. But something has stayed the same. While many people worry about loneliness, I’m terrified of the opposite: someone getting too far into my life.</p>

<p>I’m always alone but never lonely, as I enjoy solitude more than anything. I love jogging alone around Lake Carnegie and along the Delaware Canal towpath (with many 🦊s and 🦌s). I love walking from my apartment to my office in the morning, and walking back alone at night. I also love standing alone in front of Lake Carnegie and letting the sunshine touch my face.</p>

<figure class="align-center">
  <div class="always-two-landscape-portrait">
    <img src="../_resources/image11.jpeg" />
    <img src="../_resources/ac8acf1948cb4a0dcbd949d7019c39c4.jpeg" />
  </div>
  <figcaption>Lake Carnegie day and night, where I often go 🏃‍♂️ and 🚶 alone.</figcaption>
</figure>

<p>Being alone gives me time and space to think and relax. It’s hard for many people around me to understand (my mom included), but to me, time alone is emblematic of unbridled freedom and control over my own life. In the end, life is a journey you take alone, whether you like it or not. If you haven’t found joy in solitude, I hope you find your own version of it, even in small doses.</p>

<p class="notice--info">PS: I also realized I’ve reached an age where the way I treat friends (especially female friends) can lead to misunderstandings. So I’ve decided to keep a necessary distance, to carve out space.</p>

<h2 id="finding-meaning-in-a-phd-and-in-life">Finding meaning in a PhD (and in life)</h2>

<p>To me, doing a PhD at a place like Princeton is an absolute privilege. It gives me time and space to think, and to reflect on myself: what <strong>I</strong> really want, what I’m good at, and what I’m not. For me, a PhD is not about innovation, “moving the needle,” or advancing humanity. It’s about understanding, understanding science, and maybe most importantly, understanding the science of life.</p>

<figure class="align-center">
  <div class="always-two">
    <img src="../_resources/image14.jpeg" />
    <img src="../_resources/image12.jpeg" />
  </div>
  <figcaption>Thanks to fellowship support, I got to organize a <a href="https://hhy.ee.princeton.edu/bookclub">book club</a> in the CS department, where we read and discuss one book every month.</figcaption>
</figure>

<p>Since coming to Princeton, I’ve started reading physical books again. Thanks to a good friend (<a href="https://pattyliu.com">Patty</a> 🙏), I discovered <em><a href="https://en.wikipedia.org/wiki/G%C3%B6del,_Escher,_Bach">Gödel, Escher, Bach: an Eternal Golden Braid</a></em>, and I got hooked. This book is about formal logic and its connections to math, art, and thinking. What caught my attention is that mathematical and logical symbols possess no inherent meaning. But once we give them meaning, they can express all of mathematics and beyond.</p>

<p>This got me thinking: why would life be any different? As humans, we have moved past the primal search for our next meal or the mere drive to find mates to reproduce. I believe life does not have an inherent, predetermined meaning unless we actively assign meaning to the world around us.</p>

<p>Life itself is difficult, it’s hard, it’s challenging. This treacherous journey comes with obstacles and all kinds of internal and external struggles that we have to endure. If we don’t give it meaning, then what is the point of being here and going through it all?</p>

<figure style="width: 60%" class="align-center">
  <img src="../_resources/b4ee2433ffebbc6fcfe8489f10a01d69.jpeg" alt="" />
  <figcaption>I picked up the violin again after a long hiatus. It’s become a simple way to reconnect with myself.</figcaption>
</figure>

<h2 id="conclusion-and-afterthoughts">Conclusion and afterthoughts</h2>

<p>My first year was full of pure joy and rejection. I loved it, and I enjoyed it wholeheartedly. I came to understand my research and myself much better. I’m grateful to Princeton for giving me a safe and enjoyable space to think with a peaceful mind. I’m grateful to my advisor and my friends for creating a supportive and intellectually stimulating environment.</p>

<figure style="width: 90%" class="align-center">
  <img src="../_resources/image13.jpeg" alt="" />
  <figcaption>My research group (<a href="https://netsyn.princeton.edu/">NetΣyn</a>) at Princeton. The third person from the right is my advisor, and the rest are my amazing labmates.
  </figcaption>
</figure>

<p>As AI has been upending the world as we know it, an ivory tower like Princeton has shielded us from much of the chaos. It gives us space and time to look inward, rather than constantly worrying about where the world might be heading. We will all have to face that chaos eventually, but for now, I cherish this space.</p>

<p>When I am old and can no longer be as active, I hope to have a rich memory book full of beautiful moments to flip through, rather than just a seemingly dense CV. Life is not a race to a finish line, nor is it merely about grinding through the daily motions. 
I’ve come to realize it is neither about the process nor the destination; it is about understanding who we are, and who we will eventually become along the way.</p>]]></content><author><name>Hongyu Hè</name></author><category term="journal" /><summary type="html"><![CDATA[A reflection on my 1st year of the PhD, where joy and rejection came together to reshape how I think about research, solitude, and life.]]></summary></entry><entry><title type="html">My Grad School Application Odyssey and Advice</title><link href="https://hongyuhe.github.io/odyssey/" rel="alternate" type="text/html" title="My Grad School Application Odyssey and Advice" /><published>2025-12-29T00:00:00+00:00</published><updated>2025-12-29T00:00:00+00:00</updated><id>https://hongyuhe.github.io/odyssey</id><content type="html" xml:base="https://hongyuhe.github.io/odyssey/"><![CDATA[<p>I have received a lot of questions about my grad school application experience. I also want to pay forward the help I received, so I decided to write this post.</p>

<h2 id="prelude">Prelude</h2>

<p>Two years ago around this time, I was working on a take-home task for a Harvard PhD interview, while compulsively refreshing my inbox and checking my spam folder for updates.</p>

<figure class="align-center">
  <div class="always-two-portrait-landscape">
    <img src="../_resources/IMG_7075.jpeg" />
    <img src="../assets/images/IMG_7077.jpg" />
  </div>
  <figcaption>Left: A few months before Gaokao. Right: Prep workbooks outside my classroom after Gaokao, 2016</figcaption>
</figure>

<p>For most of my life, I believed China’s National College Entrance Examination (NCEE), aka the Gaokao, was the most intense and stressful exam a student could face.<sup>[<a href="https://en.wikipedia.org/wiki/Gaokao" title="Gaokao (China’s National College Entrance Examination)">gaokao</a>]</sup> It is a decade-long, high-stakes funnel that shapes college options for millions of students, so when I decided to leave law school, my mom took it very hard. And on a personal level, that period was especially difficult for me due to a chronic, deeply damaging family situation.</p>

<p>Years later, applying to graduate school taught me that the stress can take a different form. Instead of a single, clearly defined test, it’s often the uncertainty that wears you down—slowly, persistently, and without a clear endpoint. You can do many things “right” and still feel like you’re guessing what matters.
That’s not to diminish the NCEE. It is difficult, no question about it, because millions of students compete for a limited number of seats. At the same time, however, it is <em>transparent</em>: you know what subjects you will be tested on, and once you clear the cutoff, the next step is usually straightforward.</p>

<p>Graduate admissions, at least in my experience, can feel like the opposite. It isn’t always obvious what is truly important, what is only sometimes important, or what you might be over-optimizing without realizing it. And if you’re applying from outside North America, that information gap can feel even wider.</p>

<p class="notice--danger">❗Disclaimer: My experience is specific to applicants to EE/CS PhD programs in engineering-oriented areas such as systems and networking. Experiences can vary substantially across fields, departments, labs, and advisors.</p>

<h2 id="my-results-and-timeline">My Results and Timeline</h2>

<p>The table below summarizes the programs I applied to in the 2023–2024 cycle and the outcomes. 
The list is not in any particular order.</p>

<p><strong>Two important notes</strong>:</p>
<ul>
  <li>Many schools (e.g., Stanford and Princeton) allow applicants to apply to only one program per cycle, while others (e.g., MIT and CMU) allow applications to multiple programs in the same cycle.</li>
  <li>Although advising is often flexible across departments at most schools, different departments can have distinct policies. For example, at Princeton, ECE students are admitted by the department. They complete a rotation period in their first year and are matched with advisors afterward. In contrast, COS students are admitted directly into advisors’ labs. Rotation can be a great way to explore fit, but it can also be a <em>stressful process</em> for many.</li>
</ul>

<table class="text-left">
  <thead>
    <tr>
      <th>Program</th>
      <th>Application Deadline</th>
      <th>First Interview Date</th>
      <th>Notification Date</th>
      <th>Result</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>U Cambridge, CST</td>
      <td>Dec 4</td>
      <td>Oct 27, 2023</td>
      <td>Feb 7, 2024</td>
      <td><span style="background-color:#d4edda;color:#155724;padding:0.05em 0.4em;border-radius:0.25em;font-weight:600;">Accepted</span></td>
    </tr>
    <tr>
      <td>Cornell, ECE</td>
      <td>Dec 15</td>
      <td>May 18, 2023</td>
      <td>Feb 16, 2024</td>
      <td><span style="background-color:#d4edda;color:#155724;padding:0.05em 0.4em;border-radius:0.25em;font-weight:600;">Accepted</span></td>
    </tr>
    <tr>
      <td>MIT, EECS</td>
      <td>Dec 15</td>
      <td>Feb 16, 2024</td>
      <td>Jan 28, 2024 (waitlist); Mar 16, 2024 (admit)</td>
      <td><span style="white-space:nowrap;">Waitlisted → <span style="background-color:#d4edda;color:#155724;padding:0.05em 0.4em;border-radius:0.25em;font-weight:600;">Accepted</span></span></td>
    </tr>
    <tr>
      <td>Stanford, CS</td>
      <td>Dec 5</td>
      <td>No interview</td>
      <td>Feb 9, 2024</td>
      <td><span style="background-color:#f8d7da;color:#721c24;padding:0.05em 0.4em;border-radius:0.25em;font-weight:600;">Rejected</span></td>
    </tr>
    <tr>
      <td>U Toronto, CS</td>
      <td>Dec 1</td>
      <td>Jan 18, 2024</td>
      <td>Jan 31, 2024</td>
      <td><span style="background-color:#d4edda;color:#155724;padding:0.05em 0.4em;border-radius:0.25em;font-weight:600;">Accepted</span></td>
    </tr>
    <tr>
      <td>U Toronto, ECE</td>
      <td>Dec 15</td>
      <td>Aug 2, 2023</td>
      <td>Feb 9, 2024</td>
      <td><span style="background-color:#d4edda;color:#155724;padding:0.05em 0.4em;border-radius:0.25em;font-weight:600;">Accepted</span></td>
    </tr>
    <tr>
      <td>Princeton, ECE</td>
      <td>Dec 15</td>
      <td>Jan 25, 2024</td>
      <td>Feb 16, 2024</td>
      <td><span style="background-color:#d4edda;color:#155724;padding:0.05em 0.4em;border-radius:0.25em;font-weight:600;">Accepted</span></td>
    </tr>
    <tr>
      <td>Columbia, CS</td>
      <td>Dec 15</td>
      <td>—</td>
      <td>—</td>
      <td><span style="background-color:#e2e3e5;color:#383d41;padding:0.05em 0.4em;border-radius:0.25em;font-weight:600;">Withdrawn</span></td>
    </tr>
    <tr>
      <td>Yale, CS</td>
      <td>Dec 15</td>
      <td>Aug 31, 2023</td>
      <td>Feb 12, 2024</td>
      <td><span style="background-color:#f8d7da;color:#721c24;padding:0.05em 0.4em;border-radius:0.25em;font-weight:600;">Rejected</span></td>
    </tr>
    <tr>
      <td>Penn, CIS</td>
      <td>Dec 1</td>
      <td>Jan 18, 2024</td>
      <td>Jan 26, 2024</td>
      <td><span style="background-color:#d4edda;color:#155724;padding:0.05em 0.4em;border-radius:0.25em;font-weight:600;">Accepted</span></td>
    </tr>
    <tr>
      <td>CMU, ECE</td>
      <td>Dec 1</td>
      <td>No interview</td>
      <td>Mar 7, 2024</td>
      <td><span style="background-color:#f8d7da;color:#721c24;padding:0.05em 0.4em;border-radius:0.25em;font-weight:600;">Rejected</span></td>
    </tr>
    <tr>
      <td>CMU, CS</td>
      <td>Dec 1</td>
      <td>—</td>
      <td>—</td>
      <td><span style="background-color:#e2e3e5;color:#383d41;padding:0.05em 0.4em;border-radius:0.25em;font-weight:600;">Withdrawn</span></td>
    </tr>
    <tr>
      <td>UC Berkeley, EECS</td>
      <td>Dec 11</td>
      <td>—</td>
      <td>—</td>
      <td><span style="background-color:#e2e3e5;color:#383d41;padding:0.05em 0.4em;border-radius:0.25em;font-weight:600;">Withdrawn</span></td>
    </tr>
    <tr>
      <td>UIUC, CS</td>
      <td>Dec 15</td>
      <td>Jan 13, 2024</td>
      <td>Feb 2, 2024</td>
      <td><span style="background-color:#d4edda;color:#155724;padding:0.05em 0.4em;border-radius:0.25em;font-weight:600;">Accepted</span></td>
    </tr>
    <tr>
      <td>Harvard, CS</td>
      <td>Dec 1</td>
      <td>Jan 17, 2024</td>
      <td>Jan 31, 2024</td>
      <td><span style="background-color:#d4edda;color:#155724;padding:0.05em 0.4em;border-radius:0.25em;font-weight:600;">Accepted</span></td>
    </tr>
    <tr>
      <td>U Washington, CSE</td>
      <td>Dec 15</td>
      <td>—</td>
      <td>—</td>
      <td><span style="background-color:#e2e3e5;color:#383d41;padding:0.05em 0.4em;border-radius:0.25em;font-weight:600;">Withdrawn</span></td>
    </tr>
  </tbody>
</table>

<h2 id="what-actually-matters-and-what-doesnt">What Actually Matters (and What Doesn’t)</h2>

<p>I applied from Europe, and the information gap was one of the hardest parts. I did not always know what to prioritize, what was “nice to have,” and what was mostly noise. Most questions I get are about this, so I will start here.</p>

<h3 id="detailed-recommendation-letters">Detailed recommendation letters</h3>

<p>Rec letters are the <em>most important</em> part of the application.</p>

<p>Most North American programs will ask for 3–5 letters, and the application portal will ask whether you want to waive your right to read them. My strong recommendation is to <em>waive access</em> (select “Yes”). In practice, committees often treat waived letters as more candid and therefore more credible.</p>

<p>What matters even more than “prestige” is <em>specificity</em>. A detailed, sincere letter from a professor who worked closely with you can carry more weight than a vague letter from a famous name. Strong letters do more than list achievements. They explain what you contributed, how you think, how you collaborate, and how you show up as a teammate and researcher.</p>

<p>At some schools (like UofT and MIT), there is also an internal pre-screening process (for example, by a committee consisting of selected students) before applications reach individual faculty. In those settings, detailed letters can be a major differentiator because they provide concrete evidence of research readiness, independence, and reliability.</p>

<h3 id="do-grades-really-matter">Do grades really matter?</h3>

<p>Grades matter, but in most PhD admissions processes, they are not the primary signal.</p>

<p>If you are early in undergrad and considering a PhD, my advice is to treat coursework as important for building fundamentals while recognizing that <em>research experience usually carries the most weight</em>. Many faculty will look at grades as a supporting signal, and only lean on them if they lack other ways to assess your readiness.</p>

<p>There are also practical issues if you are applying internationally. Different grading systems can be hard to interpret. During one interview at the University of Toronto, a professor told me candidly that he was not sure how to evaluate my grades because they were on the Swiss 1–6 scale.</p>

<p>Most importantly, coursework and research can feel like different worlds. Even very demanding classes do not always translate into the skills needed to thrive in open-ended research. For more on my perspective, see my mentorship statement.<sup>[<a href="https://hhy.ee.princeton.edu/archive/mentoring.pdf" title="Mentorship statement">mentorship</a>]</sup></p>

<h3 id="publications-and-research-experience">Publications and research experience</h3>

<p>The main reason to join a lab before applying is to build genuine research experience, and publications are one of the clearest ways to demonstrate that experience. A top-tier publication can completely change how your application is perceived.</p>

<p>At the same time, I do not believe publication count is a pure measure of talent. Timing, mentorship, project selection, and luck matter. Some students have early access to well-scoped problems and strong research pipelines; others are doing equally good work but have fewer structural advantages. Unfortunately, admissions committees often use (first-author) publications as a proxy for research training, so they can matter a lot in practice.</p>

<p>This is also where I have mixed feelings. I personally believe a PhD should function more like an apprenticeship, and I will come back to this when I explain how I made my final decision.</p>

<p class="notice--info">ℹ️
As <a href="https://www.peterhenderson.co/">Prof. Peter Henderson</a> opined, the bar for incoming graduate students has been raised too high, and the diversity of the students that clear it has started to collapse …</p>

<h3 id="faculty-often-look-for-similar-traits">Faculty often look for similar traits</h3>

<p>One surprising thing I noticed during admitted-student visits (I will write about them later) is that many of the visiting students overlapped across schools, which made us feel like one big traveling group. Over time, it became clear that different programs often admit students with similar, or even identical, profiles and signals.</p>

<figure class="align-center">
  <div class="always-two">
    <img src="../assets/images/IMG_0633.jpg" />
    <img src="../_resources/IMG_1482.jpeg" />
  </div>
  <figcaption>Receptions at Harvard Faculty Club (left) and Princeton East Pyne Hall (right), both during admitted-student visits, March 2024.</figcaption>
</figure>

<p>Because of that, it can be helpful to study patterns. I recommend reading a range of statements of purpose and application materials.<sup>[<a href="https://www.notion.so/df39955313834889b7ac5411c37b958d?pvs=21" title="CS PhD Statements of Purpose collection (Notion)">sop-collection</a>] [<a href="https://mitcommlab.mit.edu/eecs/commkit/graduate-school-statement-of-purpose/" title="MIT CommLab: Graduate School Statement of Purpose">sop-mit</a>]</sup> Focus on:</p>
<ul>
  <li>what kinds of research experiences they had</li>
  <li>what kinds of projects they worked on (and with whom)</li>
  <li>whether they had publications (and at what venues)</li>
  <li>what skills and framing show up repeatedly</li>
</ul>

<p>For a structured discussion of how PIs and committees evaluate candidates, Chapter 2 of <em>The CS Assistant Professor Handbook</em><sup> [<a href="https://vijay03.github.io/asstprofbook/" title="The CS Assistant Professor Handbook">asstprofbook</a>]</sup> is a great resource.
Also, MIT has a helpful page summarizing what faculty look for in application essays.<sup>[<a href="https://www.eecs.mit.edu/academics/graduate-programs/admission-process/what-faculty-members-are-looking-for-in-a-grad-school-application-essay/" title="What faculty members are looking for in a grad school application essay (MIT EECS)">faculty-hints</a>]</sup></p>

<h2 id="before-the-actual-applications">Before the Actual Applications</h2>

<p>Beyond the information gap, a second challenge (especially if you are applying internationally) is that you may have fewer informal channels for advice, mentorship, and introductions. That makes your timeline and outreach strategy even more important.</p>

<h3 id="who-makes-the-admissions-decision">Who makes the admissions decision?</h3>

<p>You’ll often hear people describe two admissions models: “professor-centered” admissions and “committee-based” admissions. In the former, individual professors/principal investigators (PIs) have significant influence over admissions decisions; in the latter, a committee made up of multiple faculty members (and sometimes students) plays a larger role.</p>

<p>That said, regardless of the formal structure, faculty preferences almost always matter the most. The main reason is funding: unless a student brings external funding that fully covers their PhD (often referred to as a “free student”), their support typically comes from a faculty member’s research grants.</p>

<p>To avoid substantial over- or under-hiring, departments usually ask faculty to report in advance how many students they expect to be able to fund in the upcoming admissions cycle. As a result, if one or more faculty members are genuinely interested in working with you and have the resources to support you, your chances of admission are very good.</p>

<h3 id="start-early-and-reach-out">Start early and reach out</h3>

<p>I started reaching out to professors and interviewing in May 2023, roughly seven months before most application deadlines. My first interview was on May 18 with Cornell, and it ended up being one of four interviews I had with Cornell faculty.</p>

<p>Here are three reasons to start early:</p>

<ol>
  <li>
    <p><strong>You learn about programs faster and assess fit sooner.</strong><br />
To find a good match for your interests and skills, you need to understand departments and people in detail. This was especially important for me because I was applying from Europe. Some schools combine departments (for example, EECS), while others separate them (CS/CIS vs. EE/ECE). If your background sits between hardware and software systems, multiple program labels can be relevant, but the actual culture and focus vary widely.</p>

    <p>Even within “ECE,” departments can look very different. For example, ECE at UIUC and Cornell has a strong computing focus, while Princeton’s ECE spans a wide range of areas including VLSI, networking, AI, materials, bio, photonics, and quantum. So if you are interested in optical computing, ECE at Princeton might be a good fit, while at another school you might need to look at materials or applied physics departments instead.</p>
  </li>
  <li>
    <p><strong>You learn who is realistically hiring.</strong><br />
Often, only a handful of faculty are a strong fit and have capacity in a given cycle. Several of my withdrawals happened because I learned faculty were not taking new students. For example, professors at UW and Columbia told me they were not able to take additional students due to recent over-hiring and the need to convert existing thesis students. That kind of information is hard to infer from their websites, and it matters even more during uncertain funding periods.</p>
  </li>
  <li>
    <p><strong>You build interview skills through repetition.</strong><br />
Many faculty interviews include a short research presentation and deep technical questions. Starting early gave me time to improve my slides and, more importantly, my ability to explain my work under pressure.</p>

    <p>I also learned some lessons the hard way. I had a Yale interview scheduled jointly with multiple professors, and I struggled to answer rapid-fire questions before the conversation moved on. My slides also had issues that were easy for someone experienced to spot. It was not a great outcome, but it was a valuable learning experience. After that, I scheduled interviews one-on-one when possible, and I improved quickly. As a result, I later received offers from every program that interviewed me (except Yale, which is fine :P). So, you might want to schedule some early interviews with less competitive programs to prime yourself for the harder ones later.</p>
  </li>
</ol>

<p class="notice--warning">:warning:
There could be a hidden cost to reaching out early. Those early interviews can feel like a higher bar than the standard in-cycle ones, closer to “prove it now” than “let’s explore fit.” As <a href="https://www.languagesforsyste.ms/">Prof. Mae Milano</a> noted: Cold outreach sometimes lands with faculty who are naturally skeptical and will quickly test whether there is a strong research match and clear readiness for the work. In those interviews, the questions can be more pointed and the expectations less scaffolded, which can feel sharper and less forgiving than a typical in-cycle interview.</p>

<h2 id="the-interviews">The Interviews</h2>

<p>The table below summarizes how many interviews I had and the types of interview tasks I encountered. I ranked programs by the overall difficulty of the interviews and tasks (rank 1 was the most difficult), but this is purely my personal experience (sample size <code class="language-plaintext highlighter-rouge">==</code> 1). Your experience may differ substantially.</p>

<div class="interview-table text-left">

  <table>
    <thead>
      <tr>
        <th>Ranking of Overall<br />Interview Difficulty</th>
        <th>University</th>
        <th>Number of Interviews</th>
        <th>Interview Tasks</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>1</td>
        <td>U Toronto</td>
        <td>5 (with 3 PIs)</td>
        <td>• Research presentation<br />• Coding question<br />• Take-home project<br />• Paper reviews (3 total)</td>
      </tr>
      <tr>
        <td>2</td>
        <td>U Cambridge</td>
        <td>2 (with 2 PIs)</td>
        <td>• Research presentation<br />• Research proposal<br /> (1,000 words)<br />• Proposal Q&amp;A</td>
      </tr>
      <tr>
        <td>3</td>
        <td>UIUC</td>
        <td>5 (2 with PIs; 3 with students)</td>
        <td>• Research presentation<br />• Paper discussions</td>
      </tr>
      <tr>
        <td>4</td>
        <td>Penn</td>
        <td>4 (with 3 PIs and 1 postdoc)</td>
        <td>Research presentation</td>
      </tr>
      <tr>
        <td>5</td>
        <td>Harvard</td>
        <td>1</td>
        <td>• Paper reviews (2 total)<br />• Research presentation</td>
      </tr>
      <tr>
        <td>6</td>
        <td>Cornell</td>
        <td>4 (with 3 PIs)</td>
        <td>Research presentation</td>
      </tr>
      <tr>
        <td>7</td>
        <td>Yale</td>
        <td>1 (joint with 2 PIs)</td>
        <td>Research presentation</td>
      </tr>
      <tr>
        <td>8</td>
        <td>Princeton</td>
        <td>1</td>
        <td>Conversation with PI</td>
      </tr>
      <tr>
        <td>9</td>
        <td>MIT</td>
        <td>1</td>
        <td>Research presentation</td>
      </tr>
    </tbody>
  </table>

</div>

<p>A common question is why I sometimes had as many as five interviews at the same university. The reason is that multiple professors can be interested in the same candidate, and they usually don’t coordinate closely. So you might be interviewed by several faculty members independently.</p>

<p>As you can see, the most common tasks are:</p>
<ul>
  <li>a short research presentation (often with slides)</li>
  <li>technical discussion</li>
  <li>and sometimes reading and reviewing papers</li>
</ul>

<p>So it helps a lot to practice explaining your work clearly, and to get comfortable discussing papers critically and constructively.</p>

<h3 id="research-presentation-and-technical-discussion">Research presentation and technical discussion</h3>

<p>Many interviews start with a short research presentation. A typical target is about 15 minutes for a structured overview of your work.</p>

<p>If you have papers, it helps to include a small number of key figures and focus on the core idea, the design choices, and the evaluation logic. You should also expect interruptions and questions, especially around trade-offs, alternatives, and “why did you do it this way?”</p>

<p>The most useful advice I can share is to proactively anticipate questions and practice your answers. Many questions repeat across interviews once you know what faculty tend to probe.</p>

<p>You may also interview with senior PhD students or postdocs. I often found those conversations especially engaging and sometimes more technically detailed. Students and postdocs may also be thinking about day-to-day collaboration, so they are assessing whether you would be a good teammate for the long run. If you know who you will meet, it is worth reading the group’s recent papers and skimming relevant code repos ahead of time.</p>

<h3 id="paper-review">Paper review</h3>

<p>Here are a few example reviews I wrote for the University of Toronto <sup>[<a href="../archive/reviews_uoft.pdf" title="UofT interview paper reviews (PDF)">reviews-uoft</a>]</sup> and Harvard interviews.<sup>[<a href="../archive/reviews_harvard.pdf" title="Harvard interview paper reviews (PDF)">reviews-harvard</a>]</sup></p>

<p>In these tasks, you are often evaluated on whether you can:</p>
<ul>
  <li>accurately identify the paper’s core contributions and assumptions</li>
  <li>offer a thoughtful critique (strengths and limitations)</li>
  <li>propose concrete extensions or alternative approaches</li>
</ul>

<p>Since the papers often come from the interviewing group, PIs are often looking for whether you can develop a coherent “next step” that builds on their work.</p>

<p>A helpful starting guide is Prof. Onur Mutlu’s paper review guide: <a href="https://safari.ethz.ch/architecture/fall2022/lib/exe/fetch.php?media=onur-comparch-f22-how-to-do-the-paper-reviews.pdf" title="How to do paper reviews (Onur Mutlu)">mutlu-guide</a></p>

<h3 id="coding-and-other-tasks">Coding and other tasks</h3>

<p>Some programs have other types of tasks. For example, several systems faculty at the University of Toronto gave me coding tasks and take-home projects; the University of Cambridge asked for a 1K-word research proposal.</p>

<p>If you are curious, here is my codebase for a UofT coding interview task on implementing an SCMP ring buffer.<sup>[<a href="https://github.com/HongyuHe/ringbuffer" title="UofT take-home project (ring buffer repository)">ringbuffer</a>]</sup> The repo includes both my implementation and experiment results.</p>

<h2 id="school-visits">School Visits</h2>

<p>At the time, I lived in Zurich, Switzerland, and flights to the US were long and expensive. Fortunately, all US programs on my list (except UIUC) reimbursed travel and arranged hotels for me.</p>

<p>Visits are typically 2–3 days, and scheduling conflicts are common. For example, Cornell’s visit overlapped with Princeton’s and Penn’s, so I coordinated with Cornell to arrange an individual visit.</p>

<h3 id="caveat-weather-can-influence-your-experience">Caveat: weather can influence your experience</h3>

<p>This is not a scientific point, but it is real. For instance, when I visited Boston, the weather was a bit dismal (cold and rainy), and while the city is wonderful, it affected how much I enjoyed being out and exploring. Similarly, Ithaca in late March was snowing like no tomorrow …</p>
<figure class="align-center">
  <div class="always-two">
    <img src="https://hongyuhe.github.io/assets/images/IMG_0750.jpg" alt="Visiting Harvard on a rainy day in 2024." />
    <img src="https://hongyuhe.github.io/assets/images/IMG_7061.jpg" alt="Visiting Cornell in late March." />
  </div>
  <figcaption>Left: Visiting Harvard on a rainy day in 2024. Right: Visiting Cornell in late March afterward.</figcaption>
</figure>

<p>By contrast, Princeton had beautiful weather during my visit, and combined with the campus, it left a strong impression:</p>

<p><img src="../assets/images/IMG_1621.jpg" alt="IMG_1621.jpeg" /></p>

<h2 id="how-i-made-my-final-decision">How I Made My Final Decision</h2>

<p>People still ask me why I chose Princeton, especially since I invested a lot of time interviewing with other programs and faculty.</p>

<p>It wasn’t an easy decision. In fact, it was difficult enough that I ended up researching the science of making hard decisions and even made a video<sup>[<a href="https://youtu.be/MU6ODSHmipY?si=JFebIsDEgYT7R0qn" title="My talk on making hard decisions">decision-talk</a>]</sup> about it, which some of my friends found helpful:</p>
<div class="video-60">
  <iframe width="560" height="315" src="https://www.youtube.com/embed/MU6ODSHmipY?si=bdaQLPrMHnATyu07" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen=""></iframe>
</div>

<p><br />
Looking back, three factors mattered most.</p>

<ol>
  <li>
    <p><strong>I did not fully enjoy how “job-like” many interviews felt.</strong><br />
I understand why selection is necessary as competition is fierce, but parts of the process conflicted with a core belief of mine: <em><strong>a PhD should be an apprenticeship</strong></em>. If applicants are expected to already have every skill needed to succeed in the treacherous PhD journey, it raises the question of what kind of mentorship and training the program is offering.</p>

    <p>This contrast stood out because, at the time, I also had a job offer from a big tech company. I was pursuing a PhD specifically because I wanted a different kind of learning and growth.</p>

    <p>Surprisingly, I initially thought my Princeton interview went poorly. It turned into a high-intensity research discussion: I never got to present the slides that I had prepared and practiced a million times by then, and instead I was met with lots of probing questions that required real-time reasoning. When it ended, I couldn’t gauge how I had done, in part because the PI (now my advisor) kept a neutral expression throughout. Later, I learned that this is simply how she engages when she’s thinking deeply.</p>

    <p>I believe she saw potential in me without putting me through a slew of tests and checks. I am deeply grateful for that trust.</p>
  </li>
  <li>
    <p><strong>The people and the environment mattered a lot.</strong><br />
During visits, I especially enjoyed meeting my current group members. They were consistently kind, supportive, and thoughtful, and I could imagine building a good working life with them. Culture is hard to measure from the outside, but it is one of the most important factors once you are actually doing the PhD.</p>

    <p>Additionally, after visiting many schools, I’ve come to believe that Princeton matches my image of a prototypical American campus more than anywhere else. After all, the word “campus” originated at Princeton.<sup>[<a href="https://www.princetonianamuseum.org/reference/f005e539-911c-4eee-8a59-71151b767c4c" title="Princetoniana Museum">campus-word</a>]</sup></p>
  </li>
  <li>
    <p><strong>I wanted to lean into a new research direction.</strong><br />
Most of my offers were in software/hardware systems, except for my advisor’s area. I appreciated that she welcomed me even though I did not have deep prior background in the specific subfield I would work on in my PhD.</p>

    <p>I’ve always believed that early in one’s career, it’s important to seize opportunities to try something different before settling into a specific field. This perspective is also echoed in this TED talk: <a href="https://youtu.be/BQ2_BwqcFsc?si=Sz-HOtYuASxNaaOP" title="David Epstein's talk">david-talk</a></p>
  </li>
</ol>

<p class="notice--info">ℹ️ While financial considerations shouldn’t be the primary factor in choosing a PhD program, they matter a great deal in practice. Constantly worrying about making ends meet can seriously detract from your ability to focus on research. For this reason, I strongly recommend checking stipend levels alongside local cost of living before making a decision. For reference, here is a ranking of CS/EE PhD stipends in the US.<sup>[<a href="https://csstipendrankings.org/" title="CS/EE PhD stipend rankings (US)">phd-stipends</a>]</sup></p>

<h2 id="outro">Outro</h2>

<p>Time truly flies, and it has now been over a year since I started my PhD at Princeton.</p>

<p><strong>Did I make the right choice?</strong> I’d say there is no single “right” choice in the abstract; you <em>make it right</em> by what you build from it. What I can say is that, ever since starting, I’ve been enjoying every single day of my PhD journey. I’m grateful for my advisor, the engaging research, and the wonderful people around me.</p>

<p><strong>Is it still worth pursuing a CS/EE PhD in the age of AI?</strong> That deserves a separate blog post. But if you want to do a PhD for money or status, do not. Your PhD journey is for you and you alone. And if you do choose it, doing a PhD in a place like Princeton is an absolute privilege, and in my experience, a real joy ~</p>

<figure style="width: 40%" class="align-center">
  <img src="https://hongyuhe.github.io/assets/images/IMG_4796.jpg" alt="" />
  <figcaption>🐯 in front of <a href="https://en.wikipedia.org/wiki/Alexander_Hall_(Princeton_University)">Alexander Hall</a>, Sep 2024.</figcaption>
</figure>

<h2 id="acknowledgments">Acknowledgments</h2>

<p>I want to thank <a href="https://www.languagesforsyste.ms/">Prof. Mae Milano</a>, <a href="https://seungjulee.com/">Seungju Lee</a>, <a href="https://scholar.google.com/citations?user=7rH1bGYAAAAJ&amp;hl=en">Constantine Doumanidis</a>, <a href="https://www.linkedin.com/in/mathewmadain">Mathew Madain</a>, Peilin Xin (CMU), <a href="https://khanhvu207.github.io/">Khánh Vũ</a>, and <a href="https://jameszfs.github.io/">Fengshi Zheng</a> for their helpful comments and feedback on an earlier draft of this post. I’m also grateful to <a href="https://cs.stanford.edu/~keithw/">Prof. Keith Winstein</a> for sending his advice to me as a reference.<sup>[<a href="https://cs.stanford.edu/~keithw/#writing" title="Keith Winstein: writing advice">winstein-advice</a>]</sup></p>

<h2 id="resources">Resources</h2>

<ul>
  <li>Advice posted by Prof. Keith Winstein under the “Writing” section.<sup>[<a href="https://cs.stanford.edu/~keithw/#writing" title="Keith Winstein: writing advice">winstein-advice</a>]</sup></li>
  <li><em>The CS Assistant Professor Handbook</em>.<sup>[<a href="https://vijay03.github.io/asstprofbook/" title="The CS Assistant Professor Handbook">asstprofbook</a>]</sup></li>
  <li>A collection of CS PhD statements of purpose.<sup>[<a href="https://www.notion.so/df39955313834889b7ac5411c37b958d?pvs=21" title="CS PhD Statements of Purpose collection (Notion)">sop-collection</a>]</sup></li>
  <li>MIT CommLab guide on writing statements of purpose.<sup>[<a href="https://mitcommlab.mit.edu/eecs/commkit/graduate-school-statement-of-purpose/" title="MIT CommLab: Graduate School Statement of Purpose">sop-mit</a>]</sup></li>
  <li>Paper review guide by Prof. Onur Mutlu.<sup>[<a href="https://safari.ethz.ch/architecture/fall2022/lib/exe/fetch.php?media=onur-comparch-f22-how-to-do-the-paper-reviews.pdf" title="How to do paper reviews (Onur Mutlu)">mutlu-guide</a>]</sup></li>
  <li>CS/EE PhD stipend rankings (US).<sup>[<a href="https://csstipendrankings.org/" title="CS/EE PhD stipend rankings (US)">phd-stipends</a>]</sup></li>
  <li>MIT EECS page on what faculty look for in application essays.<sup>[<a href="https://www.eecs.mit.edu/academics/graduate-programs/admission-process/what-faculty-members-are-looking-for-in-a-grad-school-application-essay/" title="What faculty members are looking for in a grad school application essay (MIT EECS)">faculty-hints</a>]</sup></li>
  <li>My talk on making hard decisions.<sup>[<a href="https://youtu.be/MU6ODSHmipY?si=JFebIsDEgYT7R0qn" title="My talk on making hard decisions">decision-talk</a>]</sup></li>
  <li>My UofT interview paper review examples.<sup>[<a href="../archive/reviews_uoft.pdf" title="UofT interview paper reviews (PDF)">reviews-uoft</a>]</sup></li>
  <li>My UofT take-home coding project.<sup>[<a href="https://github.com/HongyuHe/ringbuffer" title="UofT take-home project (ring buffer repository)">ringbuffer</a>]</sup></li>
  <li>My Harvard interview paper review examples.<sup>[<a href="../archive/reviews_harvard.pdf" title="Harvard interview paper reviews (PDF)">reviews-harvard</a>]</sup></li>
</ul>

<!-- ## References -->]]></content><author><name>Hongyu Hè</name></author><category term="journal" /><summary type="html"><![CDATA[A candid PhD application postmortem: interviews, visits, and decision-making]]></summary></entry><entry><title type="html">Queuing Theory: Understanding Waiting Lines</title><link href="https://hongyuhe.github.io/queuing/" rel="alternate" type="text/html" title="Queuing Theory: Understanding Waiting Lines" /><published>2023-02-05T00:00:00+00:00</published><updated>2023-02-05T00:00:00+00:00</updated><id>https://hongyuhe.github.io/queuing</id><content type="html" xml:base="https://hongyuhe.github.io/queuing/"><![CDATA[<p class="notice--info"><i class="far fa-sticky-note"></i> Some people prefer the term “queueing theory” over “queuing theory.” Here, I use the latter for simplicity.</p>

<h2 id="what-is-queuing-theory-and-why-should-you-care"><strong>What Is Queuing Theory, and Why Should You Care?</strong></h2>

<p>In computer systems, <em>latency is usually waiting</em>, not doing. Requests wait for CPU time, for a DB connection, for a disk, for a lock, for a thread pool, for a downstream dependency, etc. Queuing theory is the toolkit for reasoning about that waiting:</p>

<ul>
  <li><strong>When does latency “blow up” as load increases?</strong></li>
  <li><strong>Which component is the bottleneck?</strong></li>
  <li><strong>How many workers/threads/servers do we need to hit an SLO?</strong></li>
  <li><strong>Why do averages look fine while p99/p999 gets ugly?</strong></li>
</ul>

<p>A useful mental model from systems performance practice is the <strong>“utilization knee”</strong>: as utilization approaches 100%, queues grow superlinearly and tail latency spikes.</p>

<hr />

<h2 id="the-basics-how-queues-map-to-computer-systems"><strong>The Basics: How Queues Map to Computer Systems</strong></h2>

<p>A queuing system has three parts:</p>

<ol>
  <li><strong>Customers</strong>: requests, jobs, packets, RPCs, tasks.</li>
  <li><strong>Servers</strong>: CPU cores, threads, DB connections, GPU slots, NICs, disks.</li>
  <li><strong>Queue/buffer</strong>: ready queue, thread pool backlog, socket accept queue, message broker topic, request backlog at a load balancer.</li>
</ol>

<p>A minimal “single service station” view of many components:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
arrivals (λ)  ---&gt;  [ queue ]  ---&gt;  server(s) with rate μ  ---&gt;  departures

</code></pre></div></div>

<p>Key performance metrics you typically care about:</p>

<ul>
  <li><strong>Utilization</strong>: how busy the servers are.</li>
  <li><strong>Queue length</strong>: how much work is waiting.</li>
  <li><strong>Waiting time</strong>: how long work sits before service starts.</li>
  <li><strong>Response time (sojourn time)</strong>: waiting + service.</li>
  <li><strong>Throughput</strong>: completed work per unit time.</li>
</ul>

<hr />

<h2 id="kendalls-notation-a-handy-shorthand"><strong>Kendall’s Notation: A Handy Shorthand</strong></h2>

<p>Kendall’s notation is a concise way to name common models: <strong>A/S/C</strong>.</p>

<table>
  <thead>
    <tr>
      <th>Symbol</th>
      <th>Meaning</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>$A$</td>
      <td>Inter-arrival distribution (e.g., $M$ = Markovian: Poisson arrivals, i.e., exponential inter-arrival times)</td>
    </tr>
    <tr>
      <td>$S$</td>
      <td>Service-time distribution (e.g., $M$ = exponential, $D$ = deterministic, $G$ = general)</td>
    </tr>
    <tr>
      <td>$C$</td>
      <td>Number of parallel servers</td>
    </tr>
  </tbody>
</table>

<p>Examples:</p>

<ul>
  <li><strong>M/M/1</strong>: Poisson arrivals, exponential service, 1 server.</li>
  <li><strong>M/M/c</strong>: same, but $c$ parallel servers (also called Erlang-C). <sup>[<a href="https://en.wikipedia.org/wiki/M/M/c_queue" title="M/M/c queue">mmc-queue</a>]</sup></li>
  <li><strong>G/G/1</strong>: “anything goes” arrivals and service times. <sup>[<a href="https://en.wikipedia.org/wiki/G/G/1_queue" title="G/G/1 queue">gg1-queue</a>]</sup></li>
</ul>

<p><strong>Queue discipline</strong> (FIFO, priority, shortest-job-first, processor-sharing) also matters for tail latency and fairness. <sup>[<a href="https://en.wikipedia.org/wiki/Queueing_theory" title="Queueing theory">queueing-theory</a>]</sup></p>

<hr />

<h2 id="essential-formulas-with-toy-computer-systems-examples"><strong>Essential Formulas (With Toy Computer-Systems Examples)</strong></h2>

<h3 id="1-littles-law-works-shockingly-often"><strong>1. Little’s Law (works shockingly often)</strong></h3>

\[L = \lambda W\]

<ul>
  <li>$L$: average number of requests <em>in the system</em> (waiting + in service)</li>
  <li>$\lambda$: effective arrival rate (throughput in steady state)</li>
  <li>$W$: average time in system (response time)</li>
</ul>

<p>Little’s Law is extremely general: it does <em>not</em> require Poisson arrivals or exponential service—just a stable, long-run average system. <sup>[<a href="https://en.wikipedia.org/wiki/Little%27s_law" title="Little's law">littles-law</a>]</sup></p>

<p><strong>Example: “How many requests are in-flight?”</strong></p>

<p>You run a service doing <strong>$1000$ requests/sec</strong> steady-state. Average end-to-end latency is <strong>$50$ ms</strong>.</p>

<ul>
  <li>$\lambda = 1000$ req/s</li>
  <li>$W = 0.050$ s</li>
  <li>$L = \lambda W = 1000 \times 0.050 = 50$</li>
</ul>

<p>So, on average, <strong>~50 requests are concurrently in-flight</strong> across your service (queued + running). That’s often directly actionable:</p>

<ul>
  <li>If your service uses <strong>one DB connection per in-flight request</strong>, you will need <strong>O(50)</strong> connections (plus headroom).</li>
  <li>If your service has a <strong>max in-flight limit of 30</strong>, you will either throttle to ~600 req/s or push queuing upstream.</li>
</ul>
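
<p>The arithmetic above is small enough to script. Here is a minimal Python sketch; the function names are illustrative (not from any library), and the throughput cap assumes mean latency stays at 50 ms once the limit is enforced, which a real system may not honor.</p>

```python
def in_flight(arrival_rate: float, mean_latency_s: float) -> float:
    """Little's Law: L = lambda * W (average requests in the system)."""
    return arrival_rate * mean_latency_s


def throughput_at_limit(in_flight_limit: float, mean_latency_s: float) -> float:
    """Little's Law rearranged: lambda = L / W.

    Assumption: mean latency is unchanged when the limit is enforced.
    """
    return in_flight_limit / mean_latency_s


concurrency = in_flight(1000.0, 0.050)          # 1000 req/s at 50 ms
capped_rate = throughput_at_limit(30.0, 0.050)  # in-flight limit of 30
print(f"in flight: {concurrency:.0f} requests")
print(f"sustainable rate at limit 30: {capped_rate:.0f} req/s")
```

<p>The same two functions work for any boundary where you can measure throughput and latency, such as a connection pool or a worker queue.</p>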

<hr />

<h3 id="2-utilization-and-the-stability-condition"><strong>2. Utilization and the stability condition</strong></h3>

<p>For $c$ identical servers with service rate $\mu$ each:
\(\rho = \frac{\lambda}{c\mu}\)</p>

<p>Interpretation: $\rho$ is the fraction of time the server capacity is busy (on average).</p>

<p><strong>Critical insight:</strong> if $\rho \ge 1$, the queue is unstable (it grows without bound). In real systems, “without bound” manifests as timeouts, retries, OOMs, thread exhaustion, and cascading failure.</p>

<p><strong>Example: single-threaded handler</strong></p>

<p>A single-threaded worker processes requests with mean service time <strong>10 ms</strong>.</p>

<ul>
  <li>$\mu = 1/0.010 = 100$ req/s</li>
  <li>If arrivals are $\lambda = 80$ req/s, then $\rho = 0.8$ (stable)</li>
  <li>If arrivals rise to $\lambda = 110$ req/s, then $\rho = 1.1$ (unstable)</li>
</ul>

<p>This is why “CPU at 100%” is often a crisis: you have no slack to absorb randomness (bursts, long requests, GC pauses).</p>
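
<p>As a quick sanity check, the stability condition can be coded directly. This is a sketch with made-up helper names; it only encodes $\rho = \lambda/(c\mu)$ and the rule that $\rho$ must stay strictly below 1.</p>

```python
def utilization(arrival_rate: float, mean_service_time_s: float, servers: int = 1) -> float:
    """rho = lambda / (c * mu), where mu = 1 / mean service time."""
    service_rate = 1.0 / mean_service_time_s
    return arrival_rate / (servers * service_rate)


def is_stable(rho: float) -> bool:
    """A queue is stable only when utilization is strictly below 1."""
    return 1.0 > rho


# Single-threaded worker with a 10 ms mean service time.
for lam in (80.0, 110.0):
    rho = utilization(lam, 0.010)
    print(f"lambda={lam:.0f} req/s -> rho={rho:.2f}, stable={is_stable(rho)}")
```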

<hr />

<h3 id="3-mm1-the-simplest-queue-with-a-sharp-lesson"><strong>3. M/M/1: the simplest queue with a sharp lesson</strong></h3>

<p>For <strong>M/M/1</strong>, mean response time is:
\(W = \frac{1}{\mu - \lambda}\)
and mean waiting time in queue:
\(W_q = W - \frac{1}{\mu}\)</p>

<p><strong>Example: an API server with one worker thread</strong></p>

<p>Same service time: 10 ms $\Rightarrow \mu = 100$ req/s.</p>

<p>If $\lambda = 80$ req/s:</p>

<ul>
  <li>$W = 1/(100-80) = 1/20 = 0.05$ s = <strong>50 ms</strong></li>
  <li>Service time is 10 ms, so $W_q = 50 - 10$ ms = <strong>40 ms</strong> of waiting</li>
  <li>By Little’s Law, $L = \lambda W = 80 \times 0.05$ = <strong>4</strong> requests in system, on average</li>
</ul>

<p><strong>Operational takeaway:</strong> Even at <strong>80% utilization</strong>, average response time is <strong>5×</strong> the service time (because most time is waiting). As $\rho \rightarrow 1$, $W$ diverges rapidly—this is the “latency knee.”</p>
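
<p>To see the knee numerically, here is a small Python sketch that sweeps the arrival rate for the same 10 ms worker. The function name and the chosen sweep points are my own; the formulas are the M/M/1 ones given above.</p>

```python
def mm1_metrics(arrival_rate: float, service_rate: float) -> tuple[float, float, float]:
    """Return (W, W_q, L) for an M/M/1 queue, with times in seconds."""
    if arrival_rate >= service_rate:
        raise ValueError("unstable: arrival rate must be below service rate")
    w = 1.0 / (service_rate - arrival_rate)  # mean response time
    wq = w - 1.0 / service_rate              # mean waiting time in queue
    in_system = arrival_rate * w             # Little's Law
    return w, wq, in_system


MU = 100.0  # req/s, i.e., 10 ms mean service time
for lam in (50.0, 80.0, 90.0, 95.0, 99.0):
    w, wq, in_system = mm1_metrics(lam, MU)
    print(f"rho={lam / MU:.2f}  W={w * 1e3:7.1f} ms  "
          f"W_q={wq * 1e3:7.1f} ms  L={in_system:5.1f}")
```

<p>Each step closer to $\rho = 1$ costs more than the previous one, which is the whole argument for keeping headroom.</p>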

<hr />

<h3 id="4-mmc-erlang-c-adding-workers-reduces-waitingsometimes-dramatically"><strong>4. M/M/c (Erlang-C): adding workers reduces waiting—sometimes dramatically</strong></h3>

<p>For an M/M/c queue, the probability an arrival must wait (all servers busy) is given by the Erlang-C formula (often written in terms of offered load $a=\lambda/\mu$ and utilization $\rho=\lambda/(c\mu)$). <sup>[<a href="https://en.wikipedia.org/wiki/M/M/c_queue" title="M/M/c queue">mmc-queue</a>]</sup></p>
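
<p>For reference, the standard form of the Erlang-C formula in the symbols used here ($c$ servers, offered load $a$, utilization $\rho$, valid only when $\rho$ is below 1), together with the delays that follow from it:</p>

\[P(\text{wait}) = \frac{\dfrac{a^c}{c!}\,\dfrac{1}{1-\rho}}{\displaystyle\sum_{k=0}^{c-1}\frac{a^k}{k!} + \dfrac{a^c}{c!}\,\dfrac{1}{1-\rho}}, \qquad W_q = \frac{P(\text{wait})}{c\mu - \lambda}, \qquad W = W_q + \frac{1}{\mu}\]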

<p>Rather than stare at the algebra, use it to answer a capacity question.</p>

<p><strong>Example: sizing a thread pool / worker pool</strong></p>

<p>Suppose:</p>

<ul>
  <li>Arrival rate: $\lambda = 200$ req/s</li>
  <li>Each worker averages 12.5 ms per request $\Rightarrow \mu = 80$ req/s per worker</li>
</ul>

<p><strong>Case A: 3 workers</strong></p>

<ul>
  <li>$\rho = 200/(3 \times 80) = 0.833$</li>
</ul>

<p>Erlang-C results (for these numbers):</p>

<ul>
  <li>$P(\text{wait}) \approx 0.70$</li>
  <li>Mean queueing delay $W_q \approx 17.6$ ms</li>
  <li>Mean response time $W \approx 30.1$ ms</li>
</ul>

<p><strong>Case B: 4 workers</strong></p>

<ul>
  <li>$\rho = 200/(4 \times 80) = 0.625$</li>
</ul>

<p>Now:</p>

<ul>
  <li>$P(\text{wait}) \approx 0.32$</li>
  <li>$W_q \approx 2.7$ ms</li>
  <li>$W \approx 15.2$ ms</li>
</ul>

<p><strong>Interpretation:</strong> Adding <strong>one</strong> worker (3 → 4) raises capacity by 33%, but the payoff is far more than 33% extra throughput headroom: it <strong>cuts mean latency roughly in half</strong> (about 30 ms → 15 ms) by pulling utilization away from the knee and collapsing the queue.</p>

<p>This is a recurring theme in real services: once you are near saturation, <strong>small capacity changes can create huge latency changes</strong>.</p>
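
<p>If you want to reproduce the numbers for Case A and Case B, here is a minimal Python sketch of the Erlang-C calculation. The function names are illustrative, and it assumes exactly the M/M/c model (Poisson arrivals, exponential service, identical workers).</p>

```python
from math import factorial


def erlang_c(arrival_rate: float, service_rate: float, servers: int) -> float:
    """Probability that an arrival must wait in an M/M/c queue."""
    offered_load = arrival_rate / service_rate  # a = lambda / mu
    rho = offered_load / servers                # rho = lambda / (c * mu)
    if rho >= 1.0:
        raise ValueError("unstable: need lambda below c * mu")
    # Term for "all c servers busy", scaled by 1 / (1 - rho).
    busy_term = offered_load**servers / factorial(servers) / (1.0 - rho)
    # Terms for 0 .. c-1 busy servers.
    idle_terms = sum(offered_load**k / factorial(k) for k in range(servers))
    return busy_term / (idle_terms + busy_term)


def mmc_delays(arrival_rate: float, service_rate: float, servers: int) -> tuple[float, float, float]:
    """Return (P(wait), W_q, W), with times in seconds."""
    p_wait = erlang_c(arrival_rate, service_rate, servers)
    wq = p_wait / (servers * service_rate - arrival_rate)
    return p_wait, wq, wq + 1.0 / service_rate


for c in (3, 4, 5):
    p_wait, wq, w = mmc_delays(200.0, 80.0, c)
    print(f"c={c}: P(wait)={p_wait:.2f}  W_q={wq * 1e3:.1f} ms  W={w * 1e3:.1f} ms")
```

<p>Sweeping $c$ like this is a cheap way to find the smallest pool size that meets a latency target before you run a load test.</p>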

<hr />

<h3 id="5-variability-matters-a-lot-the-vut--kingman-approximation"><strong>5. Variability matters (a lot): the VUT / Kingman approximation</strong></h3>

<p>Real workloads are rarely “memoryless exponential everything.” Variability in arrivals (bursts) and service (slow queries, GC, cache misses) is what inflates tail latency.</p>

<p>A common, practical approximation for <strong>G/G/1</strong> is Kingman’s formula (often called the <strong>VUT equation</strong>): <sup>[<a href="https://en.wikipedia.org/wiki/Kingman%27s_formula" title="Kingman's formula">kingman-formula</a>]</sup>
\(\mathbb{E}[W_q] \approx \left(\frac{\rho}{1-\rho}\right)\left(\frac{c_a^2 + c_s^2}{2}\right)\tau\)</p>

<ul>
  <li>$\rho$: utilization</li>
  <li>$c_a$: coefficient of variation of inter-arrival times</li>
  <li>$c_s$: coefficient of variation of service times</li>
  <li>$\tau$: mean service time</li>
</ul>

<p><strong>Example: same mean service time, different variability</strong></p>

<p>Let:</p>

<ul>
  <li>$\rho = 0.8$</li>
  <li>$\tau = 10$ ms</li>
  <li>arrivals roughly Poisson $\Rightarrow c_a \approx 1$</li>
</ul>

<p><strong>Scenario 1: highly variable service</strong> ($c_s = 1$)</p>

<ul>
  <li>$\mathbb{E}[W_q] \approx (0.8/0.2) \times ((1^2+1^2)/2) \times 10 \text{ ms}$</li>
  <li>$= 4 \times 1 \times 10 = 40$ ms</li>
</ul>

<p><strong>Scenario 2: more predictable service</strong> ($c_s = 0.2$)</p>

<ul>
  <li>$\mathbb{E}[W_q] \approx 4 \times ((1 + 0.04)/2) \times 10$</li>
  <li>$= 4 \times 0.52 \times 10 = 20.8$ ms</li>
</ul>

<p>Same utilization and mean service time, but cutting the variability term $(c_a^2 + c_s^2)/2$ from 1 to 0.52 <strong>roughly halves queueing delay</strong>.</p>

<p><strong>Operational takeaway:</strong> performance work that reduces variance (caching, eliminating stop-the-world pauses, bounding query time, isolating noisy neighbors) often improves tail latency more than shaving a millisecond off the mean. <sup>[<a href="https://en.wikipedia.org/wiki/Kingman%27s_formula" title="Kingman's formula">kingman-formula</a>]</sup></p>
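
<p>The two scenarios can be checked with a few lines of Python. This is a sketch of Kingman’s approximation as written above; the function name is my own, and the result is in whatever time unit you pass for the mean service time.</p>

```python
def kingman_wq(rho: float, ca: float, cs: float, mean_service_time: float) -> float:
    """Kingman's (VUT) approximation for mean G/G/1 queueing delay."""
    if rho >= 1.0 or 0.0 > rho:
        raise ValueError("utilization must be in [0, 1)")
    utilization_term = rho / (1.0 - rho)        # U
    variability_term = (ca**2 + cs**2) / 2.0    # V
    return variability_term * utilization_term * mean_service_time  # V * U * T


# rho = 0.8, Poisson-like arrivals (c_a = 1), 10 ms mean service time.
for cs in (1.0, 0.2):
    wq_ms = kingman_wq(0.8, 1.0, cs, 10.0)
    print(f"c_s={cs}: E[W_q] ~ {wq_ms:.1f} ms")
```

<p>In practice, $c_a$ and $c_s$ can be estimated as standard deviation divided by mean over a window of measured inter-arrival and service times.</p>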

<hr />

<h2 id="queues-in-real-computer-systems-patterns-youll-actually-see"><strong>Queues in Real Computer Systems: Patterns You’ll Actually See</strong></h2>

<h3 id="1-a-queue-at-every-boundary"><strong>1) A queue at every boundary</strong></h3>

<p>Common “queue boundaries” include:</p>

<ul>
  <li><strong>Load balancer backlog</strong> (requests waiting to be accepted)</li>
  <li><strong>Thread pool queue</strong> (waiting for a worker)</li>
  <li><strong>DB connection pool</strong> (waiting for a connection)</li>
  <li><strong>Downstream service</strong> (RPCs queued behind other callers)</li>
  <li><strong>Disk / network</strong> (I/O queues)</li>
</ul>

<p>A key practice: apply Little’s Law <em>per boundary</em> to estimate concurrency needs and identify bottlenecks. <sup>[<a href="https://en.wikipedia.org/wiki/Little%27s_law" title="Little's law">littles-law</a>]</sup></p>

<h3 id="2-bottlenecks-dominate-end-to-end-latency"><strong>2) Bottlenecks dominate end-to-end latency</strong></h3>

<p>If a request must traverse multiple queues (a small queueing network), the slowest / most utilized stage tends to dominate waiting. Queueing networks are a standard lens for multi-tier services. <sup>[<a href="https://en.wikipedia.org/wiki/Queueing_theory" title="Queueing theory">queueing-theory</a>]</sup></p>

<h3 id="3-queue-discipline-changes-p99-behavior"><strong>3) Queue discipline changes p99 behavior</strong></h3>

<p>FIFO is fair-ish, but it lets one slow job delay every fast job queued behind it (head-of-line blocking). Alternative disciplines:</p>

<ul>
  <li><strong>Priority</strong>: protect latency-sensitive traffic (but risk starving background jobs). <sup>[<a href="https://en.wikipedia.org/wiki/Queueing_theory" title="Queueing theory">queueing-theory</a>]</sup></li>
  <li><strong>Shortest-job-first / SRPT</strong>: improves mean latency; may be unfair without safeguards. <sup>[<a href="https://en.wikipedia.org/wiki/Queueing_theory" title="Queueing theory">queueing-theory</a>]</sup></li>
  <li><strong>Processor sharing</strong>: approximates time-slicing, common in some network and CPU models. <sup>[<a href="https://en.wikipedia.org/wiki/Queueing_theory" title="Queueing theory">queueing-theory</a>]</sup></li>
</ul>

<p>This is why “just add a queue” is not neutral; it encodes policy.</p>
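<p>A toy example makes the policy effect concrete. The sketch below assumes a single server and a batch of jobs that are all queued already, which is a deliberate simplification; the job sizes are invented.</p>

```python
def mean_wait(service_times):
    """Mean time jobs spend waiting when served one at a time in the given order."""
    elapsed = 0.0
    total_wait = 0.0
    for service in service_times:
        total_wait += elapsed  # this job waited for everything ahead of it
        elapsed += service
    return total_wait / len(service_times)


# Hypothetical batch that is already queued: one slow job ahead of three fast ones (ms).
jobs = [10.0, 1.0, 1.0, 1.0]

fifo_wait = mean_wait(jobs)          # arrival order
sjf_wait = mean_wait(sorted(jobs))   # shortest job first
print(fifo_wait, sjf_wait)
```

<p>Mean waiting drops from 8.25 ms under FIFO to 1.5 ms under shortest-job-first, but the slow job now waits 3 ms instead of 0. That trade is the fairness concern mentioned in the list.</p>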

<hr />

<h2 id="practical-modeling-workflow-minimal-but-useful"><strong>Practical Modeling Workflow (Minimal but Useful)</strong></h2>

<ol>
  <li><strong>Pick the boundary you’re modeling</strong> (e.g., DB connection pool).</li>
  <li>
    <p><strong>Measure</strong>:</p>

    <ul>
      <li>$\lambda$: arrival/throughput into that boundary</li>
      <li>$\tau$: mean service time</li>
      <li>variability proxies: p50/p95/p99 service times; burstiness of arrivals</li>
    </ul>
  </li>
  <li><strong>Compute utilization</strong> $\rho$ and check headroom.</li>
  <li>
    <p><strong>Use a simple model first</strong>:</p>

    <ul>
      <li>M/M/1 or M/M/c for “first-order” capacity decisions. <sup>[<a href="https://en.wikipedia.org/wiki/M/M/c_queue" title="M/M/c queue">mmc-queue</a>]</sup></li>
      <li>Kingman (G/G/1) when variability is clearly the story. <sup>[<a href="https://en.wikipedia.org/wiki/Kingman%27s_formula" title="Kingman's formula">kingman-formula</a>]</sup></li>
    </ul>
  </li>
  <li><strong>Validate against production</strong> (does the predicted $L$ match the observed in-flight count? does the predicted knee match the load at which p99 explodes?).</li>
</ol>
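<p>The workflow above can be sketched in a few lines. This version assumes a single-server M/M/1 model, and the function name and the 80 requests/s and 10 ms inputs are hypothetical; the point is only to show which numbers you would then compare against production.</p>

```python
def mm1_estimate(arrival_rate_per_s: float, mean_service_s: float) -> dict:
    """First-order M/M/1 estimate for one boundary served by a single worker."""
    rho = arrival_rate_per_s * mean_service_s          # utilization
    if rho >= 1:
        raise ValueError("unstable: utilization must stay below 1")
    wait_s = (rho / (1 - rho)) * mean_service_s        # mean time in queue
    response_s = wait_s + mean_service_s               # queue + service
    in_flight = arrival_rate_per_s * response_s        # Little's Law, L = lambda * W
    return {"rho": rho, "wait_s": wait_s, "response_s": response_s, "in_flight": in_flight}


# Hypothetical measurements: 80 requests/s, 10 ms mean service time.
estimate = mm1_estimate(arrival_rate_per_s=80.0, mean_service_s=0.010)
print(estimate)
```

<p>If the observed in-flight count is far from <code class="language-plaintext highlighter-rouge">in_flight</code>, that is a hint that the exponential assumptions do not hold and a variability-aware model is the better next step.</p>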

<p>The goal is not perfect fidelity; it is a <strong>defensible mental model</strong> that prevents naive mistakes (like running everything at 95–99% utilization and being surprised by timeouts).</p>

<hr />

<h2 id="limitations-what-the-formulas-dont-tell-you"><strong>Limitations (What the Formulas Don’t Tell You)</strong></h2>

<ul>
  <li><strong>Non-stationarity</strong>: diurnal traffic, deployments, incident conditions.</li>
  <li><strong>Correlated arrivals</strong>: synchronized retries, fan-out bursts, cache stampedes.</li>
  <li><strong>Heavy-tailed service times</strong>: long-tail queries or rare slow paths can dominate p99/p999.</li>
  <li><strong>Finite buffers and backpressure</strong>: real queues drop, throttle, or shed load.</li>
  <li><strong>Retries change $\lambda$</strong>: a timeout often creates <em>more</em> work, not less.</li>
</ul>
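<p>To see why retries matter, here is a back-of-the-envelope sketch of retry amplification. It assumes that every attempt fails independently with the same probability and that clients retry at most a fixed number of times; both are simplifying assumptions, and all numbers are invented.</p>

```python
def effective_arrival_rate(base_rate: float, failure_prob: float, max_retries: int) -> float:
    """Offered load when each failed attempt is retried, up to max_retries times.

    Assumes every attempt fails independently with the same probability,
    which real incidents (where failures are correlated) tend to violate.
    """
    attempts_per_request = sum(failure_prob**k for k in range(max_retries + 1))
    return base_rate * attempts_per_request


# Hypothetical numbers: 100 requests/s, up to 2 retries each.
healthy = effective_arrival_rate(100.0, failure_prob=0.01, max_retries=2)
degraded = effective_arrival_rate(100.0, failure_prob=0.50, max_retries=2)
print(f"{healthy:.1f} req/s when healthy, {degraded:.1f} req/s when degraded")
```

<p>Feeding the degraded rate back into a utilization calculation shows how a system with comfortable headroom can be pushed past the knee by its own clients.</p>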

<p>Queueing theory still helps here, but you often combine it with measurement, tracing, and load testing.</p>

<hr />

<h2 id="wrapping-it-all-up"><strong>Wrapping It All Up</strong></h2>

<p>If you remember only a few points:</p>

<ul>
  <li><strong>Little’s Law</strong> turns latency and throughput into concurrency: $L=\lambda W$. <sup>[<a href="https://en.wikipedia.org/wiki/Little%27s_law" title="Little's law">littles-law</a>]</sup></li>
  <li><strong>Utilization is destiny</strong>: near 100%, queues (and tail latency) explode.</li>
  <li><strong>Variability inflates waiting</strong>: controlling variance can be more valuable than improving the mean. <sup>[<a href="https://en.wikipedia.org/wiki/Kingman%27s_formula" title="Kingman's formula">kingman-formula</a>]</sup></li>
  <li><strong>Adding capacity near the knee can cut latency disproportionately</strong> (Erlang-C intuition). <sup>[<a href="https://en.wikipedia.org/wiki/M/M/c_queue" title="M/M/c queue">mmc-queue</a>]</sup></li>
</ul>
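<p>The last point can be checked numerically with the standard Erlang-C formula for an M/M/c queue. The sketch below is a minimal implementation; the function name and the load figures (80 requests/s, 100 requests/s per server) are illustrative assumptions.</p>

```python
from math import factorial


def erlang_c_wait(arrival_rate: float, service_rate: float, servers: int) -> float:
    """Mean queueing delay in an M/M/c queue, using the Erlang-C formula."""
    offered = arrival_rate / service_rate              # offered load a = lambda / mu
    if offered >= servers:
        raise ValueError("unstable: offered load must be below the server count")
    tail = (offered**servers / factorial(servers)) * (servers / (servers - offered))
    head = sum(offered**k / factorial(k) for k in range(servers))
    p_wait = tail / (head + tail)                      # probability an arrival queues
    return p_wait / (servers * service_rate - arrival_rate)


# Hypothetical load: 80 requests/s, each server completes 100 requests/s.
one_server = erlang_c_wait(80.0, 100.0, servers=1)
two_servers = erlang_c_wait(80.0, 100.0, servers=2)
print(f"{one_server * 1000:.1f} ms -> {two_servers * 1000:.1f} ms")
```

<p>With these inputs, going from one server to two cuts mean queueing delay from about 40 ms to about 1.9 ms: doubling capacity buys roughly a twentyfold reduction because the system moves away from the knee.</p>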

<hr />

<h2 id="references"><strong>References</strong></h2>

<ul>
  <li>Little’s Law (background). <sup>[<a href="https://en.wikipedia.org/wiki/Little%27s_law" title="Little's law">littles-law</a>]</sup></li>
  <li>System performance and queueing intuition (talk). <sup>[<a href="https://www.youtube.com/watch?v=3at6ijBU2ug" title="SREcon24 Americas - System Performance and Queuing ...">srecon24</a>]</sup></li>
  <li>M/M/c (Erlang-C) model (background). <sup>[<a href="https://en.wikipedia.org/wiki/M/M/c_queue" title="M/M/c queue">mmc-queue</a>]</sup></li>
  <li>Queueing theory in practice for performance modeling (talk). <sup>[<a href="https://www.youtube.com/watch?v=Hda5tMrLJqc" title="LISA17 - Queueing Theory in Practice: Performance Modeling ...">lisa17</a>]</sup></li>
  <li>Kingman’s approximation / VUT equation (background). <sup>[<a href="https://en.wikipedia.org/wiki/Kingman%27s_formula" title="Kingman's formula">kingman-formula</a>]</sup></li>
  <li>Harchol-Balter, M. (2013). <em>Performance Modeling and Design of Computer Systems: Queueing Theory in Action</em>. Cambridge University Press.</li>
  <li>Gross, D., &amp; Harris, C. M. (1998). <em>Fundamentals of Queueing Theory</em>. John Wiley &amp; Sons, Inc.</li>
  <li>Kleinrock, L. (1975). <em>Queueing Systems, Volume 1: Theory</em>. John Wiley &amp; Sons, Inc.</li>
  <li>Sutton, C., &amp; Jordan, M. I. (2010). Bayesian inference for queueing networks and modeling of internet services. <sup>[<a href="https://arxiv.org/abs/1001.3355" title="Bayesian inference for queueing networks and modeling of internet services">sutton-jordan2010</a>]</sup></li>
</ul>]]></content><author><name>Hongyu Hè</name></author><category term="systems" /><summary type="html"><![CDATA[Queueing theory for latency: Little’s Law, utilization knee, Erlang-C, and variability]]></summary></entry><entry><title type="html">Are you Sure your Linux PID is the Process ID?</title><link href="https://hongyuhe.github.io/pid/" rel="alternate" type="text/html" title="Are you Sure your Linux PID is the Process ID?" /><published>2023-01-01T00:00:00+00:00</published><updated>2023-01-01T00:00:00+00:00</updated><id>https://hongyuhe.github.io/pid</id><content type="html" xml:base="https://hongyuhe.github.io/pid/"><![CDATA[<h2 id="ids-for-processes-and-threads-in-linux">IDs for Processes and Threads in Linux</h2>

<p>In Linux, both processes and threads are assigned numeric identifiers, and both are reachable as peer directories under the <code class="language-plaintext highlighter-rouge">/proc</code> pseudo-filesystem. Each schedulable entity has a subdirectory of the form <code class="language-plaintext highlighter-rouge">/proc/[pid]</code> (directories for non-leader threads can be opened by path but are omitted when you list <code class="language-plaintext highlighter-rouge">/proc</code>), and that number is often referred to as a “PID”.</p>

<p>Here is the catch: the value that shows up in <code class="language-plaintext highlighter-rouge">/proc/[pid]</code> is not strictly a <em>process</em> identifier. Depending on context, it may refer to a thread or a process, because Linux historically built threads on top of processes.</p>

<p>You can observe this terminology overload in tools like <code class="language-plaintext highlighter-rouge">htop</code>. By default, <code class="language-plaintext highlighter-rouge">htop</code> lists both processes and threads without clearly separating them, so its “PID” column contains identifiers that can correspond to either.</p>

<p>This has a historical reason. Early Linux did not have a first-class notion of threads, only processes. Over time, Linux introduced “thread groups” (Linux 2.4, around 2001), which support the POSIX threads model: multiple threads that conceptually belong to one process. Internally, the shared “process ID” that user space expects is implemented as a <strong>thread group identifier (TGID)</strong>. As described in the <code class="language-plaintext highlighter-rouge">clone(2)</code> manual, <code class="language-plaintext highlighter-rouge">getpid(2)</code> returns the TGID of the caller, not a per-thread identifier.</p>

<p>As a result, threads in the same process share the same TGID, while each thread also has its own unique <strong>thread ID (TID)</strong>. Practically, <code class="language-plaintext highlighter-rouge">getpid()</code> returns the same value across all threads in the process, while <code class="language-plaintext highlighter-rouge">gettid()</code> returns a unique value per thread.</p>
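<p>You can observe the same split from Python. The sketch below assumes Python 3.8 or newer, where <code class="language-plaintext highlighter-rouge">threading.get_native_id()</code> is available; on Linux it reports the kernel thread ID, while <code class="language-plaintext highlighter-rouge">os.getpid()</code> reports the TGID. The barrier is only there to keep all threads alive at once so that their IDs cannot be reused.</p>

```python
import os
import threading

NUM_THREADS = 4
barrier = threading.Barrier(NUM_THREADS)   # keep all threads alive at the same time
observed = []                              # (pid, tid) pairs, one per thread
lock = threading.Lock()


def record_ids():
    pid = os.getpid()                      # TGID: shared by every thread
    tid = threading.get_native_id()        # kernel thread ID: unique per live thread
    with lock:
        observed.append((pid, tid))
    barrier.wait()                         # do not exit until every thread has recorded


threads = [threading.Thread(target=record_ids) for _ in range(NUM_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

pids = {pid for pid, _ in observed}
tids = {tid for _, tid in observed}
print(f"distinct PIDs: {len(pids)}, distinct TIDs: {len(tids)}")
```

<p>Every thread reports the same PID, while each one reports its own TID.</p>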

<h2 id="how-to-distinguish-between-a-thread-and-a-real-process">How to Distinguish Between a Thread and a “Real Process”</h2>

<p>Conceptually, Linux processes and threads are similar because the kernel schedules both as runnable entities. The major difference is what they <em>share</em>: threads in the same process typically share an address space and other resources, while separate processes usually do not (unless explicitly arranged).</p>

<p>To distinguish “threads within a process” from a standalone process, the most reliable place to look is:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">/proc/[pid]/task/[tid]</code></li>
</ul>

<p>The <code class="language-plaintext highlighter-rouge">task/</code> directory enumerates kernel-visible threads. The <code class="language-plaintext highlighter-rouge">tid</code> component is the kernel thread ID. This is distinct from user-level threading abstractions (for example, some managed runtimes can implement user-level “threads” that are not one-to-one kernel threads). Those user-level threads are not directly visible as separate kernel thread IDs.</p>

<p>Within a multithreaded process, all threads belong to the same thread group. The main thread has <code class="language-plaintext highlighter-rouge">tid == tgid</code>, and the other threads have distinct <code class="language-plaintext highlighter-rouge">tid</code>s but the same <code class="language-plaintext highlighter-rouge">tgid</code>. You will also notice that <code class="language-plaintext highlighter-rouge">/proc/[pid]/task/[tid]</code> mirrors <code class="language-plaintext highlighter-rouge">/proc/[pid]/</code> when <code class="language-plaintext highlighter-rouge">pid == tid</code>, because that path is effectively describing the same main thread and the same thread group leader.</p>

<p>So, when you inspect <code class="language-plaintext highlighter-rouge">/proc/[pid]</code> for a multithreaded process, you can interpret it as follows:</p>

<ul>
  <li>The directory name <code class="language-plaintext highlighter-rouge">/proc/[pid]</code> corresponds to the process’s TGID (and the main thread’s TID).</li>
  <li>The <code class="language-plaintext highlighter-rouge">/proc/[pid]/task/</code> subdirectories enumerate all kernel threads in that thread group.</li>
  <li>The “process” you intuitively think of is the whole thread group, identified by its leader (the main thread, which created the other threads).</li>
</ul>
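<p>The interpretation above translates directly into a couple of small helpers. This is a sketch that assumes a Linux <code class="language-plaintext highlighter-rouge">/proc</code>; the function names are illustrative, and the leader check relies on the <code class="language-plaintext highlighter-rouge">Tgid:</code> line in the <code class="language-plaintext highlighter-rouge">status</code> file.</p>

```python
import os


def thread_ids(pid: int) -> list:
    """Return the kernel thread IDs in the thread group that `pid` belongs to."""
    task_dir = f"/proc/{pid}/task"
    return sorted(int(name) for name in os.listdir(task_dir))


def is_group_leader(tid: int) -> bool:
    """True when `tid` is the TGID, i.e. the main thread of its thread group."""
    with open(f"/proc/{tid}/status") as status:
        for line in status:
            if line.startswith("Tgid:"):
                return int(line.split()[1]) == tid
    raise ValueError(f"no Tgid field for {tid}")


if os.path.isdir("/proc/self/task"):       # Linux only
    me = os.getpid()
    print(thread_ids(me), is_group_leader(me))
```

<p>Calling <code class="language-plaintext highlighter-rouge">is_group_leader()</code> with the ID of a non-leader thread returns <code class="language-plaintext highlighter-rouge">False</code>, which is one way to tell a thread from a “real process” in a script.</p>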

<h2 id="last-note-multiprocessing-vs-multithreading">Last Note: Multiprocessing vs Multithreading</h2>

<p>To avoid confusion with thread groups (TGID), it helps to also remember <strong>process groups</strong>, which use <strong>PGID</strong>. TGID is about threads within a process, while PGID is about groups of processes (used heavily for job control in shells).</p>

<p>In broad strokes, a new process is created via <code class="language-plaintext highlighter-rouge">fork()</code> (and friends), while a new thread is created via <code class="language-plaintext highlighter-rouge">pthread_create()</code> in C. Under the hood, Linux commonly uses the <code class="language-plaintext highlighter-rouge">clone()</code> syscall for both, with different flags controlling what is shared.</p>

<p>When a process spawns subprocesses, the spawning process is the <strong>parent</strong>. Each child inherits the parent’s process group by default, or it can be placed in a new one (for example, via <code class="language-plaintext highlighter-rouge">setpgid()</code>). The PGID of a new group is the PID of the process that becomes its leader; a child that inherits a group keeps the inherited PGID.</p>

<p>In contrast, the main thread and the threads it creates are best thought of as <strong>siblings</strong> under the same thread group: they share a TGID, and they also share the same PGID as the main thread. One practical implication is that threads are not visible as child “processes” to the parent of the thread group leader.</p>
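<p>Process-group inheritance is easy to check from Python on a POSIX system. The sketch below spawns two throwaway children that just sleep; one keeps the default behavior, and the other is started with <code class="language-plaintext highlighter-rouge">start_new_session=True</code>, which is one convenient (but not the only) way to make a child lead its own process group.</p>

```python
import os
import subprocess
import sys

SLEEP = [sys.executable, "-c", "import time; time.sleep(30)"]

# Default: the child stays in the process group of its parent.
inherited = subprocess.Popen(SLEEP)
# start_new_session=True makes the child call setsid(), so it leads a new group.
detached = subprocess.Popen(SLEEP, start_new_session=True)

try:
    parent_pgid = os.getpgid(0)                    # 0 means "the calling process"
    inherited_pgid = os.getpgid(inherited.pid)
    detached_pgid = os.getpgid(detached.pid)
    print(parent_pgid, inherited_pgid, detached_pgid)
finally:
    for child in (inherited, detached):
        child.kill()
        child.wait()
```

<p>The first child reports the parent’s PGID, while the second reports a PGID equal to its own PID.</p>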

<h2 id="example">Example</h2>

<p>A convenient way to generate a multithreaded workload on Linux is to use <code class="language-plaintext highlighter-rouge">stress-ng</code>. In <code class="language-plaintext highlighter-rouge">stress-ng</code> terminology, a “stressor” (or “hog”) is a process.</p>

<p>For example, the following command runs a memory contention stressor:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>hy@node-0:~<span class="nv">$ </span>stress-ng <span class="nt">--mcontend</span> 1 <span class="nt">-t</span> 10h
stress-ng: info:  <span class="o">[</span>56472] dispatching hogs: 1 mcontend
</code></pre></div></div>

<p>If you open <code class="language-plaintext highlighter-rouge">htop</code>, you can view the resulting process and its threads as a hierarchy. In that display, <code class="language-plaintext highlighter-rouge">PGRP</code> corresponds to the process group ID (PGID), while the <code class="language-plaintext highlighter-rouge">PID</code> column is overloaded and may represent either process IDs or thread IDs depending on the row.</p>

<p><img src="https://hongyuhe.github.io/_resources/pid.png" alt="htop results" /></p>

<p>In the example shown:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">56472</code> is the single-threaded parent process created by your shell when you ran the command.</li>
  <li><code class="language-plaintext highlighter-rouge">56473</code> is the multithreaded child process (and also the main thread of that child).</li>
  <li><code class="language-plaintext highlighter-rouge">56474</code> through <code class="language-plaintext highlighter-rouge">56477</code> are sibling threads created by the main thread <code class="language-plaintext highlighter-rouge">56473</code>.</li>
</ul>

<p>If you run <code class="language-plaintext highlighter-rouge">pidof</code> on the stressor, you will typically get the TGID (which matches the main thread’s ID):</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>hy@node-0:~<span class="nv">$ </span>pidof stress-ng-mcontend
56473
</code></pre></div></div>

<p>You can confirm that <code class="language-plaintext highlighter-rouge">56472</code> is the parent of <code class="language-plaintext highlighter-rouge">56473</code> by inspecting the parent’s <code class="language-plaintext highlighter-rouge">children</code> file:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>hy@node-0:~<span class="nv">$ </span><span class="nb">cat</span> /proc/56472/task/56472/children
56473
</code></pre></div></div>

<p>Next, if you list the child’s <code class="language-plaintext highlighter-rouge">task/</code> directory, you will see all kernel threads in that thread group:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>hy@node-0:~<span class="nv">$ </span>ll /proc/56473/task/
total 0
dr-xr-xr-x 7 hy hy 0 Dec 31 22:21 ./
dr-xr-xr-x 9 hy hy 0 Dec 31 22:21 ../
dr-xr-xr-x 7 hy hy 0 Dec 31 22:21 56473/
dr-xr-xr-x 7 hy hy 0 Dec 31 22:21 56474/
dr-xr-xr-x 7 hy hy 0 Dec 31 22:21 56475/
dr-xr-xr-x 7 hy hy 0 Dec 31 22:21 56476/
dr-xr-xr-x 7 hy hy 0 Dec 31 22:21 56477/
</code></pre></div></div>

<p>If you inspect the <code class="language-plaintext highlighter-rouge">task/</code> directory for one of the sibling threads, you will still see the full set of thread IDs in the group (because you are still within the same thread group context):</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>hy@node-0:~<span class="nv">$ </span>ll /proc/56476/task/
total 0
dr-xr-xr-x 7 hy hy 0 Dec 31 22:21 ./
dr-xr-xr-x 9 hy hy 0 Dec 31 22:21 ../
dr-xr-xr-x 7 hy hy 0 Dec 31 22:21 56473/
dr-xr-xr-x 7 hy hy 0 Dec 31 22:21 56474/
dr-xr-xr-x 7 hy hy 0 Dec 31 22:21 56475/
dr-xr-xr-x 7 hy hy 0 Dec 31 22:21 56476/
dr-xr-xr-x 7 hy hy 0 Dec 31 22:21 56477/
</code></pre></div></div>

<p>Finally, notice that the single-threaded parent process has only itself under <code class="language-plaintext highlighter-rouge">task/</code>:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>hy@node-0:~<span class="nv">$ </span>ll /proc/56472/task/
total 0
dr-xr-xr-x 7 hy hy 0 Dec 31 22:21 ./
dr-xr-xr-x 9 hy hy 0 Dec 31 22:21 ../
dr-xr-xr-x 7 hy hy 0 Dec 31 22:21 56472/
</code></pre></div></div>

<h3 id="resource-accounting">Resource Accounting</h3>

<p>Once you start looking at threads, resource accounting becomes another place where Linux tooling can be surprising. Different tools choose to aggregate or split resource usage in different ways, and sometimes they report thread-level information only when explicitly asked.</p>

<p>For example, <code class="language-plaintext highlighter-rouge">htop</code> typically aggregates the CPU and memory usage of all threads into the main thread’s row by default. Similarly, <code class="language-plaintext highlighter-rouge">ps</code> will usually show aggregated usage for the thread group leader when you query it by PID:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>hy@node-0:~<span class="nv">$ </span>ps <span class="nt">-p</span> 56473 <span class="nt">-o</span> %cpu,%mem,cmd
%CPU %MEM CMD
 473  0.0 stress-ng-mcontend
</code></pre></div></div>

<p>If you run the same query with a sibling thread ID, you get nothing back, because <code class="language-plaintext highlighter-rouge">ps -p</code> selects by process ID (the TGID) and <code class="language-plaintext highlighter-rouge">56476</code> is only a thread ID:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>hy@node-0:~<span class="nv">$ </span>ps <span class="nt">-p</span> 56476 <span class="nt">-o</span> %cpu,%mem,cmd
%CPU %MEM CMD
</code></pre></div></div>

<p>To display threads, you can use <code class="language-plaintext highlighter-rouge">ps -L</code> with the main thread’s ID:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>hy@node-0:~<span class="nv">$ </span>ps <span class="nt">-L</span> 56473 <span class="nt">-o</span> %cpu,%mem,cmd
%CPU %MEM CMD
97.3  0.0 stress-ng-mcontend
94.0  0.0 stress-ng-mcontend
94.0  0.0 stress-ng-mcontend
94.0  0.0 stress-ng-mcontend
94.0  0.0 stress-ng-mcontend
</code></pre></div></div>

<p>If you want a more detailed listing, <code class="language-plaintext highlighter-rouge">ps -L ... -F</code> includes fields such as <code class="language-plaintext highlighter-rouge">LWP</code> (the thread ID) and <code class="language-plaintext highlighter-rouge">NLWP</code> (the number of threads):</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>hy@node-0:~<span class="nv">$ </span>ps <span class="nt">-L</span> 56473 <span class="nt">-F</span>
UID          PID    PPID     LWP  C NLWP    SZ   RSS PSR STIME TTY      STAT   TIME CMD
hy         56473   56472   56473 97    5 22792  2604  13 08:10 pts/2    RLl+ 302:30 stress-ng-mcontend
hy         56473   56472   56474 94    5 22792  2604   7 08:10 pts/2    RLl+ 292:03 stress-ng-mcontend
hy         56473   56472   56475 94    5 22792  2604  31 08:10 pts/2    RLl+ 292:00 stress-ng-mcontend
hy         56473   56472   56476 94    5 22792  2604  15 08:10 pts/2    RLl+ 291:59 stress-ng-mcontend
hy         56473   56472   56477 94    5 22792  2604   0 08:10 pts/2    RLl+ 292:05 stress-ng-mcontend
</code></pre></div></div>

<p>Thread reporting in <code class="language-plaintext highlighter-rouge">top</code> has similar behavior. With <code class="language-plaintext highlighter-rouge">-H</code>, <code class="language-plaintext highlighter-rouge">top</code> shows per-thread CPU usage, while without it, <code class="language-plaintext highlighter-rouge">top</code> aggregates:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>hy@node-0:/proc<span class="nv">$ </span>top <span class="nt">-H</span> <span class="nt">-p</span> 56476
....
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  56473 hy        20   0   91168   2708   2272 R  97.3   0.0 127:24.91 stress-ng-mcont
  56474 hy        20   0   91168   2708   2272 R  94.0   0.0 122:55.16 stress-ng-mcont
  56475 hy        20   0   91168   2708   2272 R  94.0   0.0 122:54.44 stress-ng-mcont
  56476 hy        20   0   91168   2708   2272 R  93.7   0.0 122:55.33 stress-ng-mcont
  56477 hy        20   0   91168   2708   2272 R  92.3   0.0 122:56.57 stress-ng-mcont
</code></pre></div></div>

<p>Without <code class="language-plaintext highlighter-rouge">-H</code>, <code class="language-plaintext highlighter-rouge">top</code> reports a single aggregated number and attributes it to the thread group leader:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>hy@node-0:~<span class="nv">$ </span>top <span class="nt">-p</span> 56476
....
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  56473 hy        20   0   91168   2708   2272 R 476.3   0.0 621:39.36 stress-ng-mcont
</code></pre></div></div>

<p>If you are scripting, you can use batch mode to extract either aggregate or per-thread CPU values, depending on whether you include <code class="language-plaintext highlighter-rouge">-H</code>:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Aggregated</span>
hy@node-0:~<span class="nv">$ </span>top <span class="nt">-b</span> <span class="nt">-n</span> 2 <span class="nt">-d</span> 0.2 <span class="nt">-p</span> 56476 | <span class="nb">tail</span> <span class="nt">-1</span> | <span class="nb">awk</span> <span class="s1">'{print $9}'</span>
465.0

<span class="c"># Per-thread with -H (tail -1 keeps only the last row)</span>
hy@node-0:~<span class="nv">$ </span>top <span class="nt">-b</span> <span class="nt">-H</span> <span class="nt">-n</span> 2 <span class="nt">-d</span> 0.2 <span class="nt">-p</span> 56476 | <span class="nb">tail</span> <span class="nt">-1</span> | <span class="nb">awk</span> <span class="s1">'{print $9}'</span>
75.0
</code></pre></div></div>

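<p>If you need ground-truth per-thread accounting, <code class="language-plaintext highlighter-rouge">/proc</code> is again the most explicit source. The top-level <code class="language-plaintext highlighter-rouge">/proc/[pid]/stat</code> reports CPU time summed over the whole thread group, even when <code class="language-plaintext highlighter-rouge">[pid]</code> is the ID of a non-leader thread such as <code class="language-plaintext highlighter-rouge">56476</code>, whereas <code class="language-plaintext highlighter-rouge">/proc/[pid]/task/[tid]/stat</code> is the per-thread view. Fields 14 and 15 are <code class="language-plaintext highlighter-rouge">utime</code> and <code class="language-plaintext highlighter-rouge">stime</code>, measured in clock ticks:</p>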

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Total CPU time (user and kernel) aggregated across the thread group.</span>
hy@node-0:~<span class="nv">$ </span><span class="nb">cat</span> /proc/56476/stat | <span class="nb">awk</span> <span class="s1">'{print $14, $15}'</span>
9460932 12361

<span class="c"># CPU time for only thread 56476.</span>
hy@node-0:~<span class="nv">$ </span><span class="nb">cat</span> /proc/56476/task/56476/stat | <span class="nb">awk</span> <span class="s1">'{print $14, $15}'</span>
1879429 3032
</code></pre></div></div>
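<p>The <code class="language-plaintext highlighter-rouge">awk</code> one-liner works here because the command name contains no spaces. For scripting, a slightly more defensive parser is worth having. The sketch below is one way to do it in Python; the helper names are illustrative, and it assumes the standard Linux <code class="language-plaintext highlighter-rouge">stat</code> layout in which the command name is wrapped in parentheses.</p>

```python
import os


def cpu_ticks(stat_text: str) -> tuple:
    """Extract (utime, stime) in clock ticks from the contents of a stat file."""
    # The command name (field 2) is parenthesized and may contain spaces,
    # so split after the last ")" rather than on whitespace alone.
    after_comm = stat_text.rsplit(")", 1)[1].split()
    # after_comm[0] is field 3 (state), so fields 14 and 15 sit at indexes 11 and 12.
    return int(after_comm[11]), int(after_comm[12])


def read_cpu_ticks(pid: int, tid=None) -> tuple:
    """Thread-group totals for /proc/[pid]/stat, one thread's share if tid is given."""
    path = f"/proc/{pid}/stat" if tid is None else f"/proc/{pid}/task/{tid}/stat"
    with open(path) as stat_file:
        return cpu_ticks(stat_file.read())


if os.path.exists("/proc/self/stat"):      # Linux only
    ticks_per_second = os.sysconf("SC_CLK_TCK")
    utime, stime = read_cpu_ticks(os.getpid())
    print(f"user {utime / ticks_per_second:.2f}s, system {stime / ticks_per_second:.2f}s")
```

<p>Dividing by <code class="language-plaintext highlighter-rouge">SC_CLK_TCK</code> converts ticks to seconds, which makes the values comparable to the <code class="language-plaintext highlighter-rouge">TIME</code> columns reported by <code class="language-plaintext highlighter-rouge">ps</code> and <code class="language-plaintext highlighter-rouge">top</code>.</p>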

<p>You can also use <code class="language-plaintext highlighter-rouge">psutil</code> for a convenient scripting interface, but it is important to know what it aggregates. In many cases, <code class="language-plaintext highlighter-rouge">psutil</code> reports CPU and memory usage in a way that effectively attributes thread-group totals to whichever thread you query, because it is fundamentally process-centric:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;&gt;</span> <span class="kn">import</span> <span class="nn">psutil</span>

<span class="c1"># Great-grandparent process (e.g., tmux session).
</span><span class="o">&gt;&gt;&gt;</span> <span class="n">tmux_session</span> <span class="o">=</span> <span class="n">psutil</span><span class="p">.</span><span class="n">Process</span><span class="p">(</span><span class="mi">54711</span><span class="p">)</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">tmux_session</span><span class="p">.</span><span class="n">ppid</span><span class="p">()</span>
<span class="mi">1</span>
<span class="o">&gt;&gt;&gt;</span> <span class="p">[(</span><span class="n">child</span><span class="p">.</span><span class="n">name</span><span class="p">(),</span> <span class="n">child</span><span class="p">.</span><span class="n">pid</span><span class="p">)</span> <span class="k">for</span> <span class="n">child</span> <span class="ow">in</span> <span class="n">tmux_session</span><span class="p">.</span><span class="n">children</span><span class="p">(</span><span class="n">recursive</span><span class="o">=</span><span class="bp">True</span><span class="p">)]</span>
<span class="p">[(</span><span class="s">'bash'</span><span class="p">,</span> <span class="mi">54712</span><span class="p">),</span> <span class="p">(</span><span class="s">'bash'</span><span class="p">,</span> <span class="mi">56236</span><span class="p">),</span> <span class="p">(</span><span class="s">'python'</span><span class="p">,</span> <span class="mi">56613</span><span class="p">),</span> <span class="p">(</span><span class="s">'stress-ng'</span><span class="p">,</span> <span class="mi">56472</span><span class="p">),</span> <span class="p">(</span><span class="s">'stress-ng-mcontend'</span><span class="p">,</span> <span class="mi">56473</span><span class="p">)]</span>

<span class="c1"># Parent process.
</span><span class="o">&gt;&gt;&gt;</span> <span class="n">parent</span> <span class="o">=</span> <span class="n">psutil</span><span class="p">.</span><span class="n">Process</span><span class="p">(</span><span class="mi">56472</span><span class="p">)</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">parent</span><span class="p">.</span><span class="n">ppid</span><span class="p">()</span>
<span class="mi">54712</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">parent</span><span class="p">.</span><span class="n">children</span><span class="p">(</span><span class="n">recursive</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="p">[</span><span class="n">psutil</span><span class="p">.</span><span class="n">Process</span><span class="p">(</span><span class="n">pid</span><span class="o">=</span><span class="mi">56473</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">'stress-ng-mcontend'</span><span class="p">,</span> <span class="n">status</span><span class="o">=</span><span class="s">'running'</span><span class="p">,</span> <span class="n">started</span><span class="o">=</span><span class="s">'11:21:57'</span><span class="p">)]</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">parent</span><span class="p">.</span><span class="n">num_threads</span><span class="p">()</span>
<span class="mi">1</span>

<span class="c1"># Child process (main thread / thread group leader).
</span><span class="o">&gt;&gt;&gt;</span> <span class="n">child</span> <span class="o">=</span> <span class="n">psutil</span><span class="p">.</span><span class="n">Process</span><span class="p">(</span><span class="mi">56473</span><span class="p">)</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">child</span><span class="p">.</span><span class="n">num_threads</span><span class="p">()</span>
<span class="mi">5</span>
<span class="o">&gt;&gt;&gt;</span> <span class="p">[</span><span class="n">thread</span><span class="p">.</span><span class="nb">id</span> <span class="k">for</span> <span class="n">thread</span> <span class="ow">in</span> <span class="n">child</span><span class="p">.</span><span class="n">threads</span><span class="p">()]</span>
<span class="p">[</span><span class="mi">56473</span><span class="p">,</span> <span class="mi">56474</span><span class="p">,</span> <span class="mi">56475</span><span class="p">,</span> <span class="mi">56476</span><span class="p">,</span> <span class="mi">56477</span><span class="p">]</span>

<span class="c1"># Sibling thread example.
</span><span class="o">&gt;&gt;&gt;</span> <span class="n">sibling</span> <span class="o">=</span> <span class="n">psutil</span><span class="p">.</span><span class="n">Process</span><span class="p">(</span><span class="mi">56476</span><span class="p">)</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">child</span><span class="p">.</span><span class="n">ppid</span><span class="p">()</span>
<span class="mi">56472</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">sibling</span><span class="p">.</span><span class="n">ppid</span><span class="p">()</span>
<span class="mi">56472</span>

<span class="c1"># Accounting.
</span><span class="o">&gt;&gt;&gt;</span> <span class="n">parent</span><span class="p">.</span><span class="n">cpu_percent</span><span class="p">(</span><span class="n">interval</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="mf">0.0</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">sibling</span><span class="p">.</span><span class="n">cpu_percent</span><span class="p">(</span><span class="n">interval</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="mf">471.5</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">child</span><span class="p">.</span><span class="n">cpu_percent</span><span class="p">(</span><span class="n">interval</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="mf">472.4</span>

<span class="o">&gt;&gt;&gt;</span> <span class="n">tmux_session</span><span class="p">.</span><span class="n">cpu_times</span><span class="p">()</span>
<span class="n">pcputimes</span><span class="p">(</span><span class="n">user</span><span class="o">=</span><span class="mf">7.46</span><span class="p">,</span> <span class="n">system</span><span class="o">=</span><span class="mf">3.19</span><span class="p">,</span> <span class="n">children_user</span><span class="o">=</span><span class="mf">102.18</span><span class="p">,</span> <span class="n">children_system</span><span class="o">=</span><span class="mf">153.15</span><span class="p">,</span> <span class="n">iowait</span><span class="o">=</span><span class="mf">0.0</span><span class="p">)</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">parent</span><span class="p">.</span><span class="n">cpu_times</span><span class="p">()</span>
<span class="n">pcputimes</span><span class="p">(</span><span class="n">user</span><span class="o">=</span><span class="mf">0.0</span><span class="p">,</span> <span class="n">system</span><span class="o">=</span><span class="mf">0.0</span><span class="p">,</span> <span class="n">children_user</span><span class="o">=</span><span class="mf">0.0</span><span class="p">,</span> <span class="n">children_system</span><span class="o">=</span><span class="mf">0.0</span><span class="p">,</span> <span class="n">iowait</span><span class="o">=</span><span class="mf">0.0</span><span class="p">)</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">child</span><span class="p">.</span><span class="n">cpu_times</span><span class="p">()</span>
<span class="n">pcputimes</span><span class="p">(</span><span class="n">user</span><span class="o">=</span><span class="mf">45250.11</span><span class="p">,</span> <span class="n">system</span><span class="o">=</span><span class="mf">57.79</span><span class="p">,</span> <span class="n">children_user</span><span class="o">=</span><span class="mf">0.0</span><span class="p">,</span> <span class="n">children_system</span><span class="o">=</span><span class="mf">0.0</span><span class="p">,</span> <span class="n">iowait</span><span class="o">=</span><span class="mf">0.0</span><span class="p">)</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">sibling</span><span class="p">.</span><span class="n">cpu_times</span><span class="p">()</span>
<span class="n">pcputimes</span><span class="p">(</span><span class="n">user</span><span class="o">=</span><span class="mf">45255.42</span><span class="p">,</span> <span class="n">system</span><span class="o">=</span><span class="mf">57.79</span><span class="p">,</span> <span class="n">children_user</span><span class="o">=</span><span class="mf">0.0</span><span class="p">,</span> <span class="n">children_system</span><span class="o">=</span><span class="mf">0.0</span><span class="p">,</span> <span class="n">iowait</span><span class="o">=</span><span class="mf">0.0</span><span class="p">)</span>

<span class="o">&gt;&gt;&gt;</span> <span class="n">parent</span><span class="p">.</span><span class="n">memory_full_info</span><span class="p">()</span>
<span class="n">pfullmem</span><span class="p">(</span><span class="n">rss</span><span class="o">=</span><span class="mi">6475776</span><span class="p">,</span> <span class="n">vms</span><span class="o">=</span><span class="mi">59777024</span><span class="p">,</span> <span class="n">shared</span><span class="o">=</span><span class="mi">6078464</span><span class="p">,</span> <span class="n">text</span><span class="o">=</span><span class="mi">1728512</span><span class="p">,</span> <span class="n">lib</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="mi">32018432</span><span class="p">,</span> <span class="n">dirty</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">uss</span><span class="o">=</span><span class="mi">3051520</span><span class="p">,</span> <span class="n">pss</span><span class="o">=</span><span class="mi">3749888</span><span class="p">,</span> <span class="n">swap</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">child</span><span class="p">.</span><span class="n">memory_full_info</span><span class="p">()</span>
<span class="n">pfullmem</span><span class="p">(</span><span class="n">rss</span><span class="o">=</span><span class="mi">2772992</span><span class="p">,</span> <span class="n">vms</span><span class="o">=</span><span class="mi">93356032</span><span class="p">,</span> <span class="n">shared</span><span class="o">=</span><span class="mi">2326528</span><span class="p">,</span> <span class="n">text</span><span class="o">=</span><span class="mi">1728512</span><span class="p">,</span> <span class="n">lib</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="mi">65581056</span><span class="p">,</span> <span class="n">dirty</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">uss</span><span class="o">=</span><span class="mi">126976</span><span class="p">,</span> <span class="n">pss</span><span class="o">=</span><span class="mi">735232</span><span class="p">,</span> <span class="n">swap</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">sibling</span><span class="p">.</span><span class="n">memory_full_info</span><span class="p">()</span>
<span class="n">pfullmem</span><span class="p">(</span><span class="n">rss</span><span class="o">=</span><span class="mi">2772992</span><span class="p">,</span> <span class="n">vms</span><span class="o">=</span><span class="mi">93356032</span><span class="p">,</span> <span class="n">shared</span><span class="o">=</span><span class="mi">2326528</span><span class="p">,</span> <span class="n">text</span><span class="o">=</span><span class="mi">1728512</span><span class="p">,</span> <span class="n">lib</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="mi">65581056</span><span class="p">,</span> <span class="n">dirty</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">uss</span><span class="o">=</span><span class="mi">126976</span><span class="p">,</span> <span class="n">pss</span><span class="o">=</span><span class="mi">735232</span><span class="p">,</span> <span class="n">swap</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>

<span class="o">&gt;&gt;&gt;</span> <span class="n">tmux_session</span><span class="p">.</span><span class="n">memory_percent</span><span class="p">()</span>
<span class="mf">0.007239506814671662</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">parent</span><span class="p">.</span><span class="n">memory_percent</span><span class="p">()</span>
<span class="mf">0.009602063988251591</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">child</span><span class="p">.</span><span class="n">memory_percent</span><span class="p">()</span>
<span class="mf">0.004111699759675096</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">sibling</span><span class="p">.</span><span class="n">memory_percent</span><span class="p">()</span>
<span class="mf">0.004111699759675096</span>
</code></pre></div></div>

<p>The upshot is that “what is the PID” and “what resources belong to a thread” depend strongly on which abstraction and which tool you are using. Many tools default to aggregating at the thread-group level, even when they print thread IDs, and you typically need explicit flags (or direct <code class="language-plaintext highlighter-rouge">/proc</code> inspection) to get consistent per-thread views.</p>
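
<p>As a rough sketch of the “direct <code class="language-plaintext highlighter-rouge">/proc</code> inspection” route, the snippet below reads the per-task <code class="language-plaintext highlighter-rouge">stat</code> file under <code class="language-plaintext highlighter-rouge">/proc/PID/task/TID/</code> for every thread in a thread group and converts the <code class="language-plaintext highlighter-rouge">utime</code> and <code class="language-plaintext highlighter-rouge">stime</code> tick counters (fields 14 and 15 in <code class="language-plaintext highlighter-rouge">proc(5)</code>) into seconds. The helper names <code class="language-plaintext highlighter-rouge">parse_task_stat</code> and <code class="language-plaintext highlighter-rouge">per_thread_cpu_times</code> are made up for illustration, and the code assumes a Linux host with <code class="language-plaintext highlighter-rouge">/proc</code> mounted:</p>

```python
import os
from pathlib import Path


def parse_task_stat(stat_line, clk_tck):
    """Return (user_seconds, system_seconds) from one /proc/PID/task/TID/stat line."""
    # Field 2 (comm) is wrapped in parentheses and may itself contain spaces
    # or ")", so split on the LAST ")" before tokenizing the remaining fields.
    after_comm = stat_line.rsplit(")", 1)[1].split()
    # after_comm[0] is field 3 (state), so fields 14 and 15 sit at indexes 11 and 12.
    utime_ticks = int(after_comm[11])
    stime_ticks = int(after_comm[12])
    return utime_ticks / clk_tck, stime_ticks / clk_tck


def per_thread_cpu_times(pid):
    """Map every TID in the thread group of `pid` to its own (user, system) seconds."""
    clk_tck = os.sysconf("SC_CLK_TCK")  # clock ticks per second
    times = {}
    for task_dir in Path(f"/proc/{pid}/task").iterdir():
        try:
            stat_line = (task_dir / "stat").read_text()
        except OSError:
            continue  # the thread exited between listing and reading
        times[int(task_dir.name)] = parse_task_stat(stat_line, clk_tck)
    return times
```

<p>On a Linux box, <code class="language-plaintext highlighter-rouge">per_thread_cpu_times(os.getpid())</code> should return one entry per TID of the current interpreter, which makes it easy to compare against whatever aggregated number a given tool reports.</p>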

<p>To end the running example, sending an interrupt to one thread can terminate the entire stressor, depending on how the program handles signals:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Note: this returns True even if the process is a zombie.
</span><span class="o">&gt;&gt;&gt;</span> <span class="n">parent</span><span class="p">.</span><span class="n">is_running</span><span class="p">()</span> <span class="o">==</span> <span class="n">child</span><span class="p">.</span><span class="n">is_running</span><span class="p">()</span> <span class="o">==</span> <span class="n">sibling</span><span class="p">.</span><span class="n">is_running</span><span class="p">()</span> <span class="o">==</span> <span class="bp">True</span>
<span class="bp">True</span>

<span class="o">&gt;&gt;&gt;</span> <span class="kn">import</span> <span class="nn">signal</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">sibling</span><span class="p">.</span><span class="n">send_signal</span><span class="p">(</span><span class="n">signal</span><span class="p">.</span><span class="n">SIGINT</span><span class="p">)</span>

<span class="o">&gt;&gt;&gt;</span> <span class="n">parent</span><span class="p">.</span><span class="n">is_running</span><span class="p">()</span> <span class="o">==</span> <span class="n">child</span><span class="p">.</span><span class="n">is_running</span><span class="p">()</span> <span class="o">==</span> <span class="n">sibling</span><span class="p">.</span><span class="n">is_running</span><span class="p">()</span> <span class="o">==</span> <span class="bp">False</span>
<span class="bp">True</span>
</code></pre></div></div>
<p>(It seems that interrupting one thread has a bottom-up cascading effect in <code class="language-plaintext highlighter-rouge">stress-ng</code> 🥴 )</p>

<p class="text-left">Happy New Year 🎆 ~</p>

<hr />

<h1 id="reference">Reference</h1>

<ul>
  <li><a href="https://unix.stackexchange.com/questions/364660/are-threads-implemented-as-processes-on-linux">https://unix.stackexchange.com/questions/364660/are-threads-implemented-as-processes-on-linux</a></li>
  <li><a href="https://unix.stackexchange.com/questions/670836/why-do-threads-have-their-own-pid">https://unix.stackexchange.com/questions/670836/why-do-threads-have-their-own-pid</a></li>
  <li><a href="https://stackoverflow.com/questions/1221555/retrieve-cpu-usage-and-memory-usage-of-a-single-process-on-linux">https://stackoverflow.com/questions/1221555/retrieve-cpu-usage-and-memory-usage-of-a-single-process-on-linux</a></li>
  <li><a href="https://unix.stackexchange.com/questions/404054/how-is-a-process-group-id-set">https://unix.stackexchange.com/questions/404054/how-is-a-process-group-id-set</a></li>
  <li><a href="https://stackoverflow.com/questions/4856255/the-difference-between-fork-vfork-exec-and-clone">https://stackoverflow.com/questions/4856255/the-difference-between-fork-vfork-exec-and-clone</a></li>
  <li><a href="https://stackoverflow.com/questions/19678954/relation-between-thread-id-and-process-id">https://stackoverflow.com/questions/19678954/relation-between-thread-id-and-process-id</a></li>
  <li><a href="https://stackoverflow.com/questions/9430491/find-cpu-usage-for-a-thread-in-linux">https://stackoverflow.com/questions/9430491/find-cpu-usage-for-a-thread-in-linux</a></li>
  <li><a href="https://stackoverflow.com/questions/1420426/how-to-calculate-the-cpu-usage-of-a-process-by-pid-in-linux-from-c">https://stackoverflow.com/questions/1420426/how-to-calculate-the-cpu-usage-of-a-process-by-pid-in-linux-from-c</a></li>
  <li><a href="https://stackoverflow.com/questions/19919881/sysconf-sc-clk-tck-what-does-it-return">https://stackoverflow.com/questions/19919881/sysconf-sc-clk-tck-what-does-it-return</a></li>
  <li><a href="https://www.baeldung.com/linux/total-process-cpu-usage">https://www.baeldung.com/linux/total-process-cpu-usage</a></li>
  <li><a href="https://psutil.readthedocs.io/en/latest/#processes">https://psutil.readthedocs.io/en/latest/#processes</a></li>
  <li><a href="https://man7.org/linux/man-pages/man5/proc.5.html#top_of_page">https://man7.org/linux/man-pages/man5/proc.5.html</a></li>
  <li><a href="https://man7.org/linux/man-pages/man2/getpid.2.html">https://man7.org/linux/man-pages/man2/getpid.2.html</a></li>
  <li><a href="https://man7.org/linux/man-pages/man2/gettid.2.html">https://man7.org/linux/man-pages/man2/gettid.2.html</a></li>
  <li><a href="https://manpages.ubuntu.com/manpages/bionic/man1/stress-ng.1.html">https://manpages.ubuntu.com/manpages/bionic/man1/stress-ng.1.html</a></li>
  <li><a href="https://www.akkadia.org/drepper/nptl-design.pdf">https://www.akkadia.org/drepper/nptl-design.pdf</a></li>
</ul>]]></content><author><name>Hongyu Hè</name></author><category term="systems" /><category term="notes" /><summary type="html"><![CDATA[Linux PID vs TID vs TGID: how threads show up in /proc and tools like htop]]></summary></entry><entry><title type="html">A Glimpse of Survival Analysis</title><link href="https://hongyuhe.github.io/survival/" rel="alternate" type="text/html" title="A Glimpse of Survival Analysis" /><published>2022-07-22T00:00:00+00:00</published><updated>2022-07-22T00:00:00+00:00</updated><id>https://hongyuhe.github.io/survival</id><content type="html" xml:base="https://hongyuhe.github.io/survival/"><![CDATA[<p>Survival analysis is the toolkit for <strong>time-to-event</strong> questions—where the “event” could be death, but in computer systems it’s more often:</p>

<ul>
  <li>time until a <strong>server fails</strong></li>
  <li>time until a <strong>customer churns</strong></li>
  <li>time until an <strong>incident is resolved</strong></li>
  <li>time until a <strong>job completes</strong></li>
  <li>time until a <strong>cache entry expires</strong> (or until a key becomes “cold”)</li>
</ul>

<p>What makes survival analysis special is that it handles <strong>incomplete observation windows</strong> correctly. In real engineering datasets, you frequently do not observe the event for everyone before you stop collecting data:</p>

<ul>
  <li>A customer hasn’t churned <em>yet</em> when you export the dataset.</li>
  <li>A disk hasn’t failed <em>yet</em> when you stop the experiment.</li>
  <li>A request hasn’t completed <em>yet</em> when you cut off a trace.</li>
</ul>

<p>Treating those as “missing” or dropping them biases you toward shorter times and overconfident conclusions. Survival analysis is designed to keep that partial “last-seen” information and remain statistically consistent.</p>

<h2 id="survival-data-in-systems-what-you-actually-record">Survival data in systems: what you actually record</h2>

<p>A time-to-event dataset typically has at least two columns:</p>

<ul>
  <li><strong>duration</strong>: how long you observed the unit (user/machine/request)</li>
  <li><strong>event</strong>: whether the event occurred within your observation window (1) or not (0)</li>
</ul>

<p>A concrete toy example: <strong>time-to-churn (days)</strong> for 6 trial users.</p>

<table>
  <thead>
    <tr>
      <th>user</th>
      <th style="text-align: right">duration (days)</th>
      <th style="text-align: center">event?</th>
      <th>meaning</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>A</td>
      <td style="text-align: right">4</td>
      <td style="text-align: center">1</td>
      <td>churned on day 4</td>
    </tr>
    <tr>
      <td>B</td>
      <td style="text-align: right">7</td>
      <td style="text-align: center">0</td>
      <td>still active at day 7 (not observed to churn yet)</td>
    </tr>
    <tr>
      <td>C</td>
      <td style="text-align: right">2</td>
      <td style="text-align: center">1</td>
      <td>churned on day 2</td>
    </tr>
    <tr>
      <td>D</td>
      <td style="text-align: right">10</td>
      <td style="text-align: center">0</td>
      <td>still active at day 10 (not observed to churn yet)</td>
    </tr>
    <tr>
      <td>E</td>
      <td style="text-align: right">6</td>
      <td style="text-align: center">1</td>
      <td>churned on day 6</td>
    </tr>
    <tr>
      <td>F</td>
      <td style="text-align: right">3</td>
      <td style="text-align: center">1</td>
      <td>churned on day 3</td>
    </tr>
  </tbody>
</table>

<p>Key idea (from the reference video): the “event=0” rows are not “unknown.” They tell you the unit lasted <strong>at least</strong> that long. <sup>[<a href="https://www.youtube.com/watch?v=7_XK7mGMm1E" title="Statistical Learning: 11.1 Introduction to Survival Data and Censoring">survival-video</a>]</sup></p>
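
<p>To make the “at least that long” point concrete, here is a small sketch using the six rows of the toy table. It compares the average duration you get after dropping the event=0 rows with the plain average of all six durations, which is only a lower bound on these users’ true average time-to-churn (the variable names are illustrative):</p>

```python
# (user, duration_days, event) rows copied from the toy churn table.
rows = [
    ("A", 4, 1),
    ("B", 7, 0),
    ("C", 2, 1),
    ("D", 10, 0),
    ("E", 6, 1),
    ("F", 3, 1),
]

# Naive approach: throw away users who have not churned yet.
churned = [d for _, d, e in rows if e == 1]
naive_mean = sum(churned) / len(churned)

# Every user lasted AT LEAST `duration` days, so the mean of all durations
# can only underestimate the true mean time-to-churn of these six users.
lower_bound = sum(d for _, d, _ in rows) / len(rows)

print(naive_mean)   # 3.75
print(lower_bound)  # 32 / 6, already larger than the naive mean
```

<p>The naive estimate lands below a quantity that is itself a lower bound, which is exactly the “biased toward shorter times” problem.</p>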

<p>You also need to be explicit about your <strong>time origin</strong>:</p>

<ul>
  <li>For churn: trial start, first purchase, first session, last renewal?</li>
  <li>For failures: installation time, last maintenance, last reboot?</li>
  <li>For incidents: page time, incident creation time, first alert time?</li>
</ul>

<p>Changing the origin changes the interpretation of the curves and coefficients.</p>

<h2 id="two-core-functions-survival-and-hazard-with-intuition">Two core functions: survival and hazard (with intuition)</h2>

<p><strong>Survival function</strong> answers: “What fraction lasts beyond time <em>t</em>?”</p>

\[S(t) = P(T &gt; t)\]

<p>For churn, you can read $S(30)=0.8$ as “80% of users are still active after 30 days.” <sup>[<a href="https://en.wikipedia.org/wiki/Survival_analysis" title="Survival analysis">survival-analysis</a>]</sup></p>

<p><strong>Hazard function</strong> answers a different question: “Given you’ve made it to time <em>t</em>, how ‘risky’ is <em>right now</em>?”</p>

<p>Formally:</p>

\[h(t) = \lim_{\Delta t \rightarrow 0}\frac{P(t \le T &lt; t+\Delta t \mid T \ge t)}{\Delta t}\]

<p>In systems terms, hazard is often closer to what you want operationally:</p>

<ul>
  <li>“Given a server has been up 20 days, what is its <strong>instantaneous failure rate</strong> now?”</li>
  <li>“Given a user is still active at week 4, what is the <strong>instantaneous churn pressure</strong> now?”</li>
</ul>

<p>A practical relationship to remember:</p>

<ul>
  <li>high hazard around time <em>t</em> means the survival curve drops steeply around time <em>t</em></li>
  <li>if hazard increases with time, you’re in “wear-out” behavior (common in hardware); if it decreases, you may be seeing “early-life failures” (infant mortality) or onboarding drop-off</li>
</ul>
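
<p>One way to build intuition for these regimes is a parametric toy hazard. The sketch below uses the Weibull hazard $h(t) = \frac{k}{\lambda}\left(\frac{t}{\lambda}\right)^{k-1}$ purely as an illustration (nothing above assumes a Weibull model): a shape $k$ below 1 gives a decreasing hazard, $k = 1$ a constant one, and $k$ above 1 an increasing one.</p>

```python
def weibull_hazard(t, shape, scale=1.0):
    """Weibull hazard: (shape / scale) * (t / scale) ** (shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)


times = [0.5, 1.0, 2.0, 4.0]  # arbitrary time points for illustration
regimes = [(0.5, "early-life failures"), (1.0, "constant risk"), (2.0, "wear-out")]

for shape, label in regimes:
    hazards = [round(weibull_hazard(t, shape), 3) for t in times]
    print(f"shape={shape} ({label}): {hazards}")
```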

<p>A common derived quantity is the <strong>cumulative hazard</strong>:</p>

\[H(t) = \int_0^t h(u)\,du \quad \text{and} \quad S(t) = e^{-H(t)}\]

<p>This becomes useful when you want to add hazards over stages or interpret models on a log scale. <sup>[<a href="https://en.wikipedia.org/wiki/Survival_analysis" title="Survival analysis">survival-analysis</a>]</sup></p>
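
<p>The identity $S(t) = e^{-H(t)}$ is easy to sanity-check numerically. The sketch below integrates a hypothetical hazard $h(u) = 0.02u$ with the midpoint rule and compares the result to the closed form $H(t) = 0.01t^2$; the function names and the hazard itself are assumptions chosen only for illustration.</p>

```python
import math


def cumulative_hazard(hazard, t, steps=10_000):
    """Approximate H(t), the integral of hazard(u) over [0, t], with the midpoint rule."""
    width = t / steps
    return sum(hazard((i + 0.5) * width) for i in range(steps)) * width


def survival_from_hazard(hazard, t):
    """S(t) = exp(-H(t))."""
    return math.exp(-cumulative_hazard(hazard, t))


def linear_hazard(u):
    """Hypothetical hazard growing with age: h(u) = 0.02 * u, so H(t) = 0.01 * t**2."""
    return 0.02 * u


numeric = survival_from_hazard(linear_hazard, 10.0)
closed_form = math.exp(-0.01 * 10.0**2)  # exp(-1)
print(numeric, closed_form)
```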

<h2 id="the-kaplanmeier-curve-a-survival-curve-from-observed-events-plus-last-seen-records">The Kaplan–Meier curve: a survival curve from observed events plus “last-seen” records</h2>

<p>If you do not want to assume a distribution, the <strong>Kaplan–Meier (KM) estimator</strong> gives a non-parametric estimate of $S(t)$ using (a) observed event times and (b) “last-seen active” times for units that did not experience the event within the window. <sup>[<a href="https://en.wikipedia.org/wiki/Kaplan%E2%80%93Meier_estimator" title="Kaplan–Meier estimator">kaplan-meier</a>]</sup></p>

<p>It is:</p>

\[\hat{S}(t) = \prod_{i:t_i \le t}\left(1-\frac{d_i}{n_i}\right)\]

<p>Where at each event time $t_i$:</p>

<ul>
  <li>$n_i$ = number “at risk” just before $t_i$ (still being observed, event not yet happened)</li>
  <li>$d_i$ = number of events at $t_i$</li>
</ul>

<p><strong>Toy KM computation (churn example).</strong><br />
Using the 6-user table above, sort by time and update the risk set:</p>

<p>Event times are 2, 3, 4, 6 (and we have “last-seen” times at 7 and 10).</p>

<ul>
  <li>At day 2: $n=6$, $d=1$ → multiply by $(1-1/6)=5/6$</li>
  <li>At day 3: $n=5$, $d=1$ → multiply by $(1-1/5)=4/5$</li>
  <li>At day 4: $n=4$, $d=1$ → multiply by $(1-1/4)=3/4$</li>
  <li>At day 6: $n=3$, $d=1$ → multiply by $(1-1/3)=2/3$</li>
</ul>

<p>So:</p>

<ul>
  <li>$\hat S(2)=5/6 \approx 0.833$</li>
  <li>$\hat S(3)=(5/6)(4/5)=4/6 \approx 0.667$</li>
  <li>$\hat S(4)=(5/6)(4/5)(3/4)=3/6 = 0.5$</li>
  <li>$\hat S(6)=(5/6)(4/5)(3/4)(2/3)=2/6 \approx 0.333$</li>
</ul>

<p><strong>Where “last-seen” matters:</strong> users B and D remain “at risk” up to days 7 and 10, respectively, contributing correctly to $n_i$ up to those times. KM is effectively saying: “they didn’t churn before 7/10, and that information counts.” <sup>[<a href="https://en.wikipedia.org/wiki/Kaplan%E2%80%93Meier_estimator" title="Kaplan–Meier estimator">kaplan-meier</a>]</sup></p>
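
<p>The hand computation above can be reproduced with a few lines of Python. This is a minimal sketch of the product-limit formula, not a replacement for a survival library; <code class="language-plaintext highlighter-rouge">kaplan_meier</code> is an illustrative name, and exact fractions are used so the output matches the table of values above.</p>

```python
from fractions import Fraction


def kaplan_meier(durations, events):
    """Return [(event_time, S_hat)] computed with the product-limit formula."""
    survival = Fraction(1)
    curve = []
    event_times = sorted({d for d, e in zip(durations, events) if e == 1})
    for t in event_times:
        # n_i: units still under observation just before t.
        at_risk = sum(1 for d in durations if d >= t)
        # d_i: events that happen exactly at t.
        happened = sum(1 for d, e in zip(durations, events) if d == t and e == 1)
        survival *= 1 - Fraction(happened, at_risk)
        curve.append((t, survival))
    return curve


durations = [4, 7, 2, 10, 6, 3]  # users A..F from the toy table
events = [1, 0, 1, 0, 1, 1]      # 1 = churned, 0 = still active when last seen

for t, s in kaplan_meier(durations, events):
    print(t, s, round(float(s), 3))
```

<p>Users B and D never trigger a multiplication step, but they keep the denominators $n_i$ larger for as long as they are observed.</p>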

<p>A highly practical workflow for product/systems:</p>

<ul>
  <li>Plot KM curves for cohorts (e.g., different regions, hardware batches, onboarding variants).</li>
  <li>Look for <em>when</em> curves separate: early vs late differences are operationally distinct (onboarding vs retention; infant mortality vs wear-out).</li>
</ul>

<h2 id="comparing-groups-the-log-rank-test-what-it-is-and-when-it-lies">Comparing groups: the log-rank test (what it is and when it lies)</h2>

<p>To test whether two groups have different survival curves, the <strong>log-rank test</strong> compares observed vs expected events over time under the null that the groups share the same survival distribution. <sup>[<a href="https://en.wikipedia.org/wiki/Logrank_test" title="Log-rank test">logrank-test</a>]</sup></p>

<p>The intuition:</p>

<ul>
  <li>At each event time, if Group A has 60% of the risk set, then under the null it “should” get ~60% of the events.</li>
  <li>The log-rank statistic aggregates how much reality deviates from that expectation across time.</li>
</ul>
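
<p>The “observed vs expected” bookkeeping can be sketched directly. The function below is a bare-bones two-group log-rank test using the standard hypergeometric variance and a chi-square reference distribution with 1 degree of freedom; <code class="language-plaintext highlighter-rouge">logrank_test</code> is an illustrative name here, not a specific library’s API, and the cohorts in the usage lines are made-up numbers.</p>

```python
import math


def logrank_test(durations_a, events_a, durations_b, events_b):
    """Two-group log-rank test; returns (chi_square, p_value) with 1 degree of freedom."""
    pooled = [(d, e, 0) for d, e in zip(durations_a, events_a)]
    pooled += [(d, e, 1) for d, e in zip(durations_b, events_b)]
    event_times = sorted({d for d, e, _ in pooled if e == 1})

    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n = sum(1 for d, _, _ in pooled if d >= t)                 # at risk, both groups
        n_a = sum(1 for d, _, g in pooled if d >= t and g == 0)    # at risk, group A
        d_all = sum(1 for d, e, _ in pooled if d == t and e == 1)  # events, both groups
        d_a = sum(1 for d, e, g in pooled if d == t and e == 1 and g == 0)
        # Under the null, group A "should" get its share n_a / n of the events.
        observed_minus_expected += d_a - d_all * n_a / n
        if n > 1:
            variance += d_all * (n_a / n) * (1 - n_a / n) * (n - d_all) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi_square = observed_minus_expected**2 / variance
    # Tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


# Hypothetical cohorts: group A fails early, group B fails late, no cut-off rows.
chi_square, p_value = logrank_test([1, 2, 3], [1, 1, 1], [10, 11, 12], [1, 1, 1])
print(chi_square, p_value)
```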

<p>When it is useful:</p>

<ul>
  <li>quick sanity check that a cohort difference is not noise</li>
  <li>comparing two versions of a system or product feature rollout</li>
</ul>

<p>When it can mislead:</p>

<ul>
  <li>if the curves cross (effects change over time)</li>
  <li>if observation windows differ systematically between groups (e.g., one region has shorter follow-up windows)</li>
</ul>

<h2 id="cox-proportional-hazards-turning-covariates-into-risk-multipliers">Cox proportional hazards: turning covariates into “risk multipliers”</h2>

<p>The <strong>Cox proportional hazards model</strong> connects covariates to hazard without specifying the baseline hazard shape. <sup>[<a href="https://en.wikipedia.org/wiki/Proportional_hazards_model" title="Proportional hazards model">cox-ph</a>]</sup></p>

\[h(t \mid \mathbf{X}) = h_0(t)\exp(\beta^\top \mathbf{X})\]

<p>How to read it in practice:</p>

<ul>
  <li>$\exp(\beta_j)$ is a <strong>hazard ratio</strong> for a 1-unit increase in $X_j$.</li>
  <li>Hazard ratio &gt; 1 means “riskier” (event happens sooner on average), &lt; 1 means “protective.”</li>
</ul>

<p><strong>Toy interpretation (churn).</strong><br />
If your Cox model yields:</p>

<ul>
  <li>$\exp(\beta_{\text{annual_plan}})=0.7$</li>
</ul>

<p>Then, holding other covariates fixed, annual-plan users have <strong>30% lower instantaneous churn hazard</strong> than monthly-plan users at any given time.</p>
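
<p>A hazard ratio can be translated back into survival probabilities. Under proportional hazards, $S_1(t) = \exp(-\text{HR}\cdot H_0(t)) = S_0(t)^{\text{HR}}$, so the sketch below raises a baseline survival value to the power of the hazard ratio. The 80% baseline for monthly-plan users at day 30 is a hypothetical number chosen for illustration.</p>

```python
def survival_under_hazard_ratio(baseline_survival, hazard_ratio):
    """Under proportional hazards, S_1(t) = S_0(t) ** HR at every t."""
    return baseline_survival**hazard_ratio


monthly_s30 = 0.80  # hypothetical: 80% of monthly-plan users still active at day 30
hazard_ratio = 0.7  # exp(beta_annual_plan) from the toy interpretation

annual_s30 = survival_under_hazard_ratio(monthly_s30, hazard_ratio)
print(round(annual_s30, 3))  # roughly 0.855
```

<p>Note that a 30% lower hazard does not mean 30% fewer churned users by day 30: in this example the churned fraction moves from 20% to about 14.5%.</p>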

<p><strong>The key assumption:</strong> proportional hazards means those hazard ratios are roughly <strong>constant over time</strong> (i.e., the curves differ by a multiplicative factor in hazard, not by shape). When this is false, you may need:</p>

<ul>
  <li>time-varying covariates/effects</li>
  <li>stratified Cox</li>
  <li>an accelerated failure time (AFT) model for “time scaling” instead of “hazard scaling” <sup>[<a href="https://en.wikipedia.org/wiki/Survival_analysis" title="Survival analysis">survival-analysis</a>]</sup></li>
</ul>

<h2 id="practical-checklist-for-engineers">Practical checklist for engineers</h2>

<p><strong>1) Define observation windows explicitly</strong><br />
Write down, in the experiment doc (not just in code), what “duration” means and when observation stops (end of study, export date, decommission time, trace cutoff).</p>

<p><strong>2) Check whether observation stopping is correlated with the event</strong><br />
Ask simple pipeline questions:</p>

<ul>
  <li>Do some users “disappear” because tracking stops (not because behavior changed)?</li>
  <li>Do some machines leave the dataset because they were proactively replaced (possibly due to warning signs)?</li>
</ul>

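<p>If “stopping observation” is correlated with the event, the standard estimators are biased. For example, if machines are pulled from the fleet shortly before they would have failed, those failures never show up in the data and estimated survival is biased upward.</p>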

<p><strong>3) Start with KM curves before fitting models</strong><br />
KM gives shape intuition: early drop, long tail, crossing hazards, etc. <sup>[<a href="https://en.wikipedia.org/wiki/Kaplan%E2%80%93Meier_estimator" title="Kaplan–Meier estimator">kaplan-meier</a>]</sup></p>

<p><strong>4) Use Cox when you need covariate-adjusted answers</strong><br />
Examples:</p>
<ul>
  <li>adjust failure risk for load, temperature, and batch</li>
  <li>adjust churn risk for acquisition channel and engagement</li>
</ul>

<p><strong>5) Operationalize outputs</strong><br />
Good time-to-event analysis ends with a decision:</p>
<ul>
  <li>“Which cohort should we target, and when?”</li>
  <li>“What burn-in period reduces infant mortality?”</li>
  <li>“Which covariate is the strongest early-warning signal?”</li>
</ul>

<p>A simple operational trick: evaluate $S(t)$ at business-relevant horizons (e.g., day 1, day 7, day 30) and report those, not just “the curve.”</p>
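
<p>Reading a step curve at fixed horizons is a small lookup. The sketch below assumes the curve is stored as a sorted list of <code class="language-plaintext highlighter-rouge">(time, survival)</code> steps, as in the toy KM example; <code class="language-plaintext highlighter-rouge">survival_at</code> and the chosen horizons are illustrative.</p>

```python
from bisect import bisect_right


def survival_at(curve, horizon):
    """Evaluate a right-continuous step curve [(time, S_hat), ...] at `horizon`."""
    times = [t for t, _ in curve]
    idx = bisect_right(times, horizon)  # number of steps at or before the horizon
    if idx == 0:
        return 1.0  # no event has happened yet
    return curve[idx - 1][1]


# KM steps from the 6-user toy example: (event day, estimated survival).
toy_curve = [(2, 5 / 6), (3, 4 / 6), (4, 3 / 6), (6, 2 / 6)]

for day in (1, 3, 5, 7):  # hypothetical reporting horizons
    print(f"S({day}) = {survival_at(toy_curve, day):.3f}")
```

<p>Horizons beyond the last observed duration (day 10 in the toy data) are extrapolation, so it is safer not to report them.</p>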

<h2 id="references">References</h2>

<ul>
  <li>Survival analysis (overview). <sup>[<a href="https://en.wikipedia.org/wiki/Survival_analysis" title="Survival analysis">survival-analysis</a>]</sup></li>
  <li>Kaplan–Meier estimator. <sup>[<a href="https://en.wikipedia.org/wiki/Kaplan%E2%80%93Meier_estimator" title="Kaplan–Meier estimator">kaplan-meier</a>]</sup></li>
  <li>Log-rank test. <sup>[<a href="https://en.wikipedia.org/wiki/Logrank_test" title="Log-rank test">logrank-test</a>]</sup></li>
  <li>Proportional hazards / Cox model. <sup>[<a href="https://en.wikipedia.org/wiki/Proportional_hazards_model" title="Proportional hazards model">cox-ph</a>]</sup></li>
  <li>Introduction to survival data and incomplete observation windows (video). <sup>[<a href="https://www.youtube.com/watch?v=7_XK7mGMm1E" title="Statistical Learning: 11.1 Introduction to Survival Data and Censoring">survival-video</a>]</sup></li>
</ul>]]></content><author><name>Hongyu Hè</name></author><category term="systems" /><summary type="html"><![CDATA[Survival analysis for systems: censoring, KM curves, log-rank tests, and Cox models]]></summary></entry><entry><title type="html">Connecting Bayesian to Regularization</title><link href="https://hongyuhe.github.io/baysian/" rel="alternate" type="text/html" title="Connecting Bayesian to Regularization" /><published>2022-01-03T00:00:00+00:00</published><updated>2022-01-03T00:00:00+00:00</updated><id>https://hongyuhe.github.io/baysian</id><content type="html" xml:base="https://hongyuhe.github.io/baysian/"><![CDATA[<h2 id="bias-variance-trade-off">Bias-Variance Trade-off</h2>

<p>In many cases, it’s better to trade a bit more bias for a much smaller variance to tighten generalisation error estimates. One way to do so is by adding a regularizer. In linear regression, L1 (Lasso) and L2 (Ridge) regularization are two common options. Lasso suppresses small weights to zero, making the feature space sparse, while Ridge shrinks all weights. Both restrict the norm of the weights and help mitigate overfitting.</p>

<p>However, explicit regularization is only one way to strike the balance. Another is to introduce bias through the Bayesian lens: we encode prior knowledge as a prior distribution that constrains the norm of the learnt parameters. For example, if we believe the weights are small and centred at zero, we can set the prior to $\vec{w} \sim \mathcal{N}(0, \beta\textbf{I})$. Then, by the definition of conditional probability (the first step of Bayes’ rule), the posterior is:</p>

\[\mathbb{P}(\vec{w} | X, \vec{y}) = \cfrac{\mathbb{P}(\vec{w}, X, \vec{y})}{\mathbb{P}(X, \vec{y})} = \cfrac{\mathbb{P}(\vec{w}, \vec{y} | X) {\mathbb{P}(X)}}{\mathbb{P}(\vec{y} | X) {\mathbb{P}(X)}} = \cfrac{\mathbb{P}(\vec{w}, \vec{y} | X)}{\mathbb{P}(\vec{y} | X)}\]

<p>Thus, both regularization and Bayesian modelling can achieve the same goal, which raises the question: are they connected?</p>

<h2 id="laplace-is-to-lasso-as-gaussian-is-to-ridge">Laplace is to Lasso as Gaussian is to Ridge</h2>

<p>The answer to the above question turns out to be yes! To illustrate this, let’s use two common prior distributions, Laplace and Gaussian, as our running examples.</p>

<p>First, we assume the following general regression setting: the model prediction is $f = X\theta$ and the observation is $y = f + \epsilon$, where $\theta \sim \text{Laplace}(0, s)$ with density $\mathbb{P}(\theta) = \cfrac{1}{2s}\exp(-\mid\theta\mid / s)$ and $\epsilon \sim \mathcal{N}(0, \delta^2_\epsilon)$.</p>

<p>Then, we obtain the maximum a posteriori (MAP) estimate as follows. Factors that do not depend on $\theta$ leave the $\arg\max$ unchanged, so we drop the evidence $\mathbb{P}(X, y)$ and, because the inputs do not depend on the parameters, $\mathbb{P}(X | \theta) = \mathbb{P}(X)$. The product in the fourth line assumes independent observations:</p>

\[\begin{align} 
  \arg\max_\theta\mathbb{P}({\theta} | X, {y}) 
  &amp;= \arg\max_\theta\cfrac{\mathbb{P}(y, X | \theta)\mathbb{P}(\theta)} {\mathbb{P}(X, y)} \nonumber\\
  &amp;= \arg\max_\theta\mathbb{P}(y | X, \theta)\mathbb{P}(X | \theta)\mathbb{P}(\theta) \nonumber\\
  &amp;= \arg\max_\theta\mathbb{P}(y | X, \theta)\mathbb{P}(\theta) \nonumber\\
  &amp;= \arg\max_\theta\mathbb{P}(\theta) \prod^n_i \mathbb{P}(y_i | X_i, \theta) \nonumber\\
  &amp;= \arg\min_\theta -\log \mathbb{P}(\theta) - \sum_i^n \log \mathbb{P}_\theta(y_i | X_i) 
\end{align}\]

<p>Next, we can substitute both the likelihood and prior into Eq. (1).</p>

\[\begin{align} 
  \arg\min_\theta -\log\cfrac{1}{2s} \exp\left\{-\cfrac{|\theta|}{s}\right\} - \sum^n_i \log \cfrac{1}{Z} \exp\left\{-\cfrac{1}{2}\left(\cfrac{y_i - f_i}{\delta_\epsilon}\right)^2\right\}
\end{align}\]

<p>where $Z$ is the Gaussian normalising constant. By simplifying Eq. (2), we obtain the following form:</p>

\[\begin{align} 
   &amp; \arg\min_\theta \cfrac{|\theta|}{s} + \cfrac{1}{2\delta^2_\epsilon} \sum^n_i(y_i - f_i)^2 \\  
  =&amp; \arg\min_\theta \sum^n_i(y_i - f_i)^2  + \cfrac{2\delta^2_\epsilon}{s}||\theta||_1
\end{align}\]

<p>Now, we have recovered the exact form of Lasso, where $\lambda = \cfrac{2\delta^2_\epsilon}{s}$ is the coefficient of the L1 regularizer and controls the strength of the constraint: a narrower prior (smaller $s$) means stronger shrinkage.</p>
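
<p>As a quick numerical sanity check, the sketch below takes the simplest possible model, a single intercept with $f_i = \theta$, and compares a brute-force minimizer of the negative log posterior (Laplace prior, Gaussian noise) with the closed-form Lasso solution for $\lambda = 2\delta^2_\epsilon / s$, which in this one-parameter case is a soft-thresholded mean. The data, the grid, and the function names are assumptions made for illustration.</p>

```python
def neg_log_posterior_laplace(theta, ys, s, noise_var):
    """|theta| / s + sum((y - theta)^2) / (2 * noise_var), with constants dropped."""
    return abs(theta) / s + sum((y - theta) ** 2 for y in ys) / (2 * noise_var)


def lasso_closed_form(ys, lam):
    """Minimizer of sum((y - theta)^2) + lam * |theta| for a single intercept."""
    mean = sum(ys) / len(ys)
    shrink = lam / (2 * len(ys))
    sign = 1.0 if mean > 0 else -1.0
    return sign * max(abs(mean) - shrink, 0.0)  # soft-thresholding of the mean


ys = [0.9, 1.1, 1.0, 1.2, 0.8]  # hypothetical observations
s, noise_var = 0.5, 1.0         # Laplace scale and Gaussian noise variance
lam = 2 * noise_var / s         # lambda implied by the prior and the noise

# Brute-force MAP estimate on a grid over [-2, 2] with step 0.001.
grid = [i / 1000 for i in range(-2000, 2001)]
theta_map = min(grid, key=lambda th: neg_log_posterior_laplace(th, ys, s, noise_var))

print(theta_map, lasso_closed_form(ys, lam))
```

<p>Both numbers should agree up to the grid resolution, and both sit below the sample mean because the prior pulls the estimate toward zero.</p>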

<hr />

<p>Next, let’s play the same trick in the same setting but with a Gaussian prior instead, i.e., $\theta \sim \mathcal{N}(0, \delta_\theta^2)$.</p>

<p>Starting from Eq. (1), we substitute in the likelihood and prior as above:</p>

\[\begin{align} 
  \arg\max_\theta\mathbb{P}({\theta} | X, {y}) 
  &amp;= \arg\min_\theta -\log \mathbb{P}(\theta) - \sum_i^n \log \mathbb{P}_\theta(y_i | X_i) \nonumber \\
  &amp;= \arg\min_\theta -\log \cfrac{1}{Z'} \exp\left\{-\cfrac{1}{2}\left(\cfrac{\theta-0}{\delta_\theta}\right)^2 \right\} \\ 
  &amp; \quad - \sum^n_i \log \cfrac{1}{Z} \exp\left\{-\cfrac{1}{2}\left(\cfrac{y_i - f_i}{\delta_\epsilon}\right)^2\right\} \nonumber
\end{align}\]

<p>Finally, dropping the terms in Eq. (5) that do not depend on $\theta$ (the logarithms of the normalising constants $Z$ and $Z'$), we have:</p>

\[\begin{align} 
   &amp; \arg\min_\theta \cfrac{||\theta||_2^2}{2\delta^2_\theta} + \cfrac{1}{2\delta^2_\epsilon} \sum^n_i(y_i - f_i)^2 \nonumber\\  
  =&amp; \arg\min_\theta \sum^n_i(y_i - f_i)^2  + \cfrac{\delta^2_\epsilon}{\delta^2_\theta}||\theta||_2^2
\end{align}\]

<p>By Eq. (6), we have recovered Ridge regression, where the ratio $\cfrac{\delta_{\epsilon}^2}{\delta_{\theta}^2}$ is the regularization constant $\lambda$.</p>
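
<p>The same kind of sanity check works for the Gaussian prior. The sketch below uses a single feature with $f_i = x_i\theta$ and compares a brute-force minimizer of the negative log posterior with the closed-form Ridge solution $\theta = \sum_i x_i y_i / (\sum_i x_i^2 + \lambda)$ for $\lambda = \delta^2_\epsilon / \delta^2_\theta$. The data, the grid, and the function names are assumptions made for illustration.</p>

```python
def neg_log_posterior_gaussian(theta, xs, ys, prior_var, noise_var):
    """theta^2 / (2 * prior_var) + sum((y - x * theta)^2) / (2 * noise_var)."""
    residual = sum((y - x * theta) ** 2 for x, y in zip(xs, ys))
    return theta**2 / (2 * prior_var) + residual / (2 * noise_var)


def ridge_closed_form(xs, ys, lam):
    """Minimizer of sum((y - x * theta)^2) + lam * theta^2 for one feature."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


xs = [1.0, 2.0, 3.0, 4.0]  # hypothetical single feature
ys = [2.1, 3.9, 6.2, 7.8]  # hypothetical targets
prior_var, noise_var = 0.5, 1.0
lam = noise_var / prior_var  # lambda implied by the prior and the noise

# Brute-force MAP estimate on a grid over [0, 4] with step 0.001.
grid = [i / 1000 for i in range(0, 4001)]
theta_map = min(
    grid,
    key=lambda th: neg_log_posterior_gaussian(th, xs, ys, prior_var, noise_var),
)

print(theta_map, ridge_closed_form(xs, ys, lam))
```

<p>Shrinking <code class="language-plaintext highlighter-rouge">prior_var</code> increases $\lambda$ and pulls the estimate further toward zero, while a very wide prior approaches ordinary least squares.</p>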

<h2 id="summary">Summary</h2>

<p>By working out the above two examples, we found that regularised regression is nothing but Bayesian modelling in disguise. In fact, <em>imposing various priors has the same effect as using corresponding regularizers.</em> By the same token, <em>choosing different likelihoods gives us different loss functions.</em> In this post, we used Gaussian likelihood in both examples, and, in turn, recovered the square loss.</p>

<p>There are many other options for prior and likelihood functions. For instance, one can use a Student’s t distribution instead of a Gaussian. Lastly, choosing a <a href="https://en.wikipedia.org/wiki/Conjugate_prior">conjugate prior</a> gives a closed-form posterior, which can drastically reduce the cost of Bayesian inference.</p>]]></content><author><name>Hongyu Hè</name></author><category term="ml" /><summary type="html"><![CDATA[Bayesian priors as regularizers: Laplace→L1 (Lasso) and Gaussian→L2 (Ridge)]]></summary></entry><entry><title type="html">Good Reads</title><link href="https://hongyuhe.github.io/books/" rel="alternate" type="text/html" title="Good Reads" /><published>2021-10-08T00:00:00+00:00</published><updated>2021-10-08T00:00:00+00:00</updated><id>https://hongyuhe.github.io/books</id><content type="html" xml:base="https://hongyuhe.github.io/books/"><![CDATA[<p class="notice--info">I’ve started using Goodreads, and you can find me <a href="https://goodreads.com/hongyu">here</a>.</p>

<p class="notice">This is also a rolling log of books I recently read (2021) that are right up my alley.</p>

<h3 class="text-justify" id="bennett-arnold-self-and-self-management-essays-about-existing-george-h-doran-company-1918">Bennett, Arnold. <em>Self and self-management: Essays about existing.</em> George H. Doran Company, 1918.</h3>

<p>The true stories behind the stories of success shall be the same and might not be as glorious.</p>

<h3 class="text-justify" id="hawking-stephen-the-theory-of-everything-jaico-publishing-house-2006">Hawking, Stephen. <em>The theory of everything.</em> Jaico Publishing House, 2006.</h3>

<p>Well, I’m now under the impression that computer science is a pseudoscience :}</p>

<h3 class="text-justify" id="hall-herbert-james-the-untroubled-mind-houghton-mifflin-1915">Hall, Herbert James. <em>The untroubled mind.</em> Houghton Mifflin, 1915.</h3>

<p>How can I live out a life so fully that worries couldn’t sneak in? Perhaps most importantly, what’s my deeper justification and higher pursuit thereof?</p>

<h3 class="text-justify" id="jones-diana-wynne-john-sessions-and-stella-paskins-howls-moving-castle-recorded-books-2008">Jones, Diana Wynne, John Sessions, and Stella Paskins. <em>Howl’s moving castle.</em> Recorded Books, 2008.</h3>

<p>I finally understand why I didn’t quite understand the movie 😅</p>]]></content><author><name>Hongyu Hè</name></author><category term="rollinglog" /><summary type="html"><![CDATA[A rolling list of books I enjoyed, with short notes on why they stuck]]></summary></entry><entry><title type="html">My Take on TED Talks</title><link href="https://hongyuhe.github.io/ted/" rel="alternate" type="text/html" title="My Take on TED Talks" /><published>2021-08-19T00:00:00+00:00</published><updated>2021-08-19T00:00:00+00:00</updated><id>https://hongyuhe.github.io/ted</id><content type="html" xml:base="https://hongyuhe.github.io/ted/"><![CDATA[<p>For the past several months, I’ve picked up the habit of watching TED talks. Even when a talk is impressive, its ideas fade fast. That bothered me, so I started jotting down my main takeaways in this rolling log.</p>

<p class="notice">Some of my friends from China saw a bunch of blanks here. This is because YouTube videos can’t pass the firewall 🧱.</p>

<h2 id="10-ways-to-have-a-better-conversation">10 ways to have a better conversation</h2>

<iframe width="560" height="315" src="https://www.youtube.com/embed/R1vskiVDwl4" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen=""></iframe>

<ul>
  <li>If you want to pontificate, go write a blog.</li>
  <li>Ask open-ended questions.</li>
  <li>“No man ever listened his way out of a job” — Calvin Coolidge</li>
  <li>Most of us don’t listen with the intent to understand. We listen with the intent to reply.</li>
  <li>Listen, and be prepared to be amazed.</li>
  <li>If you don’t know, say that you don’t know.</li>
  <li>Don’t equate your experience with others’.</li>
</ul>

<h2 id="self-control">Self-control</h2>
<iframe width="560" height="315" src="https://www.youtube.com/embed/PPQhj6ktYSo" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen=""></iframe>

<p class="notice--warning">This TED talk conveys a message very similar to that of the one below. Their main ideas are covered in many other talks as well.</p>

<h2 id="to-reach-beyond-your-limits-by-training-your-mind">To reach beyond your limits by training your mind</h2>
<iframe width="560" height="315" src="https://www.youtube.com/embed/zCv-ZBy6_yU" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen=""></iframe>

<ul>
  <li>Self-control is the problem where we have all these long-term desires for ourselves, but in the short term we do rather different things (that prevent us from achieving those long-term goals).</li>
  <li>Our willpower is weak, and therefore, it’s not something upon which we should rely during decision-making.</li>
  <li>If we are faced with temptation whilst having no tool at hand to overcome it, we’re almost certainly going to fail.</li>
  <li>We should collaborate with our brains with constructive messages.
    <ul>
      <li>Change the pictures and the words. Use very detailed words.</li>
      <li>Tell your mind exactly what you want.</li>
    </ul>
  </li>
  <li>We ought to create tools that will control our future selves to do what our current selves want them to do.
    <ul>
      <li>It’s a situation where we know we will be tempted, so we do something in advance that makes it impossible for us to give in.</li>
    </ul>
  </li>
  <li>Make the familiar unfamiliar and the unfamiliar familiar.</li>
  <li>Reward-substitution: connecting pain to pleasure
    <ul>
      <li>Link massive pleasure to going there and pain to not going there.</li>
      <li>Do the right thing for the wrong reason.</li>
    </ul>
  </li>
  <li>Make self-belief so normal to you that everyone believes in you too.</li>
</ul>

<p class="notice--info">The last point ties in with another TED talk, which will be introduced later.</p>

<h2 id="body-language-the-power-is-in-the-palm-of-your-hands">Body language, the power is in the palm of your hands</h2>

<iframe width="560" height="315" src="https://www.youtube.com/embed/ZZZ7k8cMA-4" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen=""></iframe>

<ul>
  <li>The palm up position is friendly and inviting, whilst the palm down position exerts power and control over others.
    <ul>
      <li>Finger pointing is the worst; it is directive and rude.</li>
    </ul>
  </li>
</ul>

<h2 id="imposters-the-psychology-of-pretending-to-be-someone-youre-not">Imposters: The psychology of pretending to be someone you’re not</h2>
<iframe width="560" height="315" src="https://www.youtube.com/embed/vSjlCJaEwZE" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen=""></iframe>

<ul>
  <li>Some imposters are escapologists — they’re running away from a flawed past and trying to rehabilitate their image.</li>
  <li>We spend a hell of a lot of time trying to impress other people.</li>
  <li>Ultimately, life is a performance with all the sense of drama, anarchy, and possibility.</li>
  <li>Who controls the past controls the future; who controls the present controls the past. — Orwell</li>
</ul>

<h2 id="how-to-draw-to-remember-more">How to draw to remember more</h2>
<iframe width="560" height="315" src="https://www.youtube.com/embed/gj3ZnKlHqxI" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen=""></iframe>

<ul>
  <li>Thinking in pictures and then drawing them is a great way to learn and memorize new things.</li>
  <li>There are a myriad of ways to represent an abstract concept using drawings.</li>
  <li>The quality of the drawing does not matter at all. In other words, good and bad pictures have rather similar, if not the same, effect on the learning process.</li>
  <li>It is the process of doing the drawing that actually makes a difference.</li>
</ul>

<p class="notice--success">To be continued 👨‍💻 …</p>]]></content><author><name>Hongyu Hè</name></author><category term="rollinglog" /><summary type="html"><![CDATA[A rolling log of TED takeaways on conversation, self-control, imposters, and memory]]></summary></entry><entry><title type="html">SSH Proxy Jump</title><link href="https://hongyuhe.github.io/ssh/" rel="alternate" type="text/html" title="SSH Proxy Jump" /><published>2020-08-11T00:00:00+00:00</published><updated>2020-08-11T00:00:00+00:00</updated><id>https://hongyuhe.github.io/ssh</id><content type="html" xml:base="https://hongyuhe.github.io/ssh/"><![CDATA[<p>This post is concerned with the basics of SSH authentication, as well as its indirect login via a proxy server.</p>

<figure style="width: 90%" class="align-center">
  <img src="https://hongyuhe.github.io/assets/images/ssh.png" alt="" />
  <!-- <figcaption>System overview</figcaption> -->
</figure>

<h2 id="step-1-key-pair-generation">Step 1: key pair generation</h2>

<p>Use the following command to generate a public (silver) / private (black) RSA key pair under the <code class="language-plaintext highlighter-rouge">~/.ssh/</code> directory. <code class="language-plaintext highlighter-rouge">~/.ssh/id_rsa</code> is the private key you keep on your machine, and <code class="language-plaintext highlighter-rouge">~/.ssh/id_rsa.pub</code> is the public key you distribute to other machines to enable passwordless logins.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ssh-keygen <span class="nt">-t</span> rsa
</code></pre></div></div>

<p class="notice--info">Note that Git hosting services such as GitHub recommend a different type of key, namely Ed25519, which can be generated using the following command.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ssh-keygen <span class="nt">-t</span> ed25519 <span class="nt">-C</span> <span class="s2">"youremail@yourdomain"</span>
</code></pre></div></div>

<p>Compared to RSA, Ed25519 is considered to be faster, safer, and more compact (an Ed25519 public key is 68 characters long, versus 544 characters for RSA 3072), although RSA is still more widely deployed.</p>

<h2 id="step-2-distribute-keys">Step 2: distribute key(s)</h2>

<p>If <em>Alice</em> wants to log in to <em>Server1</em> shown in the figure, she can run the following command to copy her SSH <strong>public key(s)</strong> to it. The keys sent will be appended to <code class="language-plaintext highlighter-rouge">.ssh/authorized_keys</code> on the server.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ssh-copy-id alice-username@server1.domain-or-ip
</code></pre></div></div>

<p>Afterwards, <em>Alice</em> should be able to log in to <em>Server1</em> without being asked for her password, i.e.,</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ssh alice-username@server1.domain-or-ip

Welcome to XXX ...
</code></pre></div></div>

<p class="notice--warning">Note that Mac users who do not have <code class="language-plaintext highlighter-rouge">ssh-copy-id</code> can either install it via <code class="language-plaintext highlighter-rouge">brew</code> or manually copy the public key with <code class="language-plaintext highlighter-rouge">scp</code>, <code class="language-plaintext highlighter-rouge">rsync</code>, or whatnot. If you go for the latter, keep in mind that the permission bits must be set correctly, as shown below.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">chmod </span>700 ~/.ssh
<span class="nb">chmod </span>600 ~/.ssh/<span class="k">*</span>
</code></pre></div></div>

<h2 id="step-3-indirect-login">Step 3: indirect login</h2>

<p>In case <em>Server1</em> sits behind a firewall or what have you, <em>Alice</em> has to connect to it via a proxy, say <em>Server2</em>. Therein lies the question: how can she access <em>Server1</em> through <em>Server2</em> using key-pair authentication?</p>

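<p>To tackle this, <em>Alice</em> can first copy her public key to <em>Server2</em>, the proxy, through <strong>Step 2</strong>. Next, she should log in to <em>Server2</em>, generate a key pair there (<strong>Step 1</strong>), and send its public key to <em>Server1</em>, so that <em>Server2</em> can reach <em>Server1</em> without a password. Note that with the <code class="language-plaintext highlighter-rouge">ProxyCommand</code> configuration below, the login to <em>Server1</em> is authenticated with the key on her local machine, so her local public key must also be listed in <code class="language-plaintext highlighter-rouge">.ssh/authorized_keys</code> on <em>Server1</em>.</p>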

<p>Last, but certainly not least, <em>Alice</em> should configure the <code class="language-plaintext highlighter-rouge">ssh</code> client on her local machine. The configuration (<code class="language-plaintext highlighter-rouge">~/.ssh/config</code>) is along the lines of the following.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Host server2
    HostName          server2-domain
    User              alice-username
    IdentityFile      ~/.ssh/id_rsa

Host server1
    HostName          server1-domain
    User              alice-username
    ForwardX11Trusted <span class="nb">yes
    </span>ForwardAgent      <span class="nb">yes
    </span>IdentityFile      ~/.ssh/id_rsa
    ProxyCommand ssh server2 <span class="nt">-W</span> %h:%p
</code></pre></div></div>
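
<p>On OpenSSH 7.3 or later, the same hop can be expressed more compactly with the <code class="language-plaintext highlighter-rouge">ProxyJump</code> directive, which takes the place of the <code class="language-plaintext highlighter-rouge">ProxyCommand</code> line. A minimal sketch, reusing the placeholder host and user names from the configuration above:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Host server1
    HostName          server1-domain
    User              alice-username
    IdentityFile      ~/.ssh/id_rsa
    ProxyJump         server2
</code></pre></div></div>

<p>The one-off command-line equivalent is <code class="language-plaintext highlighter-rouge">ssh -J server2 server1</code>.</p>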

<p>Now, everything should be in place. <em>Alice</em> should be able to log in, forward ports, or whatnot, with automatic key-pair authentication (without having to type her password <strong>every single time</strong> for <strong>every single server along the way</strong>!).</p>

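<p class="notice--danger">🛑 NB: When tunnelling ports, traffic between <em>Server1</em> and the target <code class="language-plaintext highlighter-rouge">{hostmachine}</code> is not encrypted by SSH. The SSH session itself, including the hop between <em>Server2</em> and <em>Server1</em>, stays encrypted end to end.</p>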

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Log in *Server1* via *Server2*.</span>
ssh server1 

<span class="c"># Tunnelling ports to *Server1* via *Server2*.</span>
ssh <span class="nt">-NL</span> <span class="o">{</span>listening_port<span class="o">}</span>:<span class="o">{</span>hostmachine<span class="o">}</span>:<span class="o">{</span>host_port<span class="o">}</span> server1 
</code></pre></div></div>

<p>p.s. Wrestling with company proxies during these COVID times we live in at the moment can be devastating 😷</p>]]></content><author><name>Hongyu Hè</name></author><category term="systems" /><category term="notes" /><summary type="html"><![CDATA[SSH key auth and proxy jumps: keygen, key distribution, and practical config patterns]]></summary></entry><entry><title type="html">Autoencoders and VAEs</title><link href="https://hongyuhe.github.io/autoencoders/" rel="alternate" type="text/html" title="Autoencoders and VAEs" /><published>2020-02-15T00:00:00+00:00</published><updated>2020-02-15T00:00:00+00:00</updated><id>https://hongyuhe.github.io/autoencoders</id><content type="html" xml:base="https://hongyuhe.github.io/autoencoders/"><![CDATA[<h2 id="1-autoencoders">1 Autoencoders</h2>

<p><img src="https://hongyuhe.github.io/_resources/ebe53b3973334dbcb66cef074485f5d1.png" alt="60bc1ab0a0006196017f2d3cf12edf4e.png" /></p>

<p>The main idea of autoencoders is to extract latent features that are not easily observable yet play an important role in one or several aspects of the data (e.g., images).</p>

<p><img src="https://hongyuhe.github.io/_resources/f59db3fa307d465299e34dd02ddd056f.png" alt="010ba1fa8ed15183bffb974199a389c3.png" /></p>

<figure class="align-center">
  <img src="https://hongyuhe.github.io/_resources/a1aa34b06a484b4f90d9dc2f1010af90.png" alt="" />
  <figcaption>Embedding of faces [Saul &amp; Roweis]</figcaption>
</figure>
<!-- ![fa1b75f261850b8706e4b731ed2fd55d.png](https://hongyuhe.github.io/_resources/a1aa34b06a484b4f90d9dc2f1010af90.png) -->

<h3 id="11-compression-by-the-encoder">1.1 Compression (by the “encoder”)</h3>

<p><img src="https://hongyuhe.github.io/_resources/b557ab08ab414bceac4a76f45b6af308.png" alt="4943cd912dd88cc5f1f82d85d73ca482.png" /></p>

<p>The first step of the process is to compress the observed data vector $\vec x$ into the latent feature vector $\vec z$.</p>

<p>This compression has two obvious benefits:</p>

<ol>
  <li>The latent feature vector $\vec z$ is much smaller, which makes it much easier to process than the original (potentially high-dimensional) data.</li>
  <li>As its name suggests, the latent feature vector $\vec z$ may capture important hidden features. </li>
</ol>

<h3 id="12-reconstruction-by-the-decoder">1.2 Reconstruction (by the “decoder”)</h3>

<p><img src="https://hongyuhe.github.io/_resources/35e7be83efe74e23b98db75467b84c42.png" alt="f47f300caacbfbfd0b3d76705832f118.png" /></p>

<p>The second phase is to try to reproduce the data (the image) from the latent feature vector $\vec z$. </p>

<p>Clearly, since the first step is a “lossy compression”, the reconstructed data $\vec{\hat{x}}$ will not be exactly the same as the original observation. This is where the third phase comes in.</p>

<h3 id="13-backpropagation">1.3 Backpropagation</h3>

<p>As mentioned above, there is a difference between the observation $\vec{x}$ and the reconstruction $\vec{\hat{x}}$.</p>

<p><img src="https://hongyuhe.github.io/_resources/58b3571b434f4ac09f0d65ba672836d6.png" alt="00c472c860f7f276f01d91f7ff4bade2.png" /></p>

<p>From the above picture, we can see clearly that the higher the dimension of the latent feature vector $\vec{z}$, the higher the quality of the reconstruction.</p>

<p>Conversely, constraining the size of the latent space forces the network to keep only the most “important” features. </p>

<p>Further, we can use a loss function to measure how much of the data these extracted hidden variables retain. In this case, we use a simple squared loss:</p>

\[\mathcal{L}(x, \hat{x})=\|x-\hat{x}\|^{2}\]

<p>Thus, the key power of autoencoders is that</p>

<p><strong>Autoencoders allow us to quantify the latent variables without labels (gold-standard data)!</strong></p>

<p>To summarize, </p>

<ul>
  <li>Autoencoding == <strong>Auto</strong>matically <strong>encoding</strong> data</li>
  <li>Bottleneck hidden layer forces the network to learn a compressed latent representation.</li>
  <li>Reconstruction loss forces the latent representation to be as “paramount” and “informative” as possible.</li>
</ul>
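
<p>The bottleneck effect and the reconstruction loss can be illustrated with a deliberately crude, untrained stand-in for an encoder/decoder pair. The functions <code class="language-plaintext highlighter-rouge">encode</code> and <code class="language-plaintext highlighter-rouge">decode</code> and the data are hypothetical: the “encoder” simply keeps the first few coordinates, whereas a real autoencoder learns both mappings. Still, the sketch shows how the squared loss shrinks as the latent dimension grows.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def encode(x, k):
    # Toy "encoder": keep only the first k coordinates (the bottleneck).
    return x[:k]


def decode(z, n):
    # Toy "decoder": pad the latent vector with zeros back to length n.
    return z + [0.0] * (n - len(z))


def squared_loss(x, x_hat):
    # Reconstruction loss ||x - x_hat||^2.
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


x = [3.0, -2.0, 1.0, 0.5]  # made-up observation
for k in range(len(x) + 1):
    x_hat = decode(encode(x, k), len(x))
    print(k, squared_loss(x, x_hat))  # the loss shrinks as k grows
</code></pre></div></div>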

<h2 id="2-variational-autoencoders-vaes">2 Variational Autoencoders (VAEs)</h2>

<p><img src="https://hongyuhe.github.io/_resources/739dc342d3714d96a2b1d39e5eb07f7c.png" alt="5bfdc446d533a2f9a737ddea851534d1.png" /></p>

<h3 id="21-stochastical-variation">2.1 Stochastic variation</h3>

<figure style="width: 30%" class="align-left">
  <img src="https://hongyuhe.github.io/_resources/b4cd6671e16e4d0fb188b81a0a31b45c.png" alt="" />
  <!-- <figcaption>System overview</figcaption> -->
</figure>
<p>In a nutshell, variational autoencoders are a probabilistic twist on autoencoders: instead of deterministically producing the latent vector $\vec{z}$, the encoder outputs a mean and a standard deviation, from which the latent sample is (stochastically) drawn. That being said, the main idea of forward propagation does not change compared to traditional autoencoders. </p>

<ul>
  <li>In the compression process, the encoder computes $p_{\phi}(\mathrm{z} \mid x)$.</li>
  <li>In the reconstruction phase, the decoder computes $q_{\theta}(\mathrm{x} \mid z)$.</li>
</ul>

<p>Then, we could compute the loss as follows</p>

\[\mathcal{L}(\phi, \theta, x)=(\text { reconstruction loss })+(\text { regularization term }),\]

<p>where the reconstruction loss is exactly the same as before: it captures the pixel-wise difference between the input and the reconstructed output. It is a metric of how well the network generates outputs that are akin to the observations.</p>

<p>As to the “regularization term”, since the VAE is producing these probability distributions, we want to place some constraints on how they are computed as well as what that probability distribution resembles as a part of regularizing and training the network.</p>

<p>Hence, we place a prior $p(z)$ on the latent distribution and penalise the divergence from it:</p>

\[D(p_{\phi}(z|x)\ ||\ p(z)),\]

<p>which is the KL divergence between the inferred latent distribution and this fixed prior. A common choice for the prior is a standard Gaussian, i.e. centred at 0 with a standard deviation of 1: $\ p(z)=\mathcal{N}\left(\mu=0, \sigma^{2}=1\right)$.</p>

<p>In this way, the network is penalised when it tries to cheat by clustering points outside this smooth Gaussian distribution, as would be the case if it were overfitting or trying to memorize particular instances of the input.</p>

<p>Thus, this encourages the extracted $\vec z$ to follow the shape of our initial hypothesis about the distribution, smoothing out the latent space and, in turn, helping the network not to overfit on certain parts of the latent space.</p>
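
<p>When the inferred latent distribution is a diagonal Gaussian with mean $\mu$ and variance $\sigma^{2}$, the KL term against the standard Gaussian prior has a well-known closed form:</p>

\[D\big(\mathcal{N}(\mu, \sigma^{2})\ ||\ \mathcal{N}(0, 1)\big)=\frac{1}{2} \sum_{j}\left(\mu_{j}^{2}+\sigma_{j}^{2}-1-\log \sigma_{j}^{2}\right)\]

<p>A minimal sketch in plain Python follows. The function names are hypothetical, and weighting the two loss terms equally is an assumption made for illustration only.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import math


def kl_to_standard_normal(z_mean, z_log_var):
    # Closed-form KL divergence between the diagonal Gaussian
    # N(z_mean, exp(z_log_var)) and the standard normal prior N(0, I).
    return 0.5 * sum(
        m * m + math.exp(lv) - 1.0 - lv
        for m, lv in zip(z_mean, z_log_var)
    )


def vae_loss(x, x_hat, z_mean, z_log_var):
    # Reconstruction term (squared error) plus the KL regularization term.
    reconstruction = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return reconstruction + kl_to_standard_normal(z_mean, z_log_var)


# A latent distribution that matches the prior incurs no penalty ...
print(kl_to_standard_normal([0.0, 0.0], [0.0, 0.0]))
# ... while one that drifts away from it is penalised.
print(kl_to_standard_normal([1.0, -2.0], [0.0, 0.0]))
</code></pre></div></div>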

<h3 id="22-backpropagation-reparametrization">2.2 Backpropagation? Reparametrization</h3>

<figure style="width: 47%" class="align-center">
  <img src="https://hongyuhe.github.io/_resources/c44854a3bb34468b9744e8ae8b5011f4.png" alt="" />
  <figcaption>Original form</figcaption>
</figure>

<p>Unfortunately, gradients cannot be backpropagated through the sampling layer: backpropagation requires deterministic nodes so that it can apply the chain rule and pass gradients from layer to layer.</p>

<figure style="width: 53%" class="align-center">
  <img src="https://hongyuhe.github.io/_resources/0c92b033a4db47db89be4f4713492eeb.png" alt="" />
  <figcaption>Reparametrized form</figcaption>
</figure>

<!-- ![1512d4ff467d25ef402352d65afa8f15.png](https://hongyuhe.github.io/_resources/0c92b033a4db47db89be4f4713492eeb.png) -->

<p>Instead, we consider the sampled latent vector $\vec z$ as the sum of a fixed mean vector $\vec \mu$ and a fixed standard-deviation vector $\vec \sigma$ scaled by a random factor $\vec \epsilon$ drawn from a prior distribution, for example a standard Gaussian: $\vec z=\vec \mu+\vec \sigma \odot \vec \epsilon$. The key idea here is that we still have a stochastic node, but since $\epsilon$ is drawn separately from a standard Gaussian, the stochastic sampling no longer occurs directly in the bottleneck layer of $\vec z$. In other words, we have moved where the sampling occurs, so gradients can flow through $\vec \mu$ and $\vec \sigma$.</p>

<p>Note that this is a really powerful trick as such reparametrization is what allows for VAEs to be trained end-to-end.</p>
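
<p>The trick can be sketched in a few lines of plain Python. The function name and the optional <code class="language-plaintext highlighter-rouge">eps</code> argument are hypothetical, added here so that the noise can be fixed and the deterministic part inspected.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import math
import random


def reparameterize(z_mean, z_log_var, eps=None):
    # z = mu + sigma * eps, with sigma = exp(0.5 * log_var).
    # All randomness lives in eps, so z is a deterministic (and
    # differentiable) function of z_mean and z_log_var once eps is drawn.
    if eps is None:
        eps = [random.gauss(0.0, 1.0) for _ in z_mean]
    return [
        m + math.exp(0.5 * lv) * e
        for m, lv, e in zip(z_mean, z_log_var, eps)
    ]


z_mean, z_log_var = [1.0, -1.0], [0.0, 0.0]
print(reparameterize(z_mean, z_log_var))                   # a random draw
print(reparameterize(z_mean, z_log_var, eps=[0.5, -1.0]))  # fixed noise
</code></pre></div></div>

<p>With the noise held fixed, the derivative of each $z_j$ with respect to $\mu_j$ is 1 and with respect to $\sigma_j$ is $\epsilon_j$, which is exactly what backpropagation needs.</p>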

<h2 id="3-code-example">3 Code example</h2>

<p>The following is a vanilla implementation of the encoder of a VAE model in TensorFlow (Keras).</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">tensorflow</span> <span class="k">as</span> <span class="n">tf</span>
<span class="kn">from</span> <span class="nn">tensorflow</span> <span class="kn">import</span> <span class="n">keras</span>
<span class="kn">from</span> <span class="nn">tensorflow.keras.layers</span> <span class="kn">import</span> <span class="n">Dense</span><span class="p">,</span> <span class="n">Input</span>

<span class="k">class</span> <span class="nc">Sampling</span><span class="p">(</span><span class="n">keras</span><span class="p">.</span><span class="n">layers</span><span class="p">.</span><span class="n">Layer</span><span class="p">):</span>
  <span class="c1"># Reparametrization: z = mean + std * epsilon, with epsilon ~ N(0, I).</span>
  <span class="k">def</span> <span class="nf">call</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">inputs</span><span class="p">):</span>
    <span class="n">z_mean</span><span class="p">,</span> <span class="n">z_log_var</span> <span class="o">=</span> <span class="n">inputs</span>
    <span class="n">batch</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">shape</span><span class="p">(</span><span class="n">z_mean</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span>
    <span class="n">dim</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">shape</span><span class="p">(</span><span class="n">z_mean</span><span class="p">)[</span><span class="mi">1</span><span class="p">]</span>
    <span class="n">epsilon</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">normal</span><span class="p">(</span><span class="n">shape</span><span class="o">=</span><span class="p">(</span><span class="n">batch</span><span class="p">,</span> <span class="n">dim</span><span class="p">))</span>
    <span class="k">return</span> <span class="n">z_mean</span> <span class="o">+</span> <span class="n">tf</span><span class="p">.</span><span class="n">exp</span><span class="p">(</span><span class="mf">0.5</span> <span class="o">*</span> <span class="n">z_log_var</span><span class="p">)</span> <span class="o">*</span> <span class="n">epsilon</span>

<span class="n">latent_dim</span> <span class="o">=</span> <span class="mi">2</span>  <span class="c1"># size of the bottleneck</span>
<span class="n">encoder_inputs</span> <span class="o">=</span> <span class="n">Input</span><span class="p">(</span><span class="n">shape</span><span class="o">=</span><span class="p">(</span><span class="mi">6</span><span class="p">,),</span> <span class="n">name</span><span class="o">=</span><span class="s">"input_layer"</span><span class="p">)</span>

<span class="n">x</span> <span class="o">=</span> <span class="n">Dense</span><span class="p">(</span><span class="mi">5</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s">"relu"</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">"h1"</span><span class="p">)(</span><span class="n">encoder_inputs</span><span class="p">)</span>
<span class="n">x</span> <span class="o">=</span> <span class="n">Dense</span><span class="p">(</span><span class="mi">5</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s">"relu"</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">"h2"</span><span class="p">)(</span><span class="n">x</span><span class="p">)</span>
<span class="n">x</span> <span class="o">=</span> <span class="n">Dense</span><span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s">"relu"</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">"h3"</span><span class="p">)(</span><span class="n">x</span><span class="p">)</span>
<span class="n">z_mean</span> <span class="o">=</span> <span class="n">Dense</span><span class="p">(</span><span class="n">latent_dim</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">"z_mean"</span><span class="p">)(</span><span class="n">x</span><span class="p">)</span>
<span class="n">z_log_var</span> <span class="o">=</span> <span class="n">Dense</span><span class="p">(</span><span class="n">latent_dim</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">"z_log_var"</span><span class="p">)(</span><span class="n">x</span><span class="p">)</span>
<span class="n">z</span> <span class="o">=</span> <span class="n">Sampling</span><span class="p">()([</span><span class="n">z_mean</span><span class="p">,</span> <span class="n">z_log_var</span><span class="p">])</span>

<span class="n">encoder</span> <span class="o">=</span> <span class="n">keras</span><span class="p">.</span><span class="n">Model</span><span class="p">(</span><span class="n">encoder_inputs</span><span class="p">,</span> <span class="p">[</span><span class="n">z_mean</span><span class="p">,</span> <span class="n">z_log_var</span><span class="p">,</span> <span class="n">z</span><span class="p">],</span> <span class="n">name</span><span class="o">=</span><span class="s">"encoder"</span><span class="p">)</span>

<span class="n">keras</span><span class="p">.</span><span class="n">utils</span><span class="p">.</span><span class="n">plot_model</span><span class="p">(</span><span class="n">encoder</span><span class="p">,</span> <span class="n">show_shapes</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>  <span class="c1"># requires pydot and graphviz</span>
</code></pre></div></div>]]></content><author><name>Hongyu Hè</name></author><category term="ml" /><category term="notes" /><summary type="html"><![CDATA[Autoencoders vs VAEs: latent compression, KL regularization, and reparameterization]]></summary></entry></feed>