Hello friends, welcome to my website.
This site is mainly a place for recording my learning notes, sharing my thoughts and experiences, and connecting with like-minded people.
Feel free to explore the different sections of the site, and don't hesitate to reach out if you have any questions or would like to connect.
Blog Posts

Meta Llama family
An overview of how the Meta Llama family of models is constructed
Jun 24, 2025
LLM
Llama
Gumbel-Softmax
Jun 17, 2025
Probability

Efficient LLM inference (Episode 1)
On tackling long sequences
Jun 13, 2025
LLM
Inference
Long sequence

Online softmax
A classical tool for computing block-wise attention
Jun 5, 2025
LLM
Technique

Overview of Large Model Lightweighting Techniques
MCP generated
May 29, 2025
LLM
Lightweighting

Model Context Protocol (MCP)
A brief introduction to the Model Context Protocol
May 28, 2025
MCP
LLM

Understanding transformer—Statistics
On the storage and computation overhead
May 22, 2025
Transformer
LLM
Foundation

Understanding transformer—KV cache
Accelerating inference with a KV cache
May 22, 2025
Transformer
Foundation
LLM

Understanding transformer
Refers to Vaswani, Ashish, et al., "Attention Is All You Need," Advances in NeurIPS 30 (2017).
May 18, 2025
Foundation
LLM

Operator
The basics of operator theory (from Stephen Boyd's course EE364b, Stanford University)
Oct 17, 2023
Operator
Optimization