
Paper List

LLM 901 — Weekly Reading Schedule with TL;DRs

1) Training LLMs

1‑1) Pretraining

1‑1‑1) Architecture

Week 1‑1 (09/08/2025): DeepSeek-V2/3

Week 1‑2 (09/12/2025): MoE and Multi‑token Prediction

Week 2‑1 (09/15/2025): Positional Encodings and Long Context

Week 2‑2 (09/19/2025): LayerNorm & RMSNorm; wrap-up with gpt-oss

Week 3‑1 (09/22/2025): Is “decoder-only Transformer + left-to-right autoregression” the end of the story? Part 1.

Week 3‑2 (09/26/2025): Is “decoder-only Transformer + left-to-right autoregression” the end of the story? Part 2.


1‑1‑2) Training Data

Week 4‑1 (09/29/2025): How Much Data (and Compute)?

Week 4‑2 (10/03/2025): Which Data?


1‑1‑3) Training Algorithms

Week 5‑1 (10/06/2025): Optimizers

Week 5‑2 (10/10/2025): Newer Optimizers

Week 6‑1 (10/13/2025): Optimizer Benchmarks

Week 6‑2 (10/17/2025): Efficient Training


1‑2) Posttraining

Week 7‑1 (10/20/2025): Alignment‑Focused Post‑training

Week 7‑2 (10/24/2025): RL of LLMs, Part 1

Week 8‑1 (10/27/2025): RL of LLMs, Part 2

Week 8‑2 (10/31/2025): Reasoning‑Focused Post‑training


2) Using LLMs

Week 9‑1 (11/03/2025): LLMs + Tools

Week 9‑2 (11/07/2025): System‑Level Optimization

Week 10‑1: Attention and Serving

Week 10‑2: Quantization

Week 11‑1: Exact Acceleration

  • Speculative Sampling: Draft‑and‑verify decoding gives ~2× speedups while keeping the target model’s distribution (see the sketch after this list).
  • Medusa: Extra decoding heads propose multi‑token candidates that the base model verifies in a single forward pass via tree attention.
  • EAGLE: Drafts autoregressively at the feature level (the target model’s second‑to‑top layer) and verifies as in speculative sampling, preserving the output distribution.
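
A minimal sketch of the draft‑and‑verify acceptance rule, assuming toy next‑token distributions: `draft_dist` and `target_dist` below are hypothetical stand‑ins for a small draft model and the large target model, and the accept/reject step is the standard rule that makes the emitted tokens exactly follow the target distribution.

```python
import random

VOCAB = list(range(8))  # tiny toy vocabulary

def draft_dist(prefix):
    """Hypothetical cheap draft model: a skewed toy distribution."""
    w = [1.0 + 0.2 * ((t + len(prefix)) % 3) for t in VOCAB]
    z = sum(w)
    return [x / z for x in w]

def target_dist(prefix):
    """Hypothetical expensive target model: a differently skewed distribution."""
    w = [1.0 + 0.5 * ((2 * t + len(prefix)) % 4) for t in VOCAB]
    z = sum(w)
    return [x / z for x in w]

def sample(probs):
    return random.choices(VOCAB, weights=probs, k=1)[0]

def speculative_step(prefix, k=4):
    """Draft k tokens cheaply, then accept/reject so that every emitted
    token is distributed exactly as if sampled from target_dist."""
    drafted, ctx = [], list(prefix)
    for _ in range(k):
        t = sample(draft_dist(ctx))
        drafted.append(t)
        ctx.append(t)
    # Verify left to right. (A real system scores all k drafted positions
    # with one batched forward pass of the target model; that single pass
    # replacing k sequential passes is where the ~2x speedup comes from.)
    out = list(prefix)
    for t in drafted:
        p, q = target_dist(out), draft_dist(out)
        if random.random() < min(1.0, p[t] / q[t]):
            out.append(t)  # accept: token already matches the target law
        else:
            # Reject: resample from the residual max(0, p - q), renormalized,
            # which corrects for the draft's overconfidence on token t.
            r = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
            z = sum(r)
            out.append(sample([x / z for x in r]) if z > 0 else sample(p))
            return out
    out.append(sample(target_dist(out)))  # all accepted: free bonus token
    return out

random.seed(0)
print(speculative_step([1, 2, 3]))
```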

Week 11‑2: Approximate Inference and KV‑Cache Policies


3) Adapting LLMs

Week 12‑1 (skipped): PEFT

  • LoRA: Low‑rank adapters enable parameter‑efficient finetuning with minimal latency cost (see the sketch after this list).
  • DoRA: Magnitude‑direction decomposition improves LoRA’s capacity without runtime overhead.
  • Expressive Power of LoRA: Theory on when low‑rank adapters can approximate target functions in Transformers.
  • LoRA Training Provably Converges…: Convergence guarantees and clear failure modes in practical regimes.
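
A minimal sketch of a LoRA‑style linear layer in PyTorch, following the paper’s W + (α/r)·BA parameterization; the class name `LoRALinear` and the `rank`/`alpha` arguments are illustrative, not any particular library’s API.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update (alpha/r) * B A."""

    def __init__(self, in_features, out_features, rank=8, alpha=16):
        super().__init__()
        # Stand-in for a pretrained weight; frozen during finetuning.
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)
        # A gets a small random init, B starts at zero, so B A = 0 and the
        # adapter is a no-op at initialization.
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x):
        # y = x W^T + (alpha/r) * x A^T B^T
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)

layer = LoRALinear(64, 64, rank=4)
y = layer(torch.randn(2, 64))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(y.shape, trainable)  # torch.Size([2, 64]); 2 * (4 * 64) = 512 params
```

Because (α/r)·BA has the same shape as the frozen weight, it can be merged into W after training, which is why the adapter adds essentially no inference latency.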

Week 12‑2 (skipped): In‑Context Learning

Week 13‑1: Continual Adaptation via Prompt Evolution

Week 13‑2: Final Poster Presentation