
ECE 901 @ UW Madison — Advanced Topics in Large Language Models

1) Training LLMs

1‑1) Pretraining

1‑1‑1) Architecture

Week 1‑1: DeepSeek-V2/3

Week 1‑2: MoE and Multi‑token Prediction

Week 2‑1: Positional Encodings and Long Context

Week 2‑2: LayerNorm & RMSNorm. Wrap up with gpt-oss.

Week 3‑1: Is “decoder-only Transformer + left-to-right autoregressive decoding” the end of the story? Part 1.

Week 3‑2: Is “decoder-only Transformer + left-to-right autoregressive decoding” the end of the story? Part 2.


1‑1‑2) Training Data

Week 4‑1: How Much Data (and Compute)?

Week 4‑2: Which Data?


1‑1‑3) Training Algorithms

Week 5‑1: Optimizers

Week 5‑2: Newer Optimizers

Week 6‑1: Optimizer Benchmarks

Week 6‑2: Efficient Training


1‑2) Post‑training

Week 7‑1: Alignment‑Focused Post‑training

Week 7‑2: RL for LLMs, Part 1

Week 8‑1: RL for LLMs, Part 2

Week 8‑2: Reasoning‑Focused Post‑training


2) Using LLMs

Week 9‑1: LLMs + Tools

Week 9‑2: System‑Level Optimization

Week 10‑1: Attention and Serving

Week 10‑2: Quantization

Week 11‑1: Exact Acceleration

Week 11‑2: Approximate Inference and KV Policies


3) Adapting LLMs

Week 12‑1: Parameter‑Efficient Finetuning (PEFT)

  • LoRA: Low‑rank adapters enable parameter‑efficient finetuning with minimal latency cost (see the sketch after this list).
  • DoRA: Magnitude‑direction decomposition improves LoRA’s capacity without runtime overhead.
  • Expressive Power of LoRA: Theory on when low‑rank adapters can approximate target functions in Transformers.
  • LoRA Training Provably Converges…: Convergence guarantees and clear failure modes in practical regimes.
  • (optional) QLoRA: LoRA adapters trained on top of a 4‑bit quantized base model to further reduce finetuning memory.
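
For concreteness, here is a minimal sketch of the low-rank update that LoRA adds on top of a frozen linear layer. The class name `LoRALinear`, the parameter names `lora_A`/`lora_B`, and the `alpha / r` scaling are illustrative assumptions in the style of common LoRA implementations, not any particular library's API.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """A frozen nn.Linear augmented with a trainable low-rank update.

    Effective weight: W + (alpha / r) * B @ A, where A has shape (r, in_features)
    and B has shape (out_features, r); A and B are the only trainable parameters.
    """

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the pretrained weights stay frozen
        # B starts at zero, so the adapted layer initially matches the base layer.
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen base path plus a cheap low-rank correction.
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)


if __name__ == "__main__":
    layer = LoRALinear(nn.Linear(512, 512), r=8)
    out = layer(torch.randn(4, 512))
    trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
    print(out.shape, trainable)  # torch.Size([4, 512]) 8192
```

Because the low-rank term can be merged into the frozen weight after training (W ← W + scaling · B A), the adapted layer adds essentially no serving latency, which is the "minimal latency cost" noted above.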

Week 12‑2: In‑Context Learning

Week 13‑1: Continual Adaptation via Prompt Evolution

Week 13‑2: Final Poster Presentation