List of things I want to pick up at some point:

ray-project/llm-numbers: Numbers every LLM developer should know

Achieve 23x LLM Inference Throughput & Reduce p50 Latency

LLM Inference Optimizations — Continuous Batching (Dynamic Batching) and Selective Batching, Orca | by Don Moon | Byte-Sized AI | Medium

Generation with LLMs

serodriguez68/designing-ml-systems-summary: A detailed summary of "Designing Machine Learning Systems" by Chip Huyen. This book gives you an end-to-end view of all the steps required to build AND OPERATE ML products in production. It is a must-read for ML practitioners and software engineers transitioning into ML.

LLM Inference Optimizations - Chunked Prefills and Decode Maximal Batching | by Don Moon | Byte-Sized AI

LLM Inference Series: 2. The two-phase process behind LLMs’ responses | by Pierre Lienhart | Medium

Large Scale Transformer model training with Tensor Parallel (TP) — PyTorch Tutorials 2.5.0+cu124 documentation

LLM Inference Series: 4. KV caching, a deeper look | by Pierre Lienhart | Medium