List of things I want to pick up at some point:
ray-project/llm-numbers: Numbers every LLM developer should know
Achieve 23x LLM Inference Throughput & Reduce p50 Latency
LLM Inference Optimizations — Continuous Batching (Dynamic Batching) and Selective Batching, Orca | by Don Moon | Byte-Sized AI | Medium
Generation with LLMs
serodriguez68/designing-ml-systems-summary: A detailed summary of "Designing Machine Learning Systems" by Chip Huyen. This book gives you an end-to-end view of all the steps required to build AND OPERATE ML products in production. It is a must-read for ML practitioners and software engineers transitioning into ML.
LLM Inference Optimizations - Chunked Prefills and Decode Maximal Batching | by Don Moon | Byte-Sized AI
LLM Inference Series: 2. The two-phase process behind LLMs’ responses | by Pierre Lienhart | Medium
Large Scale Transformer model training with Tensor Parallel (TP) — PyTorch Tutorials 2.5.0+cu124 documentation
LLM Inference Series: 4. KV caching, a deeper look | by Pierre Lienhart | Medium