Local LLM Inference Optimization

A Comprehensive Guide to Quantization, Hardware Acceleration, and Efficient Private AI Deployment

By Thomas O. Greene
Independently published, April 2026

Stop Renting Intelligence. Start Optimizing Your Own.
Do you want to run 70B parameter models on a single consumer GPU? Are you tired of high API costs, network latency, and the privacy risks of cloud-based AI?
The "Local LLM Revolution" is here, but running Large Language Models (LLMs) privately is only half the battle. To make them truly useful, you must master Inference Optimization.
In Local LLM Inference Optimization, you will move beyond basic "out-of-the-box" setups and dive into the high-performance engineering required to squeeze every drop of power from your hardware. Whether you are using NVIDIA CUDA, Apple Silicon (MLX), or AMD ROCm, this comprehensive guide provides the technical blueprint for the sovereign engineer.

What You Will Master:

  • The Quantization Deep-Dive: Learn to navigate the "Quantization Tax" using GGUF, EXL2, AWQ, and GPTQ. Move from FP32 to 4-bit and even 1.58-bit (BitNet) without losing the model's "mind."
  • Advanced Memory Management: Defeat "Out of Memory" (OOM) errors by mastering KV Cache Management, PagedAttention, and FlashAttention 2 & 3.
  • The Speed Multipliers: Double your Tokens Per Second (TPS) using Speculative Decoding, Continuous Batching, and Lookahead Heuristics.
  • Hardware Architecture: Design high-performance local servers using Multi-GPU Pipeline Parallelism and CPU/GPU offloading strategies.
  • Context Window Expansion: Use RoPE Scaling, YaRN, and LongRoPE to push 8k models to 128k+ context on consumer hardware.
  • The Full Local Stack: Step-by-step guides for llama.cpp, Ollama, vLLM, and TGI (Text Generation Inference).
  • Security & Privacy: Deploy Air-Gapped AI environments and secure your infrastructure using Safetensors and local sandboxing.
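
The Quantization Deep-Dive and Advanced Memory Management items above both come down to arithmetic on memory. The sketch below is a rough illustration, not anything taken from the book: it estimates weight storage for a 70B-parameter model at the bit widths the description mentions, plus the FP16 KV cache for an assumed Llama-3-70B-like shape (80 layers, 8 KV heads via grouped-query attention, head dimension 128). The function names `weight_memory_gb` and `kv_cache_bytes` are hypothetical, and the estimate ignores quantization scales, activations, and runtime overhead.

```python
"""Back-of-envelope memory estimate for local LLM inference.

A rough sketch, not a sizing tool. Assumptions:
- weight memory = parameters x bits per weight / 8, ignoring the per-group
  scales that real quantized formats also store;
- the KV cache uses a hypothetical Llama-3-70B-like shape
  (80 layers, 8 KV heads, head_dim 128);
- activations and runtime overhead are left out entirely.
"""

GB = 10**9  # decimal gigabytes


def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Storage for the model weights alone, in decimal GB."""
    return num_params * bits_per_weight / 8 / GB


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, batch: int = 1, bytes_per_elem: int = 2) -> int:
    """Bytes needed to cache keys and values (the leading 2) for every token."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem


if __name__ == "__main__":
    params = 70e9  # the 70B-parameter case from the description
    for label, bits in [("FP32", 32), ("FP16", 16), ("8-bit", 8),
                        ("4-bit", 4), ("1.58-bit", 1.58)]:
        print(f"weights {label:>8}: {weight_memory_gb(params, bits):6.1f} GB")

    for ctx in (8_192, 131_072):  # 8k vs 128k context, FP16 cache
        size = kv_cache_bytes(n_layers=80, n_kv_heads=8, head_dim=128, seq_len=ctx)
        print(f"KV cache @ {ctx:>7} tokens: {size / GB:6.1f} GB")
```

Running it prints about 280, 140, 70, 35 and 13.8 GB for the weights, and roughly 2.7 GB versus 42.9 GB of FP16 KV cache at 8k and 128k tokens for the assumed shape, so longer context windows grow the cache linearly on top of the weights.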

Why This Book?
This book focuses on Deployment and Efficiency. It is written for the Lead Engineer, the Privacy-Conscious CTO, and the Prosumer Hobbyist who demands low Time to First Token (TTFT) and maximum Perf/Watt.
Stop paying for tokens. Own your weights. Optimize your future.

Book Information

Full title: Local LLM Inference Optimization
Language: English
Binding: Paperback
Publication date: 2026
Pages: 170
EAN: 9798258375193
Libristo code: 52120727
Publisher: Independently published
Weight: 237 g
Dimensions: 152 x 229 x 9 mm