AI Inference Optimization Engineering

Name: AI Inference Optimization Engineering
Brand: Independently published
SKU: 52770465
Price: 10.80 EUR
Availability: InStock
Author: ChatVariety Team
ISBN: 9798199720021

Quantization, Speculative Decoding, and Hardware-Specific LLM Deployment

ChatVariety Team

Language: English
Format: Paperback
Libristo code: 52770465
Publisher: Independently published, June 2026

Status: New, in preparation
Expected in stock: 7 June 2026
Returns: 30 days

Slash LLM Deployment Costs and Latency

Deploying Large Language Models (LLMs) in production is a massive economic and engineering hurdle. AI Inference Optimization Engineering is your comprehensive, hands-on guide to mastering the full stack of modern LLM optimization techniques. From memory-bandwidth solutions to hardware-specific compilation, this book bridges the gap between research-level models and enterprise-grade execution.

What you will master inside this book:

Hardware-Aware Optimization: Dive deep into KV cache mechanics, autoregressive decoding, and GPU memory hierarchies to eliminate latency bottlenecks.
State-of-the-Art Quantization: Apply GPTQ and AWQ quantization and the GGUF model format to shrink large neural networks with minimal loss of accuracy.
Advanced Acceleration Methods: Implement speculative decoding (including draft-head approaches such as Medusa and EAGLE), PagedAttention, and FlashAttention to boost throughput by 2-3x.
Production-Grade Serving: Build ultra-low-latency deployment infrastructures using vLLM, Triton Inference Server, and continuous batching.
Cross-Platform Deployment: Optimize models for specific target hardware, including NVIDIA H100 (TensorRT-LLM), Apple Silicon (llama.cpp/Metal), and Qualcomm mobile/edge accelerators.

Whether you are an ML infrastructure engineer, an AI platform architect, or a technical leader looking to scale LLMs cost-effectively, this book provides the production-ready code, equations, and architectural patterns you need to build hyper-efficient AI pipelines.

Book information

Full title: AI Inference Optimization Engineering
Author: ChatVariety Team
Language: English
Binding: Paperback
Publication date: 2026
Number of pages: 96
EAN: 9798199720021
Libristo code: 52770465
Publisher: Independently published
Weight: 142 g
Dimensions: 152 x 229 x 5 mm

Categories

Computing and information technology > Computer science > Artificial intelligence > Natural language and machine translation
