Quantization in LLMs - Search Videos

Local LLMs on Consumer Hardware: GLM-4.7-Flash Performance | Hammad Armghan, PhD posted on the topic | LinkedIn

1 view · 1 month ago

MLX MiniMax 2.5 running LOCALLY on a single M3 Ultra 512GB! Writing a poem on LLMs at 6-bit quantization! 🔥 Let's start some coding, context and distributed tests! Generation: 40.2 tokens per second. Peak memory: 186 GB. Source: Ivan Fioravanti | Thanh Hoang

1.1K views · 1 month ago

Facebook · Thanh Hoang

What is Quantization? | IBM

LLMs can take gigabytes of memory to store, which limits what can be run on consumer hardware. But quantization can dramatically compress models, making a wider selection of models available to developers. You can often reduce model size by 4x or more while maintaining reasonable performance. In our new short course Quantization Fundamentals, taught by Hugging Face's Younes Belkada and Marc Sun, you'll:
- Learn how to quantize nearly any open source model
- Use int8 and bfloat16 (Brain float 16)
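
As a rough illustration of where a "4x" reduction can come from, the sketch below assumes weights stored as 32-bit floats and applies a simple symmetric per-tensor int8 scheme. This is an assumed textbook scheme, not necessarily the method taught in the course, and the names `quantize_int8` and `dequantize` are hypothetical helpers introduced only for this example.

```python
from array import array

def quantize_int8(weights):
    """Symmetric per-tensor quantization: w is approximated by scale * q, q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero tensor, where any scale works.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp into the int8 range.
    q = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return q, scale

def dequantize(q, scale):
    """Map int8 codes back to approximate float values."""
    return [scale * v for v in q]

# Toy "weights" stored as 32-bit floats (illustrative values only).
weights = array("f", [0.02, -1.27, 0.635, 0.0, 1.0, -0.3])
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

fp32_bytes = len(weights) * weights.itemsize  # 4 bytes per value
int8_bytes = len(q) * q.itemsize              # 1 byte per value
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(f"float32: {fp32_bytes} bytes, int8: {int8_bytes} bytes")
print(f"largest round-trip error: {max_error:.6f} (step size {scale:.6f})")
```

The storage ratio here follows directly from the element widths (4 bytes versus 1 byte); the single float `scale` is overhead that becomes negligible for large tensors. The round-trip error is bounded by about half a quantization step, which is one way to see why quality can remain reasonable after compression.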

6.8K views · Apr 15, 2024

Facebook · Andrew Ng

"Fine-tuning LLMs on AMD Strix Halo with Framework Desktop" | Donato Capitella posted on the topic | LinkedIn

[IDSL Seminar'25] M-ANT: Efficient Low-bit Group Quantization for LLMs

20 views · 3 months ago

What Is Quantization | Quantization | TensorTeach

315 views · Nov 20, 2024

YouTube · TensorTeach

Understanding Symmetric Quantization | Quantization | Tens…

276 views · Nov 20, 2024

YouTube · TensorTeach

SmoothQuant

4.4K views · Oct 25, 2023

YouTube · MIT HAN Lab

LLM Distillation and Compression

558 views · Dec 17, 2024

YouTube · MLOps.community

Host a AI Server

453 views · Mar 27, 2024

YouTube · AI Arcade

What is LLM Quantization ?

3K views · Mar 19, 2025

YouTube · New Machina

LLMs Naming Convention Explained

1.8K views · Sep 15, 2023

YouTube · AI Readme

LLMs On The Edge

1.6K views · 9 months ago

YouTube · Semiconductor Engineering

Optimize Your AI - Quantization Explained

406.9K views · Dec 28, 2024

YouTube · Matt Williams

LLM Explained | What is LLM

405.5K views · Aug 22, 2023

YouTube · codebasics

What is LLM quantization?

25.6K views · Nov 6, 2023

YouTube · Airtrain AI

MR-GPTQ: Better FP4 Microscaling for LLMs

115 views · 5 months ago

YouTube · AI Research Roundup

Quantization in Deep Learning (LLMs)

11.5K views · Sep 22, 2023

YouTube · AI Bites

BitNet Distillation: 1.58‑bit LLMs from FP16

181 views · 5 months ago

YouTube · AI Research Roundup

LLM Mastery in 30 Days | Course Introduction

3.2K views · Sep 7, 2024

YouTube · Neural Hacks with Vasanth

AGI Dreams Podcast – October 01, 2025

2 views · 5 months ago

YouTube · Robert Lee

SINQ: Calibration-Free Low-Bit LLM Quantization

168 views · 5 months ago

YouTube · AI Research Roundup

L 2 Ollama | Run LLMs locally

8.8K views · Jul 15, 2024

YouTube · Code With Aarohi

Quantizing LLMs - How & Why (8-Bit, 4-Bit, GGUF & More)

22.4K views · Nov 18, 2024

YouTube · Adam Lucek

NVIDIA GPU Quantization Support for LLMs

31 views · 3 months ago

YouTube · AIProgrammingHardware

LLMs Quantization Crash Course for Beginners

5.7K views · May 19, 2024

YouTube · AI Anytime

Scale-Aware Memory Strategies for Reasoning LLMs

15 views · 5 months ago

YouTube · AI Research Roundup

Ollama.ai: A Developer's Quick Start Guide!

6.3K views · Feb 2, 2024

YouTube · AI Arcade

Building Ubuntu Server for AI and LLMs from scratch Part 2: Nvidia …

3.1K views · Jan 5, 2025

YouTube · RoboTF AI
