M1 Pro vs M4 Max
New work laptop! So of course I had to benchmark its speed at running local LLMs. These results use the default 4-bit quantization, with ollama version 0.4.1.

Apple MacBook Pro M1 Pro (32GB RAM, 2021 model)

gemma2:9b: eval rate: 24.17 tokens/s
gemma2:27b: eval rate: 10.06 tokens/s
llama3.2:3b: eval rate: 52.10 tokens/s
llama3.1:8b: eval rate: 31.69 tokens/s

Apple MacBook Pro M4 Max (36GB RAM, 2024 model)

gemma2:9b: eval rate: 46.49 tokens/s
gemma2:27b: eval rate: 20.06 tokens/s
llama3.2:3b: eval rate: 99.66 tokens/s
llama3.1:8b: eval rate: 59.98 tokens/s

Conclusions

The 2024 laptop is roughly twice as fast as the 2021 one, and almost exactly matches the speed of an RTX 3080 (a three-year-old Nvidia GPU), with more VRAM to play with, so quite nice. Still, cloud providers are an order of magnitude faster. ...
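For anyone wanting to reproduce numbers like these: the "eval rate" lines come from ollama's timing stats, which `ollama run --verbose` prints after each response. Below is a minimal sketch of that workflow, plus a small awk loop that computes the M4-Max-over-M1-Pro speedup for each model from the rates above (the model names and rates are copied from the tables; the prompt is just a placeholder).

```shell
#!/bin/sh
# To collect an eval rate yourself (prints stats like "eval rate: 24.17 tokens/s"):
#   ollama run gemma2:9b --verbose "Write a haiku about laptops"

# Speedup of the M4 Max over the M1 Pro, using the eval rates measured above.
# Fields: model, M1 Pro tokens/s, M4 Max tokens/s
for pair in \
    "gemma2:9b 24.17 46.49" \
    "gemma2:27b 10.06 20.06" \
    "llama3.2:3b 52.10 99.66" \
    "llama3.1:8b 31.69 59.98"; do
  set -- $pair
  awk -v model="$1" -v m1="$2" -v m4="$3" \
      'BEGIN { printf "%-12s %.2fx\n", model, m4 / m1 }'
done
# Prints:
# gemma2:9b    1.92x
# gemma2:27b   1.99x
# llama3.2:3b  1.91x
# llama3.1:8b  1.89x
```

The per-model speedups cluster tightly around 1.9x, which is where the "roughly twice as fast" conclusion comes from.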