2026-02-12
I've been experimenting with LLMs of different types and sizes on a range of problems. It's fascinating how differently models can perform on the same task.
I started some basic benchmarking, which you can try out at apps.databloom.net/ai_benchmark. Right now these are all chat-tuned models accessed via API; I may add others over time.
As always, all code is on GitHub.
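For a rough idea of what comparing models on the same task looks like, here is a minimal sketch. It is not the code from the repo: the task list, the `stub_small` and `stub_large` functions, and the exact-match scoring are all assumptions made up for illustration. The stubs stand in for real API calls so the example runs offline.

```python
from typing import Callable, Dict, List, Tuple

# A "model" here is any function that maps a prompt to a reply.
# In a real setup this would wrap an API call; these are hypothetical stubs.
Model = Callable[[str], str]

# Hypothetical tasks: (prompt, expected answer).
TASKS: List[Tuple[str, str]] = [
    ("What is 2 + 2? Answer with a number only.", "4"),
    ("What is the capital of France? Answer with one word.", "Paris"),
]


def stub_small(prompt: str) -> str:
    """Stand-in for a smaller model: gets the arithmetic right, the geography wrong."""
    return "4" if "2 + 2" in prompt else "London"


def stub_large(prompt: str) -> str:
    """Stand-in for a larger model: answers both tasks correctly."""
    return "4" if "2 + 2" in prompt else "Paris"


def score(model: Model, tasks: List[Tuple[str, str]]) -> float:
    """Fraction of tasks where the reply matches the expected answer (case-insensitive)."""
    correct = sum(
        model(prompt).strip().lower() == expected.lower()
        for prompt, expected in tasks
    )
    return correct / len(tasks)


def run_benchmark(models: Dict[str, Model], tasks: List[Tuple[str, str]]) -> Dict[str, float]:
    """Run every model on the same tasks and collect one score per model."""
    return {name: score(model, tasks) for name, model in models.items()}


if __name__ == "__main__":
    results = run_benchmark({"small": stub_small, "large": stub_large}, TASKS)
    for name, accuracy in results.items():
        print(f"{name}: {accuracy:.0%}")
```

Exact-match scoring is the simplest choice and only works for short, constrained answers; free-form replies would need a looser comparison.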