AI Tools · 7 min read · 22 May 2024

Running LLMs Locally in 2024: Ollama and the Democratisation of AI

Ollama made running large language models on your own machine genuinely accessible. One command to download and run Llama 3. For data-sensitive applications, this was significant.

Local LLMs · Ollama · Llama · AI · Privacy

In 2024 you could run a capable language model on a reasonably modern laptop with one command. Ollama, which made running models like Llama and Mistral as easy as running Docker containers, was one of those pieces of software that you install once and then use every day.

The technical story behind local LLMs had been building for a while. Meta's release of Llama 2 in July 2023 and Llama 3 in April 2024 provided capable open-weights models. Quantisation techniques, developed and refined by the community, reduced model size by 4-8x with relatively small quality loss, letting models that previously required server hardware run on consumer GPUs and even on CPU-only machines. llama.cpp, the optimised C++ inference engine, was the enabling technology for much of this.
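The memory savings from quantisation can be sketched with back-of-envelope arithmetic. This is an illustrative estimate only: parameter counts are nominal model sizes rather than exact weight counts, and it ignores the KV cache and runtime overhead.

```python
# Rough memory estimate for LLM weights at a given precision.
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone (no KV cache, no overhead)."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

# A Llama-class 70B model: fp16 vs 4-bit quantisation.
fp16 = weight_memory_gb(70, 16)  # ~140 GB: multi-GPU server territory
q4 = weight_memory_gb(70, 4)     # ~35 GB: within reach of a high-end workstation

print(f"fp16: {fp16:.0f} GB, 4-bit: {q4:.0f} GB, reduction: {fp16 / q4:.0f}x")
```

The 4x reduction is why a model that once demanded a rack of accelerators became plausible on consumer hardware.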

Ollama wrapped this into a clean developer experience. One command to pull a model. An OpenAI-compatible API so you could swap between local and cloud models with a configuration change. A library of models available under simple names. The mental model was Docker for AI models, and it was the right one.
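A minimal sketch of what the OpenAI-compatible API looks like from client code. The base URL (`http://localhost:11434/v1`) and model name (`llama3`) match Ollama's documented defaults, but treat them as assumptions for your setup; the request is built here without being sent, so the sketch runs offline.

```python
import json

def build_chat_request(prompt: str, model: str = "llama3",
                       base_url: str = "http://localhost:11434/v1"):
    """Build the URL and JSON body for an OpenAI-style chat completion call."""
    url = f"{base_url}/chat/completions"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return url, json.dumps(payload)

url, body = build_chat_request("Summarise this contract clause.")
# POST `body` to `url` with Content-Type: application/json.
# Pointing at a cloud provider is just a different base_url and model name.
```

Because the wire format matches OpenAI's, existing client libraries generally work against a local Ollama server unchanged.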

The practical impact was felt most in environments where sending data to a cloud API was problematic. Healthcare organisations with strict data governance, legal firms handling confidential documents, financial institutions with regulatory restrictions: for these users, cloud LLMs were either prohibited or required lengthy approval processes. A local model had none of these constraints.

An honest comparison with cloud models in 2024 acknowledged a real gap. Llama 3 70B was impressive for an open model, but GPT-4 still led on most benchmarks. For many tasks the gap was small enough not to matter; for others it was decisive. The choice was between capability and control.

The development workflow improvements were also real. Testing changes to your LLM application without API costs. Experimenting with prompts without rate limits. Running offline. These were genuine improvements to the development experience even for teams that used cloud models in production.
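The local-versus-cloud swap described above can come down to a single configuration switch. A hedged sketch, assuming an OpenAI-compatible endpoint on both sides; the environment variable names here are illustrative, not a standard.

```python
import os

def llm_endpoint() -> dict:
    """Pick the LLM endpoint from environment configuration.

    Defaults to a local Ollama server; set USE_LOCAL_LLM=0 to target
    a cloud provider instead. Same client code either way.
    """
    if os.environ.get("USE_LOCAL_LLM", "1") == "1":
        return {"base_url": "http://localhost:11434/v1",
                "model": "llama3",
                "api_key": "unused"}  # Ollama ignores the key
    return {"base_url": "https://api.openai.com/v1",
            "model": "gpt-4",
            "api_key": os.environ.get("OPENAI_API_KEY", "")}

cfg = llm_endpoint()
```

Developing against the local endpoint means no API costs, no rate limits, and no network dependency; production can flip the switch to the cloud model.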

What the rise of local LLMs demonstrated was that AI capability was decoupling from cloud dependence. The infrastructure to run powerful AI was becoming commodity hardware. The implications for the competitive advantage of the large AI labs, whose moat was partly in the compute required to run their models, were worth watching.
