# Local AI models with Ollama

## Why local AI models?

Running AI models locally on your desktop provides several important advantages:
| Aspect | Local Models | Cloud API |
|---|---|---|
| Privacy | Your data stays on your computer | Data sent to cloud servers |
| Cost | Free after installation | Pay per API call |
| Speed | No internet latency | Depends on connection |
| Offline | Works without internet | Requires internet connection |
| Control | Full control over your data | Data handled by third parties |
## Why Ollama?

Ollama is the leading platform for running open-source AI models locally. Key features:
- ✅ Easy installation and setup
- ✅ Thousands of available models
- ✅ Lightweight and fast
- ✅ Cross-platform (Windows, macOS, Linux)
- ✅ Simple model management
- ✅ OpenAI-compatible API
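Because the API is OpenAI-compatible, any OpenAI client can talk to a local Ollama server by pointing it at http://localhost:11434/v1 (Ollama's default port). A minimal sketch with curl, assuming a model such as qwen2.5:7b (pulled in the installation steps below) is already available:

```bash
# Chat with a local model through Ollama's OpenAI-compatible endpoint.
# Assumes the Ollama server is running and qwen2.5:7b has been pulled.
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5:7b",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}]
  }'
```

The same compatibility means existing OpenAI SDKs work unchanged: set the base URL to http://localhost:11434/v1 and pass any placeholder API key (the key is required by the SDKs but unused by Ollama).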
## Installation

### 1. Download and install Ollama

Visit ollama.ai and download the installer for your operating system.
### 2. Verify installation

After installation, verify that Ollama is working:

```bash
ollama --version
```

On Windows, restart your terminal after installation so the `ollama` command is picked up.
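If the version check works but later commands hang or fail, the background server may not be running. Ollama normally starts it automatically; if needed, you can start it yourself:

```bash
# Start the Ollama server in the foreground (listens on localhost:11434 by default)
ollama serve
```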
### 3. Pull a model

Download a model (example with qwen2.5):

```bash
ollama pull qwen2.5:7b
```

This downloads the model weights (several GB for a 7B model), so it may take a few minutes depending on your internet speed.
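Once the download finishes, you can try the model straight from the terminal. `ollama run` answers a single prompt if you pass one, or starts an interactive chat if you don't:

```bash
# One-shot prompt; omit the quoted text for an interactive session
ollama run qwen2.5:7b "Explain what a local AI model is in one sentence."
```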
## Model installation

### Quick start

For the best balance between quality and performance, we recommend:

```bash
ollama pull qwen3-vl:4b
```

### Recommended model: qwen3-vl:4b

This is the recommended model because it:
- ✅ Requires only 4GB of RAM
- ✅ Includes vision capabilities (see images)
- ✅ Offers good speed and quality balance
- ✅ Works well on most hardware
- ✅ Fully open and free to use
### Install other models

You can install additional models:

```bash
# Other popular models
ollama pull llama2:7b       # Excellent all-purpose model
ollama pull mistral:7b      # Fast and capable
ollama pull neural-chat:7b  # Great for conversations
```
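Models take up several gigabytes each, so it helps to know the basic management commands:

```bash
ollama list               # Show installed models and their sizes
ollama show qwen2.5:7b    # Show a model's details (parameters, context length)
ollama rm neural-chat:7b  # Remove a model you no longer need
```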
## Hardware recommendations

Ollama works on a wide range of hardware. Here's what you need for different model sizes:
| Model size | RAM needed | Graphics card | Performance (CPU-only) |
|---|---|---|---|
| 3-4B | 4GB minimum | Not required | Fast (5-10 tokens/sec) |
| 7B | 8GB recommended | Optional (faster) | Good (2-5 tokens/sec) |
| 13B+ | 16GB+ recommended | Strongly recommended | Slow without a GPU |
GPU acceleration: If you have a supported GPU, Ollama uses it automatically for faster inference. This includes NVIDIA GPUs, recent AMD GPUs (via ROCm), and Apple Silicon (via Metal).
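You can confirm whether a model actually ended up on the GPU: while it is loaded, `ollama ps` reports how it is scheduled:

```bash
# The PROCESSOR column shows e.g. "100% GPU" or "100% CPU"
ollama ps
```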
## Configuration in the desktop app

After installing Ollama and at least one model:

1. Open the AI-School Desktop application
2. Go to Settings → Local Models
3. Check that Ollama is detected
4. Select your model from the dropdown
5. You're ready to use local AI!
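If the app does not detect Ollama, you can check that the server is reachable yourself. Ollama listens on port 11434 by default, and its /api/tags endpoint lists the installed models:

```bash
# Returns a JSON list of installed models if the server is up
curl http://localhost:11434/api/tags
```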
## Available models

Popular models available via Ollama:

### Vision models (can see images)
- qwen3-vl:4b (recommended) - Fast vision model
- llama3.2-vision:11b - More powerful vision model
- minicpm-v:latest - Compact vision model
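With any of these multimodal models, the Ollama CLI lets you attach an image by including its file path in the prompt. A quick sketch (the image path is a placeholder for your own file):

```bash
# Ask a vision model about a local image
ollama run qwen3-vl:4b "What is in this image? ./photo.jpg"
```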
### Text models
- qwen2.5:7b - Excellent for all tasks
- llama2:7b - Classic, well-tested
- mistral:7b - Fast and efficient
- neural-chat:7b - Conversational focus
- openchat:7b - Good all-rounder
### Specialized models
- codegemma:7b - For programming tasks
- sqlcoder:7b - SQL database queries
- dolphin-mixtral:8x7b - Powerful mixture model
Start with qwen3-vl:4b and explore other models based on your needs!