# Local AI models with Ollama

## Why local AI models?

Running AI models locally on your desktop provides several important advantages:
| Aspect | Local Models | Cloud API |
|---|---|---|
| Privacy | Your data stays on your computer | Data sent to cloud servers |
| Cost | Free after installation | Pay per API call |
| Speed | No internet latency | Depends on connection |
| Offline | Works without internet | Requires internet connection |
| Control | Full control over your data | Data handled by third parties |
## Why Ollama?

Ollama is the leading platform for running open-source AI models locally. Key features:
- ✅ Easy installation and setup
- ✅ Thousands of available models
- ✅ Lightweight and fast
- ✅ Cross-platform (Windows, macOS, Linux)
- ✅ Simple model management
- ✅ OpenAI-compatible API
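Because the API is OpenAI-compatible, any OpenAI client can talk to a local Ollama server by pointing it at http://localhost:11434/v1 (Ollama's default port). A minimal sketch with curl, assuming a model such as qwen2.5:7b (pulled in the installation steps below) is already available:

```bash
# Chat with a local model through Ollama's OpenAI-compatible endpoint.
# Assumes the Ollama server is running and qwen2.5:7b has been pulled.
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5:7b",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}]
  }'
```

The same compatibility means existing OpenAI SDKs work unchanged: set the base URL to http://localhost:11434/v1 and pass any placeholder API key (the key is required by the SDKs but unused by Ollama).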
## Installation

### 1. Download and install Ollama

Visit ollama.ai and download the installer for your operating system.
### 2. Verify installation

After installation, verify that Ollama is working:

```bash
ollama --version
```

On Windows, restart your terminal after installation so the `ollama` command is picked up.
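If the version check works but later commands hang or fail, the background server may not be running. Ollama normally starts it automatically; if needed, you can start it yourself:

```bash
# Start the Ollama server in the foreground (listens on localhost:11434 by default)
ollama serve
```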
### 3. Pull a model

Download a model (example with qwen2.5):

```bash
ollama pull qwen2.5:7b
```

This downloads the model weights (several GB for a 7B model), so it may take a few minutes depending on your internet speed.
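Once the download finishes, you can try the model straight from the terminal. `ollama run` answers a single prompt if you pass one, or starts an interactive chat if you don't:

```bash
# One-shot prompt; omit the quoted text for an interactive session
ollama run qwen2.5:7b "Explain what a local AI model is in one sentence."
```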
## Model installation

### Quick start

For the best balance between quality and performance, we recommend:

```bash
ollama pull qwen3-vl:4b
```

### Recommended model: qwen3-vl:4b

This is the recommended model because it:
- ✅ Requires only 4GB of RAM
- ✅ Includes vision capabilities (see images)
- ✅ Offers good speed and quality balance
- ✅ Works well on most hardware
- ✅ Fully open and free to use
### Install other models

You can install additional models:

```bash
# Other popular models
ollama pull llama2:7b       # Excellent all-purpose model
ollama pull mistral:7b      # Fast and capable
ollama pull neural-chat:7b  # Great for conversations
```
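Models take up several gigabytes each, so it helps to know the basic management commands:

```bash
ollama list               # Show installed models and their sizes
ollama show qwen2.5:7b    # Show a model's details (parameters, context length)
ollama rm neural-chat:7b  # Remove a model you no longer need
```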
## Hardware recommendations

Ollama works on a wide range of hardware. Here's what you need for different model sizes:
| Model size | RAM needed | Graphics card | Performance (CPU-only) |
|---|---|---|---|
| 3-4B | 4GB minimum | Not required | Fast (5-10 tokens/sec) |
| 7B | 8GB recommended | Optional (faster) | Good (2-5 tokens/sec) |
| 13B+ | 16GB+ recommended | Strongly recommended | Slow without a GPU |
GPU acceleration: If you have a supported GPU, Ollama uses it automatically for faster inference. This includes NVIDIA GPUs, recent AMD GPUs (via ROCm), and Apple Silicon (via Metal).
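You can confirm whether a model actually ended up on the GPU: while it is loaded, `ollama ps` reports how it is scheduled:

```bash
# The PROCESSOR column shows e.g. "100% GPU" or "100% CPU"
ollama ps
```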
## Configuration in the desktop app

After installing Ollama and at least one model:

1. Open the AI-School Desktop application
2. Go to Settings → Local Models
3. Check that Ollama is detected
4. Select your model from the dropdown
5. You're ready to use local AI!
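If the app does not detect Ollama, you can check that the server is reachable yourself. Ollama listens on port 11434 by default, and its /api/tags endpoint lists the installed models:

```bash
# Returns a JSON list of installed models if the server is up
curl http://localhost:11434/api/tags
```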
## Available models

Popular models available via Ollama:

### Vision models (can see images)
- qwen3-vl:4b (recommended) - Fast vision model
- llama3.2-vision:11b - More powerful vision model
- minicpm-v:latest - Compact vision model
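With any of these multimodal models, the Ollama CLI lets you attach an image by including its file path in the prompt. A quick sketch (the image path is a placeholder for your own file):

```bash
# Ask a vision model about a local image
ollama run qwen3-vl:4b "What is in this image? ./photo.jpg"
```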
### Text models
- qwen2.5:7b - Excellent for all tasks
- llama2:7b - Classic, well-tested
- mistral:7b - Fast and efficient
- neural-chat:7b - Conversational focus
- openchat:7b - Good all-rounder
### Specialized models
- codegemma:7b - For programming tasks
- sqlcoder:7b - SQL database queries
- dolphin-mixtral:8x7b - Powerful mixture model
Start with qwen3-vl:4b and explore other models based on your needs!