[How To] Install and Configure Ollama for Local AI Inference on Ubuntu

Ollama is a powerful, open-source framework designed to simplify the process of running large language models (LLMs) and other generative AI models directly on your local machine. By bringing AI inference capabilities to your desktop, Ollama offers enhanced privacy, cost-effectiveness, and greater control over your AI applications. This guide will walk you through the steps to install and configure Ollama on Ubuntu, enabling you to harness the power of local AI inference.

System Requirements

Before you begin the Ollama installation, ensure your Ubuntu system meets the recommended hardware specifications for optimal performance, especially when running larger language models. While Ollama can run in CPU-only mode, a dedicated GPU significantly enhances inference speed.

  • Operating System: Ubuntu LTS (Long Term Support) version 22.04 or higher.
  • RAM:
    • 8GB for 3B models.
    • 16GB for 7B models.
    • 32GB for 13B+ models.
  • Storage: At least 10GB of free disk space; some models can require 20GB or more.
  • GPU (Optional but Recommended): NVIDIA RTX 3060 or better for accelerated inference. If you plan to use GPU acceleration, install and configure the appropriate NVIDIA drivers *before* installing Ollama. The NVIDIA Container Toolkit is only needed if you intend to run Ollama inside Docker.

Install Ollama on Ubuntu

Update System Packages

It’s always a good practice to update your system’s package list and upgrade existing packages to their latest versions before installing new software. This ensures you have the most recent security patches and dependencies.

lc-root@ubuntu:~$ sudo apt update && sudo apt upgrade -y

Using the Official Curl Script

The recommended method for installing Ollama on Ubuntu is by using the official installation script. This script handles the setup of Ollama as a systemd service, allowing it to start automatically.

lc-root@ubuntu:~$ curl -fsSL https://ollama.com/install.sh | sh

This command will download and execute the installation script. It will install the Ollama binary to /usr/local/bin, create a dedicated ollama user and group, set up a systemd service for automatic startup, and configure the model storage directory.
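
You can also confirm that the systemd service created by the script is up and running:

lc-root@ubuntu:~$ systemctl status ollama

The service should be reported as active (running) and enabled to start on boot.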

Verify Ollama Installation

After the installation script completes, you can verify that Ollama has been successfully installed by checking its version:

lc-root@ubuntu:~$ ollama -v

The output should display the installed Ollama version number, confirming that the installation completed correctly.
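
The exact string depends on the release you installed; it looks similar to the line below, where the version number is only an example:

ollama version is 0.1.32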

Basic Ollama Usage

With Ollama installed, you can now start interacting with local large language models.

Pulling AI Models

Ollama simplifies the process of downloading various LLMs from its library. To download a model, use the ollama pull command. For instance, to get the popular Llama 2 model:

lc-root@ubuntu:~$ ollama pull llama2

You can find a comprehensive list of available models and their sizes on the Ollama official website.
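
Many models are published in several parameter sizes, selected with a tag after the model name. As an illustrative example, assuming the llama2 listing offers a 13b tag (check the model page for the tags that actually exist), you would pull that variant like this:

lc-root@ubuntu:~$ ollama pull llama2:13b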

Running AI Models

Once a model is downloaded, you can run it and begin an interactive chat session using the ollama run command:

lc-root@ubuntu:~$ ollama run llama2

This opens an interactive prompt where you can type your queries and the LLM will respond. To exit the session, type /bye or press Ctrl+D.
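
You can also pass a prompt directly on the command line for a one-off, non-interactive answer, which is convenient for scripting. The prompt below is just an example:

lc-root@ubuntu:~$ ollama run llama2 "Explain what a systemd unit file is in two sentences."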

Listing Installed Models

To view all the AI models currently downloaded and available on your system, use the following command:

lc-root@ubuntu:~$ ollama list

This command provides a convenient overview of your local models, including their names and sizes.
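
The output is a small table; the entry below is only an illustration of the format, not what you will necessarily see:

NAME             ID              SIZE      MODIFIED
llama2:latest    78e26419b446    3.8 GB    5 minutes ago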

Starting the Ollama Service

Typically, the Ollama service starts automatically after installation and system boot. However, if you need to manually start the Ollama API server (which listens on http://localhost:11434 by default), you can use:

lc-root@ubuntu:~$ ollama serve

This command keeps the Ollama server running in the foreground. Note that if the systemd service is already running, ollama serve will exit with an error because port 11434 is already in use; for background operation, the systemd service is usually all you need.
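
With the server up (whether via systemd or ollama serve), any HTTP client can use the REST API on port 11434. As a quick sketch, the request below asks a previously pulled llama2 model for a completion through the /api/generate endpoint, with streaming disabled so the reply arrives as a single JSON object:

lc-root@ubuntu:~$ curl http://localhost:11434/api/generate -d '{"model": "llama2", "prompt": "Why is the sky blue?", "stream": false}'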

Configuring Ollama for Network Access (Optional)

By default, Ollama is configured to listen only on the local interface (127.0.0.1), meaning it can only be accessed from the machine it’s running on. If you wish to access Ollama from other devices on your local network or integrate it with a web-based user interface (like Open WebUI), you will need to configure it to listen on all network interfaces.

Modify the Ollama Systemd Service

To enable network access, you need to modify the systemd service file for Ollama:

lc-root@ubuntu:~$ sudo systemctl edit ollama.service

This command opens a text editor (usually nano or vim) to create or edit an override file for the Ollama systemd service. Add the following lines, including the [Service] header:

[Service]
Environment="OLLAMA_HOST=0.0.0.0"

This setting configures Ollama to listen on all available network interfaces. If you need to specify a different port, you can use Environment="OLLAMA_HOST=0.0.0.0:YOUR_PORT". Save and close the editor.
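
If you also intend to call the API directly from a browser-based front end served from a different origin, you can add the OLLAMA_ORIGINS variable to the same override file. The wildcard below accepts requests from any origin and is only appropriate on a trusted network:

[Service]
Environment="OLLAMA_HOST=0.0.0.0"
Environment="OLLAMA_ORIGINS=*"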

Apply Changes and Restart Ollama

After modifying the systemd service file, you must reload the systemd daemon and restart the Ollama service for the changes to take effect:

lc-root@ubuntu:~$ sudo systemctl daemon-reload
lc-root@ubuntu:~$ sudo systemctl restart ollama

Verify Network Access

To confirm that Ollama is now listening on all network interfaces, you can use the ss command:

lc-root@ubuntu:~$ ss -antp | grep 11434

The output should show an entry like *:11434 or 0.0.0.0:11434, indicating that Ollama is listening on all interfaces on port 11434.
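
You can also verify reachability from another device on your network with a plain HTTP request; replace the example address 192.168.1.50 with the actual IP of your Ollama host:

lc-root@ubuntu:~$ curl http://192.168.1.50:11434/api/tags

The response is a JSON list of the models installed on the server.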

Security Considerations

Important: Exposing Ollama to your network without authentication introduces security risks. It is highly recommended to restrict access using network Access Control Lists (ACLs) or a firewall. Avoid exposing Ollama directly to the internet unless proper security measures are in place.
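
On Ubuntu, ufw is a simple way to enforce such a restriction. The rules below are a sketch that assumes your trusted LAN is 192.168.1.0/24; adjust the subnet to match your own network:

lc-root@ubuntu:~$ sudo ufw allow OpenSSH
lc-root@ubuntu:~$ sudo ufw allow from 192.168.1.0/24 to any port 11434 proto tcp
lc-root@ubuntu:~$ sudo ufw enable

Because ufw denies incoming traffic that is not explicitly allowed, the first rule preserves SSH access and the second limits the Ollama port to the trusted subnet.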

GPU Acceleration

Ollama is designed to automatically detect and utilize GPU acceleration if a compatible GPU (e.g., NVIDIA) is present and correctly configured on your system. For optimal performance, especially with larger models, GPU acceleration is highly recommended.

  • NVIDIA Drivers: Ensure your NVIDIA GPU drivers are up to date. You can often install or update them using Ubuntu’s driver utility:
    lc-root@ubuntu:~$ sudo ubuntu-drivers autoinstall
  • NVIDIA Container Toolkit: The native install does not require it, but if you run Ollama inside a Docker container, the container needs the NVIDIA Container Toolkit to access your GPU. Refer to the official NVIDIA Container Toolkit Installation Guide for detailed instructions.

Ollama manages GPU resources efficiently: idle models are automatically unloaded and their GPU memory released after a period of inactivity (five minutes by default, adjustable via the OLLAMA_KEEP_ALIVE environment variable).
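
To check that a model is actually using the GPU, load one (for example with ollama run) and then inspect where it ended up; ollama ps reports the processor in use, and nvidia-smi should show the ollama process holding GPU memory:

lc-root@ubuntu:~$ ollama ps
lc-root@ubuntu:~$ nvidia-smi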

Conclusion

You have successfully installed and configured Ollama on your Ubuntu system, setting up a robust environment for local AI inference. You can now download and interact with a variety of powerful large language models directly from your machine, leveraging the benefits of privacy, control, and efficiency. Explore the Ollama model library to discover more models and continue your journey into local AI.
