[How To] Hugging Face Linux Deployment: Deploy Transformers
Deploying Hugging Face models on Linux allows developers to serve powerful AI models for real-world applications. By leveraging tools like FastAPI and Docker, you can create a scalable, reproducible environment for serving Hugging Face Transformers models on a Linux server. This guide provides a comprehensive walkthrough of setting up a production-ready inference service from scratch.
This tutorial covers everything from setting up a local Python environment to containerizing the application with Docker and deploying it on a fresh Ubuntu server. Along the way, we will build a simple API endpoint that takes a text prompt and returns a generated sequence from a pre-trained model.
Table of Contents
- Prerequisites
- Step 1: Setting Up the Local Python Environment
- Step 2: Creating the FastAPI Application
- Step 3: Dockerizing the Application
- Step 4: Deploying on a Linux Server
- Conclusion
- Next Steps
Prerequisites for Hugging Face Linux Deployment
Before you begin, ensure you have the following:
- A Linux server running a recent version of Ubuntu (this guide uses Ubuntu 24.04 LTS).
- sudo or root privileges on the server.
- Basic familiarity with the Linux command line and Python programming.
- Docker installed on your server.
- An understanding of what a container is. For more details, see our beginner’s guide to Linux containers.
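The checks below are an optional, quick way to confirm these prerequisites on the server (exact version output will vary by system):

```shell
# Confirm the prerequisites are in place (output will vary by system)
command -v docker && docker --version     # Docker is installed
command -v python3 && python3 --version   # Python 3 is available
. /etc/os-release && echo "$PRETTY_NAME"  # e.g. Ubuntu 24.04 LTS
```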
Step 1: Preparing Your Python Environment for Hugging Face Linux Deployment
To begin, connect to your Linux server and update its package repositories. Starting from an up-to-date system is always good practice.
lc-root@ubuntu:~$ sudo apt update && sudo apt upgrade -y
Next, install Python, pip, and the venv virtual environment module.
lc-root@ubuntu:~$ sudo apt install -y python3-pip python3-venv
Create a project directory for your application and a virtual environment inside it. This isolates your project’s dependencies from the system’s Python packages.
lc-root@ubuntu:~$ mkdir hf_deployment
lc-root@ubuntu:~$ cd hf_deployment
lc-root@ubuntu:~$ python3 -m venv .venv
Next, activate the virtual environment:
lc-root@ubuntu:~$ source .venv/bin/activate
(.venv) lc-root@ubuntu:~$
Your shell prompt should now be prefixed with (.venv), indicating that the virtual environment is active.
Step 2: Creating the FastAPI Application
With the environment ready, install the necessary Python libraries: fastapi for the API, uvicorn as the server, and transformers with torch for the AI model.
(.venv) lc-root@ubuntu:~$ pip install fastapi uvicorn transformers torch
Next, create a file named main.py to house the API logic. This code loads a pre-trained model and exposes an endpoint to handle inference requests.
# main.py
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

# Initialize FastAPI app
app = FastAPI(
    title="Hugging Face Inference API",
    description="An API for text generation using a Hugging Face model.",
    version="1.0"
)

# Load the text-generation pipeline
# distilgpt2 is a smaller, faster version of GPT-2
try:
    generator = pipeline("text-generation", model="distilgpt2")
except Exception as e:
    generator = None
    print(f"Failed to load model: {e}")

# Define request and response models for type validation
class GenerationRequest(BaseModel):
    prompt: str
    max_length: int = 40

class GenerationResponse(BaseModel):
    generated_text: str

@app.get("/")
def read_root():
    return {"status": "API is running. Visit /docs for details."}

@app.post("/generate", response_model=GenerationResponse)
def generate_text(request: GenerationRequest):
    if generator is None:
        return {"generated_text": "Model is not available."}
    result = generator(request.prompt, max_length=request.max_length)
    return {"generated_text": result[0]["generated_text"]}
This script sets up a /generate endpoint that accepts a prompt and returns the model’s output. For more background on creating models, consider reading about how to build your first AI model on Linux.
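As a sketch of the client side, this stdlib-only snippet builds the same POST request that curl will send later; the host and port (127.0.0.1:8000) are assumptions for a local uvicorn run, so adjust them to match your deployment:

```python
import json
import urllib.request

# Build a POST request for the /generate endpoint
# (127.0.0.1:8000 assumes uvicorn running locally; adjust as needed)
url = "http://127.0.0.1:8000/generate"
payload = {"prompt": "The future of AI is", "max_length": 40}
req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Uncomment once the server is running:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["generated_text"])
print(req.method, req.full_url)
```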
Step 3: Dockerizing the Application
Docker allows you to package your application and its dependencies into a single, portable container. Create a Dockerfile in your project directory:
# Dockerfile
# Use an official Python runtime as a parent image
FROM python:3.10-slim

# Set the working directory in the container
WORKDIR /app

# Copy the dependencies file to the working directory
COPY requirements.txt .

# Install any needed packages specified in requirements.txt
RUN pip install --no-cache-dir --trusted-host pypi.python.org -r requirements.txt

# Copy the application code to the working directory
COPY main.py .

# Expose the port the app runs on
EXPOSE 8000

# Run the application
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
You will also need a requirements.txt file listing the Python dependencies for the Docker build.
(.venv) lc-root@ubuntu:~$ pip freeze > requirements.txt
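Note that pip freeze pins every installed package, including transitive dependencies, which is reproducible but verbose. A minimal alternative is a hand-maintained requirements.txt listing only the direct dependencies:

```
fastapi
uvicorn
transformers
torch
```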
Now build the Docker image. This command tells Docker to build an image from the Dockerfile in the current directory and tag it as hf-api-server.
lc-root@ubuntu:~$ docker build -t hf-api-server .
This process might take some time, as Docker downloads the base image and installs the dependencies. Note that with this Dockerfile the Hugging Face model itself is downloaded the first time the container starts, when the pipeline is initialized, so the first request may be slow.
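One caveat: the build context sent to Docker includes everything in the project directory, including the .venv folder, which can slow the build considerably. A .dockerignore file (a minimal sketch is shown here) keeps the context small:

```
# .dockerignore
.venv/
__pycache__/
*.pyc
```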
Step 4: Completing Your Hugging Face Linux Deployment
Once the image is built, you can run it as a container. The following command starts the container in detached mode and maps port 80 on the host to port 8000 in the container.
lc-root@ubuntu:~$ docker run -d -p 80:8000 --name hf-inference-api hf-api-server
You can verify that the container is running with:
lc-root@ubuntu:~$ docker ps
To test your API, use curl to send a request to the /generate endpoint from your server’s terminal:
lc-root@ubuntu:~$ curl -X POST "http://127.0.0.1/generate" -H "Content-Type: application/json" -d '{"prompt": "The future of AI is"}'
You should receive a JSON response with the generated text, confirming a successful Hugging Face Linux deployment. Other local inference tools are also worth exploring, such as installing Ollama for local AI inference.
Conclusion
You have successfully deployed a Hugging Face Transformers model as a web service on a Linux server. By containerizing the application with Docker, you’ve created a portable and scalable service that can be easily managed and deployed across different environments. This setup provides a solid foundation for building more complex AI-powered applications.
Next Steps
From here, you can explore several enhancements:
- GPU Acceleration: For better performance, deploy on a GPU-enabled server and use a Docker image with CUDA support.
- Scalability: Use a container orchestrator like Kubernetes to manage multiple instances of your API for high availability and load balancing.
- Security: Implement authentication and rate limiting to protect your API from unauthorized access.
- Choosing a Distro: For specialized AI workloads, you may want to evaluate different Linux distributions for AI and Machine Learning.