From Idea to Production: How to Deploy Your First LLM App with a Full CI/CD Pipeline

[Figure: Deployment Workflow]

Why This Guide Matters

Every week, developers ask me: “How do I turn this AI prototype into a real-world application?” Many have working demos in Jupyter notebooks or Hugging Face Spaces but struggle to deploy them as scalable services. This guide bridges that gap using a real-world example: a FastAPI-based image generator powered by Replicate’s Flux model. Follow along to learn how professionals ship AI applications from local code to production.


Core Functionality Explained

In a Nutshell

User submits a text prompt → FastAPI processes the request → Calls Replicate’s image generation API → Returns the URL of the generated image.

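Endpoint: POST /generate-image
Request Format: JSON payload with a prompt field, e.g. {"prompt": "a red fox in the snow"}
Response Format: JSON with an image_url field pointing to the generated image.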

[Figure: Service Flow Diagram]

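Local Testing (uvicorn serves the app; run the curl command from a second terminal):

git clone https://github.com/JesseQin123/fastapi-cicd.git
cd fastapi-cicd
pip install -r requirements.txt
export REPLICATE_API_TOKEN=your_token
uvicorn app.main:app --reload
curl -X POST http://localhost:8000/generate-image -H "Content-Type: application/json" -d '{"prompt": "a red fox in the snow"}'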
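Besides curl, you can script the same request from Python, which is handy for repeatable smoke tests. A minimal standard-library sketch, assuming the server listens on localhost:8000 and answers with the {"image_url": ...} JSON returned by the endpoint in step 1; the function names are hypothetical:

```python
import json
import urllib.request


def build_generate_request(base_url: str, prompt: str) -> urllib.request.Request:
    """Build a POST /generate-image request with a JSON {"prompt": ...} body."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/generate-image",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_image(base_url: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the request and return the image_url field of the JSON response."""
    req = build_generate_request(base_url, prompt)
    # Generous timeout: model inference can take many seconds
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["image_url"]
```

With the server running, generate_image("http://localhost:8000", "a red fox in the snow") returns the URL string once the model finishes.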

Why Automate Your Deployment?

If you can already test locally with curl, why bother with Docker, GitHub Actions, or Kubernetes? Here’s why:

4 Key Benefits

  1. Environment Consistency
    Eliminate “it works on my machine” issues. Docker ensures identical environments across development, testing, and production.

  2. Safety Nets
    Automated, repeatable pipelines replace error-prone manual steps, and branch protection rules block mistakes like an accidental git push --force on critical branches.

  3. Auto-Scaling
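    Handle traffic spikes without manual intervention. With a HorizontalPodAutoscaler configured, Kubernetes adds pods when your app trends on Hacker News (the manifests in this guide use a fixed replica count).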

  4. Instant Rollbacks
    Bad deployment? Revert the offending commit and Argo CD syncs the cluster back to the last good state faster than you can say “downtime.”


Architecture Overview

[Figure: System Architecture]

How It Works

  1. Code Hosting: GitHub repository stores the source code.
  2. CI Pipeline: GitHub Actions builds a Docker image on every push; this is also where automated tests belong.
  3. Image Registry: Push images to Docker Hub for storage.
  4. GitOps Deployment: Argo CD monitors Kubernetes manifests in Git.
  5. Cluster Management: Kubernetes orchestrates pods and services.
  6. Public Access: LoadBalancer provides a stable public IP.

Tech Stack Breakdown

Component      | Role                         | Alternatives
FastAPI        | High-performance API builder | Flask, Django
Replicate      | Hosted AI inference          | AWS SageMaker
Docker         | Containerization             | Podman
GitHub Actions | CI/CD automation             | GitLab CI, Jenkins
Kubernetes     | Container orchestration      | Docker Swarm
Argo CD        | GitOps deployment            | FluxCD

Step-by-Step Deployment Guide

1. Set Up Local Development

  1. Create a Python virtual environment:

    python -m venv venv
    source venv/bin/activate
    
  2. Install dependencies:

    pip install fastapi uvicorn replicate python-dotenv
    pip freeze > requirements.txt
    
  3. Core API code:

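    # app/main.py
    import replicate
    from dotenv import load_dotenv
    from fastapi import FastAPI
    from pydantic import BaseModel
    
    # For local runs, copy .env into the environment; the replicate client
    # reads REPLICATE_API_TOKEN from the environment on its own.
    load_dotenv()
    
    app = FastAPI()
    
    class GenerateRequest(BaseModel):
        prompt: str  # matches the JSON body {"prompt": "..."}
    
    @app.get("/health")
    def health():
        # Lightweight endpoint for Kubernetes liveness/readiness probes
        return {"status": "ok"}
    
    @app.post("/generate-image")
    def generate_image(req: GenerateRequest):
        # A plain def runs in FastAPI's threadpool, so the blocking
        # replicate.run call does not stall the event loop.
        output = replicate.run(
            "stability-ai/stable-diffusion:...",  # model:version elided in the original; the guide's app uses Flux
            input={"prompt": req.prompt},
        )
        # Newer replicate clients return file objects; str() yields their URL
        return {"image_url": str(output[0])}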
    

Security Tip:
Always add .env to .gitignore to avoid exposing API keys.
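To make a missing key fail loudly at startup rather than on the first request, you can pair the .env file with a small check. This sketch assumes python-dotenv (installed above) and reads .env from the current working directory; require_env is a hypothetical helper name:

```python
import os

try:
    # python-dotenv copies KEY=value lines from ./.env into os.environ;
    # variables already set in the environment take precedence.
    from dotenv import load_dotenv
    load_dotenv(".env")
except ImportError:
    pass  # without python-dotenv, rely on variables exported in the shell


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail fast if it is unset."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set: add it to .env (listed in .gitignore) or export it"
        )
    return value
```

Calling require_env("REPLICATE_API_TOKEN") near the top of app/main.py turns a forgotten secret into an immediate, readable error, which is also what you want a misconfigured pod to print in its logs.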


2. Dockerize the Application

Dockerfile:

FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy into /app/app so "app.main:app" in CMD resolves to /app/app/main.py
COPY ./app ./app
EXPOSE 8000
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]

Build and Run:

docker build -t yourusername/fastapi-flux:latest .
docker run -p 8000:8000 -e REPLICATE_API_TOKEN=your_token yourusername/fastapi-flux:latest
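Before sending traffic (or a CI job) to the container, it helps to wait until it actually answers. A minimal standard-library sketch that polls a health URL; it assumes the app exposes GET /health, the same path the liveness probe later in this guide checks, and wait_until_healthy is a hypothetical name. The opener parameter exists only so the function can be tested without a live server:

```python
import time
import urllib.request
from typing import Callable


def wait_until_healthy(
    url: str,
    timeout: float = 30.0,
    interval: float = 1.0,
    opener: Callable = urllib.request.urlopen,
) -> bool:
    """Poll `url` until it answers HTTP 200; give up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with opener(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError (including HTTPError for 4xx/5xx), refused connections
            # and socket timeouts are all OSError subclasses: keep polling.
            pass
        time.sleep(interval)
    return False
```

After docker run, wait_until_healthy("http://localhost:8000/health") returns True once the container responds, or False after 30 seconds; the same check works as a smoke-test step in CI.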

3. Configure GitHub Actions CI

.github/workflows/ci.yml:

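name: CI Pipeline
on: [push]

jobs:
  build-and-push:
    runs-on: ubuntu-latest
    steps:
    - name: Checkout code
      uses: actions/checkout@v4

    - name: Set up Python
      uses: actions/setup-python@v5
      with:
        python-version: "3.10"  # quoted: unquoted 3.10 is read as the number 3.1

    - name: Install dependencies
      run: pip install -r requirements.txt
      # Add a test step (e.g. pytest) here once the project has tests

    - name: Log in to Docker Hub
      uses: docker/login-action@v3
      with:
        username: ${{ secrets.DOCKER_USERNAME }}
        password: ${{ secrets.DOCKER_PASSWORD }}  # a Docker Hub access token is recommended

    - name: Build and push Docker image
      run: |
        # Tag with the immutable commit SHA and with latest (what k8s/deployment.yaml pulls)
        docker build -t yourusername/fastapi-flux:${{ github.sha }} -t yourusername/fastapi-flux:latest .
        docker push yourusername/fastapi-flux:${{ github.sha }}
        docker push yourusername/fastapi-flux:latest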

4. Kubernetes Deployment

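k8s/deployment.yaml (the container reads REPLICATE_API_TOKEN from a Secret named replicate-secret; create it first with kubectl create secret generic replicate-secret --from-literal=REPLICATE_API_TOKEN=your_token):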

apiVersion: apps/v1
kind: Deployment
metadata:
  name: fastapi-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: fastapi
  template:
    metadata:
      labels:
        app: fastapi
    spec:
      containers:
      - name: fastapi
        image: yourusername/fastapi-flux:latest  # for GitOps, pin the commit-SHA tag CI pushes so each release is a Git change
        ports:
        - containerPort: 8000
        envFrom:
        - secretRef:
            name: replicate-secret

k8s/service.yaml:

apiVersion: v1
kind: Service
metadata:
  name: fastapi-service
spec:
  type: LoadBalancer
  ports:
  - port: 8000
    targetPort: 8000
  selector:
    app: fastapi

5. Argo CD for GitOps Automation

Setup:

  1. Install Argo CD via Helm.
  2. Create an Application resource pointing to your Git repo’s k8s/ directory.
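  3. Enable automated sync (syncPolicy.automated, with selfHeal: true to revert manual drift) so every change committed to k8s/ is applied without manual steps.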
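One gap worth closing: Argo CD deploys what Git says, so if k8s/deployment.yaml always reads fastapi-flux:latest, pushing a new image changes nothing in Git and Argo CD has nothing to sync. A common pattern is for CI to write the freshly pushed tag (for example the commit SHA) into the manifest and commit it; tools such as Argo CD Image Updater automate this. A minimal sketch of the rewrite step, assuming the image appears on a single `image: repo:tag` line; the helper names are hypothetical:

```python
import re


def bump_image_tag(manifest: str, image: str, new_tag: str) -> str:
    """Return `manifest` with every `image: <image>:<tag>` line set to `new_tag`.

    Assumes the image reference sits on one line, e.g.
    `image: yourusername/fastapi-flux:latest`; anything after the tag
    (such as a trailing comment) is left untouched.
    """
    pattern = re.compile(
        rf"^([ \t]*-?[ \t]*image:[ \t]*){re.escape(image)}:\S+", re.MULTILINE
    )
    updated, count = pattern.subn(
        lambda m: f"{m.group(1)}{image}:{new_tag}", manifest
    )
    if count == 0:
        raise ValueError(f"no 'image: {image}:<tag>' line found")
    return updated


def bump_file(path: str, image: str, new_tag: str) -> None:
    """Rewrite the manifest file at `path` in place."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8") as f:
        f.write(bump_image_tag(text, image, new_tag))
```

In CI this would run after the image push, e.g. bump_file("k8s/deployment.yaml", "yourusername/fastapi-flux", sha), followed by a git commit and push. If that push uses the default GITHUB_TOKEN, GitHub does not start a new workflow run for it, which avoids a build loop. Plain text substitution is used instead of a YAML library so comments and formatting in the manifest survive.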

Key Features:

  • Visualize deployment status
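  • One-click rollbacks (Argo CD refuses manual rollbacks while auto-sync is on; in that case revert the commit in Git instead)
  • Automatic configuration drift correction (with selfHeal enabled)

[Figure: Argo CD Dashboard]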

Security Best Practices

  1. Secrets Management

    • Use GitHub Secrets for CI credentials.
    • Store runtime secrets in Kubernetes Secrets or AWS Secrets Manager.
  2. Image Scanning
    Scan container images with Trivy and let Dependabot flag vulnerable Python dependencies.

  3. Health Checks
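    Add liveness and readiness probes to the container spec in k8s/deployment.yaml (both assume the app serves GET /health):

    livenessProbe:
      httpGet:
        path: /health
        port: 8000
      initialDelaySeconds: 5
      periodSeconds: 10
    readinessProbe:
      httpGet:
        path: /health
        port: 8000
      initialDelaySeconds: 5
      periodSeconds: 10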
    
  4. Monitoring
    Track key metrics with Prometheus + Grafana:

    • Request success rate
    • Latency
    • Resource utilization
  5. Environment Isolation
    Use Kubernetes namespaces for dev/staging/prod separation.
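For the monitoring item above: Prometheus normally derives these numbers from counters and histograms exported by the app. To make the first two metrics concrete, here is a hypothetical in-process sketch (RequestStats is not a Prometheus API) that computes a success rate and a nearest-rank latency percentile. It treats 5xx responses as failures and everything else as success, which is a design choice rather than a standard:

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class RequestStats:
    """Illustrative tally of request outcomes and latencies."""
    total: int = 0
    succeeded: int = 0
    latencies_s: List[float] = field(default_factory=list)

    def record(self, status_code: int, latency_s: float) -> None:
        self.total += 1
        if status_code < 500:  # 4xx counts as served; 5xx counts as failed
            self.succeeded += 1
        self.latencies_s.append(latency_s)

    def success_rate(self) -> float:
        return self.succeeded / self.total if self.total else 1.0

    def latency_percentile(self, p: float) -> float:
        """Nearest-rank percentile for p in (0, 100]."""
        if not self.latencies_s:
            return 0.0
        ordered = sorted(self.latencies_s)
        rank = math.ceil(p / 100 * len(ordered))
        return ordered[max(rank, 1) - 1]
```

With one record() call per request, latency_percentile(95) gives the p95 latency you would chart in Grafana.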


Next Steps for Optimization

  1. Custom Models
    Fine-tune models using LoRA or RLHF.

  2. API Security
    Add JWT authentication or rate limiting.

  3. Caching
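    Cache prompt-to-image results in Redis, and copy generated images to object storage such as Cloudflare R2, since Replicate’s output URLs are temporary.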

  4. Frontend Integration
    Build a React/Vue dashboard to manage prompts.

  5. Advanced Traffic Routing
    Implement canary deployments with Istio.
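Image generation is expensive per call, so repeated prompts are an easy win. A minimal in-memory sketch of the caching idea, standing in for Redis: entries are keyed by a SHA-256 hash of the whitespace-normalized prompt and expire after a TTL, because a cached URL is only useful while the file behind it still exists. PromptCache and its parameters are hypothetical:

```python
import hashlib
import time
from typing import Callable, Dict, Tuple


class PromptCache:
    """In-memory TTL cache mapping prompts to image URLs (a stand-in for Redis)."""

    def __init__(self, ttl_s: float = 3600.0, clock: Callable[[], float] = time.monotonic):
        self.ttl_s = ttl_s
        self.clock = clock  # injectable so expiry can be tested deterministically
        self._store: Dict[str, Tuple[float, str]] = {}

    @staticmethod
    def key(prompt: str) -> str:
        # Collapse runs of whitespace so trivially different prompts share an entry
        normalized = " ".join(prompt.split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def get_or_generate(self, prompt: str, generate: Callable[[str], str]) -> str:
        k = self.key(prompt)
        now = self.clock()
        hit = self._store.get(k)
        if hit is not None and now - hit[0] < self.ttl_s:
            return hit[1]  # fresh cache hit: skip the model call
        url = generate(prompt)  # e.g. a function wrapping replicate.run
        self._store[k] = (now, url)
        return url
```

Note the trade-off: a cache hit returns the same image for the same prompt, whereas a fresh call would sample a new one. With Redis, the same key and TTL map onto a SET with an expiry.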


Troubleshooting Common Issues

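Issue                    | Likely Cause                                       | Solution
Image build fails        | Dependency conflicts                               | Pin versions in requirements.txt
Pods in CrashLoopBackOff | Missing secrets                                    | Verify that the replicate-secret Secret exists and is referenced via envFrom
Timeout errors           | Slow model inference                               | Raise client and load-balancer timeouts, or bound inference with a server-side deadline
Image pull errors        | Registry permissions                               | Configure imagePullSecrets
Argo CD out of sync      | Auto-sync off, manual cluster edits, or no Git access | Enable auto-sync with selfHeal; check repo credentials and cluster firewall rules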
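For the timeout row above, one option is to bound inference time on the server side so clients get a clear error instead of a hung connection. A minimal asyncio sketch, assuming Python 3.9+ for asyncio.to_thread; run_with_timeout is a hypothetical helper:

```python
import asyncio
from typing import Callable, TypeVar

T = TypeVar("T")


async def run_with_timeout(fn: Callable[[], T], timeout_s: float) -> T:
    """Run blocking `fn` in a worker thread, waiting at most `timeout_s` seconds.

    Raises asyncio.TimeoutError on expiry. The worker thread is not killed:
    `fn` keeps running in the background; only the caller stops waiting.
    """
    return await asyncio.wait_for(asyncio.to_thread(fn), timeout=timeout_s)
```

In an async def endpoint you might call await run_with_timeout(lambda: replicate.run(...), 60) and turn asyncio.TimeoutError into an HTTP 504. Because the thread keeps running, this caps how long the client waits, not how much compute is spent.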

Recommended Resources

  1. Kubernetes Official Docs
  2. Argo CD Best Practices
  3. FastAPI Security Guide
  4. GitHub Actions Examples

Ready to deploy? Clone the sample repository, follow the steps, and share a screenshot when your first pod goes green! 🚀