## What is Shimmy?

Shimmy is an ultra-lightweight tool, weighing only 5.1MB, that provides a fully OpenAI-compatible AI model service on your local machine. You can keep using your existing AI tools and applications by simply pointing their API endpoints at Shimmy, letting you run large language models locally and privately without any code changes.

Unlike other solutions that require substantial resources and complex configuration, Shimmy features a minimalist design with startup times under 100 milliseconds and memory usage of approximately 50MB. It automatically discovers GGUF model files on your system and exposes complete OpenAI-compatible endpoints, so a wide range of AI tools work with it seamlessly.
## Why Choose Shimmy?
As artificial intelligence applications become increasingly prevalent, many developers and organizations want to run models locally rather than relying on cloud API services. This approach offers multiple benefits:
- Data Privacy: Your code and data never leave your local machine
- Cost Control: No need to pay per-token API fees
- Response Speed: Local inference provides sub-second response times
- Reliability: No rate limits or service downtime
However, traditional local model deployment solutions often require massive software packages (like Ollama’s 680MB binary) and complex configuration processes. Shimmy addresses these issues by providing an extremely lightweight, zero-configuration solution.
## Key Features

### Complete OpenAI API Compatibility
Shimmy provides 100% OpenAI-compatible interfaces, including:
- `GET /v1/models` – List available models
- `POST /v1/chat/completions` – Chat completion functionality
- `POST /v1/completions` – Text completion functionality
- `GET /health` – Health check endpoint
This means any tool that supports the OpenAI API can work directly with Shimmy without modifications.
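For example, here is a quick smoke test with curl, assuming Shimmy is listening on port 11435 and has discovered a model named `phi-3-mini` (substitute a name returned by `GET /v1/models` on your machine):

```bash
# "phi-3-mini" is an assumed model name; list the real ones first:
#   curl http://localhost:11435/v1/models
curl http://localhost:11435/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "phi-3-mini",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}]
  }'
```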
### Automatic Model Discovery
Shimmy automatically discovers model files from multiple locations:
- Hugging Face cache directory: `~/.cache/huggingface/hub/`
- Ollama model directory: `~/.ollama/models/`
- Local directory: `./models/`
- Environment variable specified path: `SHIMMY_BASE_GGUF=path/to/model.gguf` (see the example below)
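For instance, to pin Shimmy to one specific model file via the environment variable (the path below is illustrative):

```bash
# Illustrative path; replace with the location of your own GGUF file
export SHIMMY_BASE_GGUF="$HOME/models/phi-3-mini-q4.gguf"
shimmy serve
```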
### Zero-Configuration Design

- Automatic port allocation to avoid conflicts
- Automatic LoRA adapter detection for specialized model support
- No configuration files or setup wizards required
### Cross-Platform Support
Shimmy supports Windows, macOS, and Linux systems. On macOS, it also supports Metal GPU acceleration for both Intel and Apple Silicon chips.
## Installation and Usage

### Quick Installation

**Windows:**

Precompiled binaries are recommended to avoid build dependency issues:

```bash
curl -L https://github.com/Michael-A-Kuykendall/shimmy/releases/latest/download/shimmy.exe -o shimmy.exe
```

Alternatively, install from source (requires LLVM/Clang first):

```bash
winget install LLVM.LLVM
cargo install shimmy --features huggingface
```

**macOS and Linux:**

```bash
cargo install shimmy --features huggingface
```
### Obtaining Models

Shimmy uses GGUF-format model files. You can download models with `huggingface-cli`:

```bash
huggingface-cli download microsoft/Phi-3-mini-4k-instruct-gguf --local-dir ./models/
huggingface-cli download bartowski/Llama-3.2-1B-Instruct-GGUF --local-dir ./models/
```
### Starting the Service

Starting the Shimmy server is straightforward:

```bash
# Automatic port allocation
shimmy serve

# Or specify a port manually
shimmy serve --bind 127.0.0.1:11435
```
Once started, simply point your AI tools to the displayed port to begin using them.
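To confirm the server is up, you can query the health and model endpoints (replace 11435 with the port Shimmy reports):

```bash
curl http://localhost:11435/health
curl http://localhost:11435/v1/models   # should list your discovered models
```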
## Docker Deployment
For users who prefer containerized solutions, Shimmy also offers Docker support.
### Quick Start

1. Create a models directory:

   ```bash
   mkdir models
   ```

2. Download a model file:

   ```bash
   curl -L "https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf/resolve/main/Phi-3-mini-4k-instruct-q4.gguf" -o models/phi-3-mini.gguf
   ```

3. Start Shimmy:

   ```bash
   docker-compose up -d
   ```

4. Test the API:

   ```bash
   curl http://localhost:11434/v1/models
   ```
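If everything is wired up correctly, you should get back an OpenAI-style model list. The exact entries depend on your models, but the response is shaped roughly like this:

```json
{
  "object": "list",
  "data": [
    { "id": "phi-3-mini", "object": "model" }
  ]
}
```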
### Environment Configuration

Docker deployment supports the following environment variables:

| Variable | Default Value | Description |
|---|---|---|
| `SHIMMY_PORT` | `11434` | Server port |
| `SHIMMY_HOST` | `0.0.0.0` | Listening address |
| `SHIMMY_BASE_GGUF` | `/app/models` | Models directory |
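As a sketch, the same settings can be passed to a plain `docker run`. The image tag below is a placeholder; use the image or Dockerfile provided by the Shimmy repository:

```bash
# "shimmy:latest" is a placeholder tag; build or pull the actual image first
docker run -d \
  -p 11434:11434 \
  -e SHIMMY_HOST=0.0.0.0 \
  -e SHIMMY_PORT=11434 \
  -v "$(pwd)/models:/app/models" \
  shimmy:latest
```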
### GPU Support

For NVIDIA GPU users, Docker deployment supports automatic GPU configuration. Ensure that:

- The NVIDIA Container Toolkit is installed
- You are using Docker Compose v2.3+ with GPU support
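If you prefer `docker run` over Compose, the equivalent is to pass the GPUs through explicitly (the image tag is again a placeholder):

```bash
# Requires the NVIDIA Container Toolkit; --gpus all exposes every GPU
docker run -d --gpus all \
  -p 11434:11434 \
  -v "$(pwd)/models:/app/models" \
  shimmy:latest
```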
## Integration Examples

### VSCode Copilot

Add to your VSCode settings:

```json
{
  "github.copilot.advanced": {
    "serverUrl": "http://localhost:11435"
  }
}
```
### Continue.dev

Add to your Continue.dev configuration:

```json
{
  "models": [{
    "title": "Local Shimmy",
    "provider": "openai",
    "model": "your-model-name",
    "apiBase": "http://localhost:11435/v1"
  }]
}
```
### Cursor IDE

Works out of the box – simply point it to `http://localhost:11435/v1`.
## Performance Comparison

Here's how Shimmy compares to other popular tools:

| Tool | Binary Size | Startup Time | Memory Usage | OpenAI API Compatibility |
|---|---|---|---|---|
| Shimmy | 5.1MB | <100ms | 50MB | 100% |
| Ollama | 680MB | 5-10 seconds | 200MB+ | Partial |
| llama.cpp | 89MB | 1-2 seconds | 100MB | None |
## Technical Architecture

Shimmy is built on the following technologies:

- Rust + Tokio: memory safety and high-performance asynchronous I/O
- llama.cpp backend: industry-standard GGUF inference engine
- OpenAI API compatibility: a complete drop-in replacement
- Dynamic port management: zero conflicts, automatic allocation
- Zero-configuration auto-discovery: works out of the box
## Developer Certificate of Origin (DCO)

Shimmy uses the Developer Certificate of Origin (DCO) to ensure all contributions are properly licensed and that contributors have the right to submit their code.

### What is the DCO?

The DCO is a lightweight way for contributors to certify that they wrote, or otherwise have the right to submit, their contribution. It's an industry-standard alternative to Contributor License Agreements (CLAs), used by projects such as the Linux kernel, Docker, and GitLab.
### How to Sign Your Commits

**Option 1: Automatic Sign-off (Recommended)**

Configure your git identity, and enable sign-off by default for patches generated with `git format-patch`:

```bash
git config user.name "Your Name"
git config user.email "your.email@example.com"
git config format.signoff true   # applies to git format-patch output
```

Note that `format.signoff` does not add a sign-off to plain `git commit`; for everyday commits, use the `-s` flag described in Option 2.
**Option 2: Manual Sign-off**

Use the `-s` flag with `git commit`:

```bash
git commit -s -m "Add new feature"
```

This adds a sign-off line to your commit message:

```
Add new feature

Signed-off-by: Your Name <your.email@example.com>
```
**Option 3: Amend Existing Commits**

If you forgot to sign off a commit:

```bash
# For the last commit
git commit --amend --signoff

# For multiple commits
git rebase --signoff HEAD~3   # last 3 commits
```
## Frequently Asked Questions

### Is Shimmy free?

Yes, Shimmy will always be free. It uses the MIT license and promises never to become a paid product.

### Can I use it if I work for a company?

Yes. Use your work email and ensure your employer allows you to contribute to open source projects.

### What if I already contributed code without signing?

No problem! Simply amend your commits to add the sign-off and force-push the branch, as shown below.
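A minimal sequence, assuming your branch is named `my-feature`:

```bash
# Add the sign-off to the last commit, then update the remote branch
git commit --amend --signoff
git push --force-with-lease origin my-feature   # safer variant of a force push
```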
### Is this legally binding?

Yes, the DCO is a legal certification that you have the right to contribute your code.

### What about code I copied from elsewhere?

Only contribute code you wrote or have proper licensing rights to. When in doubt, ask the maintainer.

### Can I sign someone else's commit?

No, each contributor must sign off their own commits with their own identity.

### What model formats are supported?

Shimmy supports GGUF-format model files, currently one of the most popular quantized model formats.

### Is GPU acceleration supported?

Yes, Shimmy supports GPU acceleration. On macOS it supports Metal, and on Linux and Windows it supports CUDA.

### How do I update Shimmy?

If you installed via cargo, update with:

```bash
cargo install shimmy --features huggingface --force
```

If you use the precompiled binaries, simply download the latest version and replace the old one.
## Technical Support and Community

- Issue reporting: submit via GitHub Issues
- Discussions: participate in GitHub Discussions
- Documentation: check the docs/ directory for detailed documentation
- Sponsorship: support project development through GitHub Sponsors
## Quality and Reliability

Shimmy maintains high code quality through rigorous engineering practices:

- Comprehensive test suite with property-based testing
- Automated CI/CD pipeline with quality gates
- Runtime invariant checking for critical operations
- Cross-platform compatibility testing
## Conclusion

Shimmy is a remarkably lightweight tool that makes local AI model deployment simple and efficient. Whether you're a developer, researcher, or enterprise user, you can benefit from its clean design, powerful features, and zero-configuration experience.

Thanks to its minimal resource footprint and fully compatible API design, Shimmy is an ideal alternative to bulky, complex model-serving solutions. Most importantly, its commitment to remaining free and open source provides a solid foundation for widespread adoption.

If you're looking for a simple, efficient, and reliable way to serve AI models locally, Shimmy is worth trying.