Octo: A Practical Guide to the Multi-Model Coding Assistant
What this guide is for
This article reshapes the project's files into a single, practical English guide. It stays strictly within the material in those files and preserves technical details and examples exactly as given. You'll find clear instructions for installing and running Octo, explanations of its built-in behaviors, configuration examples, recommended files and formats, and a practical list of remaining work items taken from the project TODO. The tone is conversational and direct, so a reader with a junior-college technical background can follow along and use Octo in real projects.
Quick summary — what Octo does
- Octo is a small, helpful, cephalopod-branded coding assistant you run locally or against any OpenAI-compatible or Anthropic-compatible LLM API.
- It helps you manage multi-model workflows: you can switch models mid-conversation if one model gets stuck.
- It can optionally use a pair of small, open-source autofix models to recover from tool-call or patch failures.
- Octo manages “thinking tokens” carefully for thinking models, to help keep multi-turn reasoning effective.
- The project contains source code you can build (it may take up to ~20 seconds using `npx tsc`).
- Octo has zero telemetry: it does not send usage data out by default.
These points are taken directly from the project README and related files.
Who Octo is for
Octo is aimed at developers who want a lightweight command-line coding assistant that:
- works with many LLMs (OpenAI-compatible, Anthropic, or local models),
- lets you switch models without losing conversation state,
- helps with tool calls and code edits by attempting automatic fixes, and
- integrates structured context from other systems via the Model Context Protocol (MCP).
If you write code and want a friendly companion that plays well with multiple models and local tooling, Octo is designed for that workflow.
Install and start — copyable steps
Install globally via npm
npm install --global octofriend
Start Octo
octofriend
A quick verification (example)
octofriend --version
# example output shown in project:
# octofriend 0.1.0
When you run `octofriend`, it will open an interactive command-line assistant. The README includes a short demo and an asciinema recording showing the experience.
Key features and practical notes
Below are the features and behaviors described in the source files, rewritten in plain language.
1. Multi-model support and live switching
Octo works with any LLM API that is compatible with OpenAI or Anthropic. The README notes that Octo works well with models such as GPT-5, Claude 4, GLM-4.5, and Kimi K2, but you can use any compatible model and switch between them during a conversation.
Practical use: start a session with a fast model for quick tasks, and switch to a deeper reasoning model when you need more complex output, without losing the conversation history.
2. Autofix models for tool and patch failures
Octo can optionally use custom, open-source autofix models to handle failed tool calls or mis-formatted patches. The README recommends two such models the project has trained and published:
- `diff-apply` — for helping to apply code diffs/patches correctly.
- `fix-json` — for repairing broken JSON outputs from tools or models.
When a tool call or patch application fails, Octo can hand the error and the failed output to one of these autofix models and retry the operation automatically.
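To make this concrete for the `fix-json` case, here is a hedged TypeScript illustration of the failure mode it targets; the string contents are invented for this example:

```ts
// Hypothetical example of the failure fix-json targets: a model emits
// truncated JSON for a tool call, and JSON.parse rejects it.
const brokenOutput = '{"file": "src/index.ts", "line": 42'; // missing closing brace
// JSON.parse(brokenOutput) would throw a SyntaxError.

// fix-json's job, per the README, is to repair output like this so parsing succeeds:
const repairedOutput = '{"file": "src/index.ts", "line": 42}';
const toolArgs = JSON.parse(repairedOutput); // parses cleanly after repair
```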
3. Thought-token management (“thinking tokens”)
The README states Octo carefully manages thinking tokens for thinking models (models that have an explicit “thinking” stage). That means Octo aims to allocate the right token budget for multi-turn reasoning so the model can think as deeply as needed without wasting tokens.
This is an internal behavior described by the project and intended to improve stability with models that expose or depend on a thinking budget.
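The README does not spell out the mechanism, but the project TODO notes that Anthropic thinking budgets currently map low/medium/high to fixed budgets of 2048/4096/8192 tokens. Here is a minimal TypeScript sketch of that mapping; the function name and shape are assumptions, not Octo's actual API:

```ts
// Sketch only: maps a coarse thinking level to a token budget, using the
// low/medium/high -> 2048/4096/8192 budgets mentioned in the project TODO.
type ThinkingLevel = "low" | "medium" | "high";

function thinkingBudget(level: ThinkingLevel): number {
  const budgets: Record<ThinkingLevel, number> = {
    low: 2048,
    medium: 4096,
    high: 8192,
  };
  return budgets[level];
}

thinkingBudget("medium"); // 4096
```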
4. Zero telemetry, privacy friendly
Octo is designed with privacy in mind: the README states it has zero telemetry. If you pair Octo with a privacy-focused LLM provider or run models locally, your code and data remain under your control.
5. MCP (Model Context Protocol) integration
Octo can connect to MCP servers to receive structured contextual data. The README gives an example showing how to configure an MCP server in the Octo configuration file; this is useful if you want Octo to access data from project management tools or other services at runtime.
Important project files and rules
Octo reads “rule” files to know how to behave in different projects or environments. These are plain-text files that define preferences and behavior.
Files Octo looks for (in order):
- `OCTO.md`
- `CLAUDE.md`
- `AGENTS.md`
How Octo chooses rules:
- Octo searches the current directory and up through parent directories until it reaches the user's home directory.
- All rule files found are merged.
- If more than one of `OCTO.md`, `CLAUDE.md`, and `AGENTS.md` is present, the first matching name in that order is used as the primary instruction set.
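Under those rules, the lookup amounts to a simple upward directory walk. The following TypeScript sketch is a hedged illustration assuming a Node.js environment; it mirrors the stated behavior but is not Octo's actual code:

```ts
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Names checked in each directory, in priority order.
const RULE_FILES = ["OCTO.md", "CLAUDE.md", "AGENTS.md"];

function findRuleFiles(startDir: string = process.cwd()): string[] {
  const found: string[] = [];
  let dir = path.resolve(startDir);
  const home = os.homedir();
  while (true) {
    // The first matching name wins as this directory's rule file.
    const match = RULE_FILES.map((f) => path.join(dir, f)).find(fs.existsSync);
    if (match) found.push(match);
    // Stop once we've checked the home directory (or hit the filesystem root).
    if (dir === home || path.dirname(dir) === dir) break;
    dir = path.dirname(dir);
  }
  return found; // per the docs, all files found are merged
}
```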
Global configuration location (example)
Put global rules in:
~/.config/octofriend/OCTO.md
This lets you keep project-specific rules separate from global defaults.
Configuration snapshot — MCP example
After the first run, Octo creates a configuration file at:
~/.config/octofriend/octofriend.json5
A sample `mcpServers` configuration (from the README) looks like this:
mcpServers: {
serverName: {
command: "command-string",
arguments: [
"arguments",
"to",
"pass",
],
},
},
A concrete example the README shows (connecting Linear via an `npx` wrapper):
mcpServers: {
linear: {
command: "npx",
arguments: [ "-y", "mcp-remote", "https://mcp.linear.app/sse" ],
},
},
Use that configuration to let Octo fetch structured data from external services (issues, tasks, or other context) at runtime rather than having to paste context into the session manually.
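For readers who prefer types, the configuration shape implied by these examples can be written as a small TypeScript sketch; the type names here are mine, not the project's:

```ts
// Shape inferred from the README's mcpServers examples.
type McpServerEntry = {
  command: string;     // executable Octo runs to reach the MCP server
  arguments: string[]; // CLI arguments passed to that executable
};

type OctofriendConfig = {
  mcpServers: Record<string, McpServerEntry>;
};

const config: OctofriendConfig = {
  mcpServers: {
    linear: { command: "npx", arguments: ["-y", "mcp-remote", "https://mcp.linear.app/sse"] },
  },
};
```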
Building the project (developer note)
One of the files (`OCTO.md`) contains developer guidance that is included in the project source. The project is built with TypeScript, and the project notes say:

- The project can take up to 20 seconds to build via `npx tsc`.
- The style preference in the codebase is to use `type Blah = { ... }` instead of `interface Blah { ... }` unless an interface is needed for classes.
These items are relevant if you plan to read or modify Octo’s source and rebuild it locally.
Demo and visual reference
The README includes an embedded demo (asciinema) which demonstrates an example Octo session, and the repo provides a demo SVG that can be used as a quick visual reference.
This is not a live screenshot of your environment, but it shows the flow of starting Octo and entering a prompt.
Example interactions and commands
Below are the main commands and short examples, drawn from the README and project examples:
- Start Octo: `octofriend`
- Check version: `octofriend --version`
- Configure MCP servers (edit `~/.config/octofriend/octofriend.json5`; see the example above).
- How Octo looks for rules:
  - Place `OCTO.md` at the project root for project-specific behavior.
  - Place `~/.config/octofriend/OCTO.md` for global defaults.
- The README indicates that Octo will print a short prompt when ready:

🐙 Octo: Ready to help!
Type your question or command:
From there you can type natural prompts such as asking Octo to generate code or perform a task.
Autofix flow (what to expect)
The README describes how Octo can reduce interruptions with autofix models:
- Octo sends a tool call or a patch to be applied (for example, a code edit).
- If the tool call fails or a patch does not apply, Octo can run an autofix model (for example, `fix-json` or `diff-apply`) to repair the output.
- Octo retries the operation with the fixed output.
The project promotes these autofix models as optional helpers to increase the success rate of automated tool calls. The two autofix models called out in the README are published in the project documentation.
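As a mental model, the flow reduces to a try/fix/retry loop. The TypeScript sketch below is illustrative only; every name in it (`applyPatch`, `runAutofixModel`) is hypothetical rather than a description of Octo's real internals:

```ts
// Hedged sketch of the autofix retry flow described above.
async function applyWithAutofix(
  patch: string,
  applyPatch: (patch: string) => Promise<void>,
  runAutofixModel: (model: string, failed: string, error: string) => Promise<string>,
): Promise<void> {
  try {
    await applyPatch(patch); // first attempt with the model's raw output
  } catch (err) {
    // Hand the failed output and the error to an autofix model, then retry.
    const fixed = await runAutofixModel("diff-apply", patch, String(err));
    await applyPatch(fixed);
  }
}
```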
Development and future work (TODO list)
The project’s `TODO.md` includes a prioritized list of enhancements the authors plan to add. These items are presented here verbatim from the TODO file so you can see the current roadmap.
TODO list (excerpt):
- Situational awareness: if it’s a git repo, check the gitignore, and get a bunch of the directory hierarchy into context space automatically.
- Gemini API support: their “openai-compatible” API isn’t complete enough to work with Octo.
- Refactor History/IR for type safety: link back between i.e. tool calls, tool outputs, and original assistant messages.
- Refactor menu system to use a stack of screens that can consistently be popped, rather than ad-hoc state linking. The stack entries are typed and each different state/internal-URI can have typed data associated with it.
- Link out directly to inference websites for API keys.
- Allow Anthropic models to configure the thinking budget by tokens, rather than low/medium/high corresponding to specific budgets (2048/4096/8192).
These TODO items show areas of active improvement: better git integration, broader API compatibility, improved internal typing and UI state management, and more flexible thinking/token configuration for Anthropic models.
Plain-language FAQ — anticipated user questions
Q: What do I need to use Octo?
A: Node.js and `npm` to install the `octofriend` package. Install globally with `npm install --global octofriend` and run `octofriend`.
Q: Which models can Octo use?
A: Octo is designed to work with any OpenAI-compatible or Anthropic-compatible LLM API. The README highlights GPT-5, Claude 4, GLM-4.5, and Kimi K2 as models it works well with, but it supports any compatible model you configure.
Q: What are autofix models and why use them?
A: Autofix models (like `diff-apply` and `fix-json`, mentioned in the README) attempt to repair failed tool outputs or broken patches so Octo can retry the operation automatically.
Q: Where do I put global settings?
A: The project uses `~/.config/octofriend/octofriend.json5` for persistent configuration like MCP server settings. Global rule files go in `~/.config/octofriend/OCTO.md`.
Q: Does Octo send my code or usage data anywhere?
A: The README explicitly says Octo has zero telemetry. That means the project does not collect usage data by default.
Practical configuration examples (copy and paste)
Minimal `OCTO.md` example (project root)
Place this in the project root to set project-specific behavior:
# Project rules for Octo
- prefer: Claude
- disable-autofix: false
- allow-local-llm: true
`~/.config/octofriend/octofriend.json5` example (MCP servers)
{
// Octofriend MCP servers
mcpServers: {
linear: {
command: "npx",
arguments: [ "-y", "mcp-remote", "https://mcp.linear.app/sse" ],
},
localLLM: {
command: "curl",
arguments: [ "http://127.0.0.1:8000/infer" ],
},
},
}
The `mcpServers` structure maps a named server to a command and argument list Octo will run to fetch streaming context.
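To see what an entry implies in practice, here is a hedged Node/TypeScript sketch; using `child_process.spawn` is an assumption about the mechanism, not a description of Octo's implementation:

```ts
import { spawn } from "node:child_process";

// The Linear entry from the example config above.
const entry = {
  command: "npx",
  arguments: ["-y", "mcp-remote", "https://mcp.linear.app/sse"],
};

// Roughly what the config asks for: run the command with its arguments and
// read MCP context data from the resulting process.
const child = spawn(entry.command, entry.arguments, { stdio: "pipe" });
child.stdout?.on("data", (chunk: Buffer) => {
  console.log(chunk.toString());
});
```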
HowTo (short procedural guide)
How to install and run Octo
1. Open a terminal.
2. Install Octo globally: `npm install --global octofriend`
3. Verify installation: `octofriend --version`
4. Start the interactive assistant: `octofriend`
5. Optionally, edit `~/.config/octofriend/octofriend.json5` to configure MCP servers.
Machine-readable HowTo and FAQ (JSON-LD)
Below are two small JSON-LD snippets you can include on a documentation page to help automated systems parse the concrete HowTo and FAQ structures. These blocks are derived directly from the project examples and the practical steps shown above.
{
"@context": "https://schema.org",
"@type": "HowTo",
"name": "Install and start Octo",
"step": [
{"@type": "HowToStep", "name": "Install", "text": "Run npm install --global octofriend"},
{"@type": "HowToStep", "name": "Verify", "text": "Run octofriend --version to confirm installation"},
{"@type": "HowToStep", "name": "Start", "text": "Run octofriend to open the interactive assistant"}
]
}
{
"@context": "https://schema.org",
"@type": "FAQPage",
"mainEntity": [
{
"@type": "Question",
"name": "Which models does Octo support?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Octo works with any OpenAI-compatible or Anthropic-compatible LLM API; the README specifically lists GPT-5, Claude 4, GLM-4.5, and Kimi K2 as examples."
}
},
{
"@type": "Question",
"name": "How do I add an MCP server?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Edit ~/.config/octofriend/octofriend.json5 and add an entry under mcpServers, then restart Octo."
}
}
]
}
Development tips and small code style note
From the `OCTO.md` file in the source:
- When working in the codebase, prefer `type` aliases over `interface` when a class does not need to implement an interface:

  // prefer this
  type Blah = { ... };

  // rather than this when not necessary:
  interface Blah { ... }

- The TypeScript build step `npx tsc` may take up to about 20 seconds on typical machines; plan for that when building locally.
Practical checklist before enabling autofix in a production repository
These are conservative steps suggested by the project examples and the structure of the repo:
- Test autofix locally in a sandbox repo — try autofix flows on a throwaway repo to see how patches are applied.
- Preserve human review — configure Octo so autofix produces suggested patches for review before they are merged.
- Version control for rule files — keep `OCTO.md` and other rule files in version control so teams can audit changes.
- Limit MCP sources when needed — use config to limit which MCP sources can push sensitive data to Octo sessions.
These steps follow the project’s general approach to cautious adoption and source control of configuration.