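Exploring the Google Interactions API: A Powerful Tool for Simplifying Model and Agent Interactions

Summary

Google's Interactions API is a unified interface for interacting with Gemini models (such as Gemini 3 Pro) and agents (such as Gemini Deep Research). It handles complex context management, tool calling, and state through a single RESTful endpoint, /interactions. It is currently in public beta and available through the Gemini API in Google AI Studio. It offers server-side state management, an interpretable data model, background execution, and remote MCP tool support to help developers build efficient agentic applications.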

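Introduction: Why Is the Interactions API Worth Your Attention?

Imagine you are building an AI application that has to handle complex conversations, call external tools, and even run long research tasks. In the past you might have had to switch between different APIs and manage tedious context and state yourself. With the Interactions API, this gets much simpler. As a unified interface, it lets Gemini models and agents work together, taking you from simple chat to advanced agent tasks in one go.

If you are a recently graduated software engineer or an AI developer, you may ask: "What problem does this API actually solve?" In short, it is designed for modern agentic applications and handles advanced features such as "thinking" and tool use without turning your code into a mess. Based on my years of AI development experience, this API is like handing developers a master key: it simplifies the path from prototype to production. Today we will take it apart step by step and see how to build real applications with it.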

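Core Overview of the Interactions API

The heart of the Interactions API is its unified design. It provides a single RESTful endpoint, /interactions, where you use the model parameter to talk to a model or the agent parameter to talk to an agent. Currently, the supported agents include deep-research-pro-preview-12-2025.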
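To make the model/agent split concrete, here is a minimal sketch of a request-body builder. The helper name build_interaction_request is hypothetical, and the rule that exactly one of model or agent must be set is my assumption based on the description above, not something the API reference is quoted as saying.

```python
from typing import Any, Optional


def build_interaction_request(
    user_input: Any,
    model: Optional[str] = None,
    agent: Optional[str] = None,
) -> dict:
    """Build a request body that targets either a model or an agent."""
    # Assumption: a request names exactly one of `model` or `agent`.
    if (model is None) == (agent is None):
        raise ValueError("Pass exactly one of 'model' or 'agent'.")
    body = {"input": user_input}
    if model is not None:
        body["model"] = model
    else:
        body["agent"] = agent
    return body


model_request = build_interaction_request(
    "Tell me a short joke about programming.",
    model="gemini-3-pro-preview",
)
agent_request = build_interaction_request(
    "Research the history of the Google TPUs.",
    agent="deep-research-pro-preview-12-2025",
)
```

Either dictionary could then be sent as the JSON body of a POST to the /interactions endpoint.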

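This API extends the core capabilities of generateContent and adds the features that agentic applications need:

Optional server-side state: you can hand history management to the server, which simplifies client code, reduces errors, and may lower cost through cache hits.
An interpretable and composable data model: a clean schema that lets you debug, manipulate, stream, and reason over interleaved messages, thoughts, tools, and their results.
Background execution: offload long-running inference loops to the server without keeping a client connection open.
Remote MCP tool support: the model can call Model Context Protocol (MCP) servers directly as tools.

Why a new API? Because models are evolving into systems, and even into agents themselves. The original generateContent suits stateless request-response text generation, such as chatbots. But the landscape has changed: new capabilities such as "thinking" and advanced tool use need a dedicated interface. Forcing them into the old API would make it complex and brittle.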

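How to Get Started Quickly with the Interactions API

Ready to start? First, you need a Gemini API key, which you can get from Google AI Studio. Then follow the API documentation to get going. The OpenAPI spec is here: https://ai.google.dev/api/interactions.openapi.json.

The SDKs simplify everything. In Python, use the google-genai package (version 1.55.0 or later); in JavaScript, use the @google/genai package (version 1.33.0 or later). These SDKs make the calls intuitive.

For example, a simple text prompt call: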

from google import genai

client = genai.Client()

interaction = client.interactions.create(
    model="gemini-3-pro-preview",
    input="Tell me a short joke about programming."
)

print(interaction.outputs[-1].text)

The JavaScript version is similar:

import { GoogleGenAI } from "@google/genai";

const client = new GoogleGenAI({});

const interaction = await client.interactions.create({
  model: "gemini-3-pro-preview",
  input: "Tell me a short joke about programming.",
});

console.log(interaction.outputs[interaction.outputs.length - 1].text);

With cURL:

curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
-H "Content-Type: application/json" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-d '{
    "model": "gemini-3-pro-preview",
    "input": "Tell me a short joke about programming."
}'

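These examples show how simple the API is: pass in a string, get an output back. But don't stop there; it also supports more complex inputs, such as a list of content objects or alternating role turns.

Building Multi-Turn Conversations: Stateful or Stateless?

Multi-turn conversation is the soul of an AI application. The Interactions API offers two approaches: stateful (the server manages history) and stateless (the client manages history).

Stateful Conversations: Let the Server Remember for You

Continue a conversation by passing the ID of the previous interaction as previous_interaction_id. This saves bandwidth and improves efficiency.

For example, the first turn: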

interaction1 = client.interactions.create(
    model="gemini-2.5-flash",
    input="Hi, my name is Phil."
)
print(f"Model: {interaction1.outputs[-1].text}")

Second turn:

interaction2 = client.interactions.create(
    model="gemini-2.5-flash",
    input="What is my name?",
    previous_interaction_id=interaction1.id
)
print(f"Model: {interaction2.outputs[-1].text}")

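The same pattern works in JavaScript and cURL. You can retrieve a past interaction with client.interactions.get(id) to inspect the history.

Why is this a good thing? Because it takes advantage of server-side caching, which raises the cache hit rate and lowers cost. In my experience this is especially useful for long conversations, since the client no longer resends the entire history on every turn.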
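The chaining of previous_interaction_id can be wrapped in a small helper. This is a sketch only: StatefulChat is a hypothetical class of my own, and fake_create stands in for client.interactions.create so that the example runs without an API key.

```python
from types import SimpleNamespace
from typing import Callable, Optional


class StatefulChat:
    """Threads previous_interaction_id through successive turns."""

    def __init__(self, create: Callable[..., object], model: str):
        # `create` is expected to behave like client.interactions.create.
        self._create = create
        self._model = model
        self.last_id: Optional[str] = None

    def send(self, text: str):
        kwargs = {"model": self._model, "input": text}
        if self.last_id is not None:
            # Only later turns reference the interaction that came before.
            kwargs["previous_interaction_id"] = self.last_id
        interaction = self._create(**kwargs)
        self.last_id = interaction.id
        return interaction


# Stand-in for client.interactions.create (hypothetical, offline only).
calls = []


def fake_create(**kwargs):
    calls.append(kwargs)
    return SimpleNamespace(id=f"interaction-{len(calls)}")


chat = StatefulChat(fake_create, model="gemini-2.5-flash")
chat.send("Hi, my name is Phil.")
chat.send("What is my name?")
```

With a real client, you would pass client.interactions.create in place of fake_create.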

Stateless Conversations: You Are in Full Control

If you prefer to manage things by hand, send the entire history as the input list.

conversation_history = [
    {"role": "user", "content": "What are the three largest cities in Spain?"}
]

interaction1 = client.interactions.create(
    model="gemini-2.5-flash",
    input=conversation_history
)

conversation_history.append({"role": "model", "content": interaction1.outputs[-1].text})
conversation_history.append({
    "role": "user",
    "content": "What is the most famous landmark in the second one?"
})

interaction2 = client.interactions.create(
    model="gemini-2.5-flash",
    input=conversation_history
)

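This gives you full control, but it adds complexity on the client. Which should you choose? It depends on your application; a mobile client, for example, may prefer the stateful approach to save data.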
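If you go the stateless route, a thin wrapper keeps the bookkeeping in one place. The sketch below is an illustration only: ConversationHistory is a hypothetical class, and the model reply is a hand-written placeholder rather than real API output.

```python
class ConversationHistory:
    """Client-side history in the role/content shape used in the example above."""

    def __init__(self):
        self.turns = []

    def add_user(self, text: str) -> None:
        self.turns.append({"role": "user", "content": text})

    def add_model(self, text: str) -> None:
        self.turns.append({"role": "model", "content": text})

    def as_input(self) -> list:
        # Return a copy so later appends do not change a request already built.
        return list(self.turns)


history = ConversationHistory()
history.add_user("What are the three largest cities in Spain?")
# Placeholder reply; in practice this would be interaction1.outputs[-1].text.
history.add_model("Madrid, Barcelona, and Valencia.")
history.add_user("What is the most famous landmark in the second one?")
payload = history.as_input()
```

The payload list is what you would pass as input on the next create call.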

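Unlocking Multimodal Capabilities: More Than Text

Multimodal support is where the Interactions API stands out. You can work with images, audio, video, and documents, both to understand inputs and to generate outputs.

Multimodal Understanding: Extracting Insight from Data

Pass data inline as base64, or use the Files API for large files.

Image Understanding

Ask the model to describe an image: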

import base64
from pathlib import Path

with open(Path("car.png"), "rb") as f:
    base64_image = base64.b64encode(f.read()).decode('utf-8')

interaction = client.interactions.create(
    model="gemini-2.5-flash",
    input=[
        {"type": "text", "text": "Describe the image."},
        {"type": "image", "data": base64_image, "mime_type": "image/png"}
    ]
)

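The output describes the image content in detail. In practice, this is useful for image classification or caption generation.

Audio Understanding

Audio is handled the same way: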

with open(Path("speech.wav"), "rb") as f:
    base64_audio = base64.b64encode(f.read()).decode('utf-8')

interaction = client.interactions.create(
    model="gemini-2.5-flash",
    input=[
        {"type": "text", "text": "What does this audio say?"},
        {"type": "audio", "data": base64_audio, "mime_type": "audio/wav"}
    ]
)

This is a good fit for transcription or speech analysis.

Video Understanding

Ask for a timestamped summary:

with open(Path("video.mp4"), "rb") as f:
    base64_video = base64.b64encode(f.read()).decode('utf-8')

interaction = client.interactions.create(
    model="gemini-2.5-flash",
    input=[
        {"type": "text", "text": "What is happening in this video? Provide a timestamped summary."},
        {"type": "video", "data": base64_video, "mime_type": "video/mp4"}
    ]
)

This is very handy for video content analysis.

Document Understanding

Processing a PDF:

with open("sample.pdf", "rb") as f:
    base64_pdf = base64.b64encode(f.read()).decode('utf-8')

interaction = client.interactions.create(
    model="gemini-2.5-flash",
    input=[
        {"type": "text", "text": "What is this document about?"},
        {"type": "document", "data": base64_pdf, "mime_type": "application/pdf"}
    ]
)

This makes document summarization easy.
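The four understanding examples all build the same kind of dictionary, so the encoding step can be factored out. The sketch below assumes the type/data/mime_type shape shown in those examples; inline_part is a hypothetical helper, and the mapping from MIME type to part type is my own inference from the examples, not a documented rule.

```python
import base64
import mimetypes
import os
import tempfile

# Assumed mapping from MIME family to the "type" values seen in the examples.
_FAMILY_TO_TYPE = {"image": "image", "audio": "audio", "video": "video"}


def inline_part(path: str) -> dict:
    """Read a local file and return a base64 inline content part."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None:
        raise ValueError(f"Cannot infer MIME type for {path!r}")
    if mime_type == "application/pdf":
        part_type = "document"
    else:
        family = mime_type.split("/")[0]
        if family not in _FAMILY_TO_TYPE:
            raise ValueError(f"Unsupported MIME type: {mime_type}")
        part_type = _FAMILY_TO_TYPE[family]
    with open(path, "rb") as f:
        data = base64.b64encode(f.read()).decode("utf-8")
    return {"type": part_type, "data": data, "mime_type": mime_type}


# Offline demo with a throwaway file; the bytes are not a real PNG.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "car.png")
    with open(demo_path, "wb") as f:
        f.write(b"fake-image-bytes")
    part = inline_part(demo_path)
```

The returned part can sit next to a text part in the input list, as in the examples above.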

Multimodal Generation: Creating Visual Content

Generating an image:

interaction = client.interactions.create(
    model="gemini-3-pro-image-preview",
    input="Generate an image of a futuristic city.",
    response_modalities=["IMAGE"]
)

for output in interaction.outputs:
    if output.type == "image":
        with open("generated_city.png", "wb") as f:
            f.write(base64.b64decode(output.data))

Here response_modalities specifies the output type.
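The loop in the image generation example writes every image to the same file name, so a response with several images would keep only the last one. A small variation, sketched here with a hypothetical save_images helper and stand-in outputs, gives each image its own numbered file. The .png extension is an assumption carried over from that example.

```python
import base64
import os
import tempfile
from types import SimpleNamespace


def save_images(outputs, directory: str, stem: str = "generated") -> list:
    """Write each image output to its own numbered file and return the paths."""
    paths = []
    for output in outputs:
        if output.type != "image":
            continue
        # Assumption: the image is a PNG, as in the example above.
        path = os.path.join(directory, f"{stem}_{len(paths)}.png")
        with open(path, "wb") as f:
            f.write(base64.b64decode(output.data))
        paths.append(path)
    return paths


# Stand-in outputs so the sketch runs offline; the bytes are not real images.
fake_outputs = [
    SimpleNamespace(type="text", text="Here is your city."),
    SimpleNamespace(type="image", data=base64.b64encode(b"first").decode("utf-8")),
    SimpleNamespace(type="image", data=base64.b64encode(b"second").decode("utf-8")),
]
with tempfile.TemporaryDirectory() as tmp:
    saved = save_images(fake_outputs, tmp)
    saved_names = [os.path.basename(p) for p in saved]
    with open(saved[1], "rb") as f:
        second_bytes = f.read()
```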

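Agentic Capabilities: From Tools to Full Agents

The Interactions API is designed for agents and supports function calling, built-in tools, MCP, and structured output.

Using Agents: Gemini Deep Research

For long-running research tasks, use deep-research-pro-preview-12-2025: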

import time

initial_interaction = client.interactions.create(
    input="Research the history of the Google TPUs with a focus on 2025 and 2026.",
    agent="deep-research-pro-preview-12-2025",
    background=True
)

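while True:
    interaction = client.interactions.get(initial_interaction.id)
    if interaction.status == "completed":
        print(interaction.outputs[-1].text)
        break
    # Stop polling on a terminal failure instead of looping forever.
    if interaction.status in ("failed", "cancelled"):
        print(f"Interaction ended with status: {interaction.status}")
        break
    time.sleep(10)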

This runs in the background while the client polls for the result. The agent synthesizes a comprehensive report.
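For production use, a polling loop usually wants a timeout and explicit handling of failure. The following sketch shows one way to do that. wait_for_interaction is a hypothetical helper, the failure status names "failed" and "cancelled" and the pending status "in_progress" are assumptions, and fake_get stands in for client.interactions.get so the example runs offline.

```python
import time
from types import SimpleNamespace


def wait_for_interaction(get, interaction_id, poll_seconds=10.0,
                         timeout_seconds=1800.0, sleep=time.sleep,
                         failure_statuses=("failed", "cancelled")):
    """Poll until the interaction completes, fails, or the timeout elapses."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        interaction = get(interaction_id)
        if interaction.status == "completed":
            return interaction
        if interaction.status in failure_statuses:
            raise RuntimeError(
                f"Interaction ended with status {interaction.status!r}"
            )
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Interaction {interaction_id} did not finish in time")
        sleep(poll_seconds)


# Stand-in for client.interactions.get: two pending polls, then success.
_statuses = iter(["in_progress", "in_progress", "completed"])
poll_log = []


def fake_get(interaction_id):
    poll_log.append(interaction_id)
    return SimpleNamespace(id=interaction_id, status=next(_statuses))


# The no-op sleep keeps the demo instant; real code would use the default.
done = wait_for_interaction(fake_get, "abc123", sleep=lambda seconds: None)
```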

Tools and Function Calling

Define a custom function:

def get_weather(location: str):
    return f"The weather in {location} is sunny."

weather_tool = {
    "type": "function",
    "name": "get_weather",
    "description": "Gets the weather for a given location.",
    "parameters": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"]
    }
}

interaction = client.interactions.create(
    model="gemini-2.5-flash",
    input="What is the weather in Paris?",
    tools=[weather_tool]
)

for output in interaction.outputs:
    if output.type == "function_call":
        result = get_weather(**output.arguments)
        interaction = client.interactions.create(
            model="gemini-2.5-flash",
            previous_interaction_id=interaction.id,
            input=[{
                "type": "function_result",
                "name": output.name,
                "call_id": output.id,
                "result": result
            }]
        )

The loop handles each tool call and sends the result back to the model.
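Once you have more than one tool, a name-to-function registry keeps the dispatch logic tidy. The sketch below is an illustration under assumptions: run_function_calls and TOOL_REGISTRY are hypothetical names, the outputs are stand-ins, and I am assuming the API accepts several function_result items in a single input list.

```python
from types import SimpleNamespace


def get_weather(location: str) -> str:
    return f"The weather in {location} is sunny."


# Maps the tool name declared to the model onto a local Python callable.
TOOL_REGISTRY = {"get_weather": get_weather}


def run_function_calls(outputs, registry) -> list:
    """Execute every function_call output and build function_result inputs."""
    results = []
    for output in outputs:
        if output.type != "function_call":
            continue
        func = registry.get(output.name)
        if func is None:
            # Report unknown tools back to the model instead of crashing.
            result = f"Unknown tool: {output.name}"
        else:
            result = func(**output.arguments)
        results.append({
            "type": "function_result",
            "name": output.name,
            "call_id": output.id,
            "result": result,
        })
    return results


# Stand-in outputs shaped like the function_call items in the example above.
fake_outputs = [
    SimpleNamespace(type="text", text="Let me check."),
    SimpleNamespace(type="function_call", id="call-1", name="get_weather",
                    arguments={"location": "Paris"}),
]
function_results = run_function_calls(fake_outputs, TOOL_REGISTRY)
```

The function_results list would then be passed as input together with previous_interaction_id, as in the loop above.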

Built-in tools include google_search, url_context, and code_execution.

For example, summarizing a web page with url_context:

interaction = client.interactions.create(
    model="gemini-2.5-flash",
    input="Summarize the content of https://www.wikipedia.org/",
    tools=[{"type": "url_context"}]
)

Remote MCP: Simplifying External Tool Integration

Calling an MCP server directly:

mcp_server = {
    "type": "mcp_server",
    "name": "weather_service",
    "url": "https://gemini-api-demos.uc.r.appspot.com/mcp"
}

interaction = client.interactions.create(
    model="gemini-2.5-flash",
    input="What is the weather like in New York today?",
    tools=[mcp_server]
)

This lets the model access remote tools directly.

Structured Output: Enforcing a Format with JSON Schema

Define the schema with Pydantic or Zod:

from pydantic import BaseModel, Field
from typing import Literal, Union

class SpamDetails(BaseModel):
    reason: str = Field(description="The reason why the content is considered spam.")
    spam_type: Literal["phishing", "scam", "unsolicited promotion", "other"]

class NotSpamDetails(BaseModel):
    summary: str = Field(description="A brief summary of the content.")
    is_safe: bool = Field(description="Whether the content is safe for all audiences.")

class ModerationResult(BaseModel):
    decision: Union[SpamDetails, NotSpamDetails]

interaction = client.interactions.create(
    model="gemini-2.5-flash",
    input="Moderate the following content: 'Congratulations! You've won a free cruise...'",
    response_format=ModerationResult.model_json_schema(),
)

Parse the JSON output to confirm that it has the expected structure.
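As a rough sketch of the parsing step, the following uses only the standard library to decode the text and tell the two branches of the union apart. parse_moderation is a hypothetical helper, and sample_text is a hand-written example of what the model might return, not real output.

```python
import json


def parse_moderation(text: str) -> dict:
    """Decode a ModerationResult JSON string and label which branch it is."""
    data = json.loads(text)
    decision = data["decision"]
    if "spam_type" in decision:
        kind, required = "spam", {"reason", "spam_type"}
    else:
        kind, required = "not_spam", {"summary", "is_safe"}
    missing = required - decision.keys()
    if missing:
        raise ValueError(f"Missing fields for {kind}: {sorted(missing)}")
    return {"kind": kind, **decision}


# Hand-written sample in the shape of the schema above (not real API output).
sample_text = json.dumps({
    "decision": {
        "reason": "Unsolicited prize offer asking the reader to act.",
        "spam_type": "scam",
    }
})
parsed = parse_moderation(sample_text)
```

If Pydantic is available, ModerationResult.model_validate_json(text) should give the same check with full type validation.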

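Combining Tools and Structured Output

For example, you can combine google_search with a schema to get structured sports results.

Advanced Features: Leveling Up Your Application

Streaming Responses: Getting Results in Real Time

Use stream=True to receive the response incrementally: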

stream = client.interactions.create(
    model="gemini-2.5-flash",
    input="Explain quantum entanglement in simple terms.",
    stream=True
)

for chunk in stream:
    if chunk.event_type == "content.delta":
        if chunk.delta.type == "text":
            print(chunk.delta.text, end="", flush=True)

This makes for a much better user experience.
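Besides printing deltas as they arrive, you will often want the full text at the end. Here is a minimal sketch that accumulates text deltas, assuming the event_type and delta fields used in the streaming example. collect_text is a hypothetical helper, and the fake chunks (including the "other.event" name) are placeholders so the code runs offline.

```python
from types import SimpleNamespace


def collect_text(stream) -> str:
    """Join the text deltas from a stream of events into one string."""
    parts = []
    for chunk in stream:
        if chunk.event_type != "content.delta":
            continue  # Ignore events that carry no content delta.
        if chunk.delta.type == "text":
            parts.append(chunk.delta.text)
    return "".join(parts)


# Stand-in events shaped like the chunks handled in the example above.
fake_stream = [
    SimpleNamespace(event_type="other.event", delta=None),
    SimpleNamespace(event_type="content.delta",
                    delta=SimpleNamespace(type="text", text="Entangled particles ")),
    SimpleNamespace(event_type="content.delta",
                    delta=SimpleNamespace(type="thought", text="(internal)")),
    SimpleNamespace(event_type="content.delta",
                    delta=SimpleNamespace(type="text", text="share a linked state.")),
]
full_text = collect_text(fake_stream)
```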

Configuring Generation

Customize temperature and other settings:

interaction = client.interactions.create(
    model="gemini-2.5-flash",
    input="Tell me a story about a brave knight.",
    generation_config={
        "temperature": 0.7,
        "max_output_tokens": 500,
        "thinking_level": "low",
    }
)

Working with Files

Using a remote URI:

interaction = client.interactions.create(
    model="gemini-2.5-flash",
    input=[
        {"type": "image", "uri": "https://github.com/.../cats-and-dogs.jpg"},
        {"type": "text", "text": "Describe what you see."}
    ]
)

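Alternatively, upload large files with the Files API and then reference them by URI.

Data Model Overview

The Interaction resource includes:

Property	Type	Description
id	string	Unique ID
model / agent	string	Model or agent
input	Content[]	Input
outputs	Content[]	Outputs
tools	Tool[]	Tools
previous_interaction_id	string	ID of the previous interaction
stream	boolean	Whether to stream
status	string	Status (e.g. completed)
background	boolean	Whether to run in the background
store	boolean	Whether to store (default true)
usage	Usage	Token usage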

Supported Models and Agents

Model name	Type	ID
Gemini 2.5 Pro	Model	gemini-2.5-pro
Gemini 2.5 Flash	Model	gemini-2.5-flash
Gemini 2.5 Flash-lite	Model	gemini-2.5-flash-lite
Gemini 3 Pro Preview	Model	gemini-3-pro-preview
Deep Research Preview	Agent	deep-research-pro-preview-12-2025

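How the API Works: Data Storage and Retention

The Interaction is the core resource and represents one complete turn. Interactions are stored by default (store=true) and retained for 55 days on the paid tier and 1 day on the free tier. Set store=false to opt out, but note that this is incompatible with background execution and with previous_interaction_id.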
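These storage rules are easy to encode as a client-side guard. The sketch below uses the retention periods and incompatibilities stated above; check_store_options and expires_at are hypothetical helpers, and how the server actually reports or enforces these limits is not covered here.

```python
from datetime import datetime, timedelta, timezone

# Retention periods as stated above, in days.
RETENTION_DAYS = {"paid": 55, "free": 1}


def check_store_options(store=True, background=False,
                        previous_interaction_id=None) -> None:
    """Reject combinations that the storage rules above rule out."""
    if not store and (background or previous_interaction_id is not None):
        raise ValueError(
            "store=False cannot be combined with background execution "
            "or previous_interaction_id"
        )


def expires_at(created: datetime, tier: str) -> datetime:
    """Estimate when a stored interaction falls out of retention."""
    return created + timedelta(days=RETENTION_DAYS[tier])


created = datetime(2026, 1, 1, tzinfo=timezone.utc)
paid_expiry = expires_at(created, "paid")
free_expiry = expires_at(created, "free")
check_store_options(store=True, background=True)  # Allowed combination.
```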

To delete an interaction, use the delete method: client.interactions.delete(id).

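Best Practices: Learning from Experience

Use previous_interaction_id to improve cache hits and lower cost.
Mix models and agents: collect data with Deep Research first, then summarize it with a model.

In my projects, these practices have made applications more efficient.

SDK Support

Use the latest SDKs: Python 1.55.0+, JS 1.33.0+.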

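Limitations: What to Know During the Beta

The API is in beta, so things may change.
Grounding with Google Maps and Computer Use are not supported.
Output order is sometimes wrong (text appears before the tool call).
Combining MCP, function calling, and built-in tools is not supported; Gemini 3 does not support remote MCP.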

Feedback and Next Steps

Share feedback at https://discuss.ai.google.dev/. As a next step, explore Deep Research.

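FAQ: Frequently Asked Questions

What is the difference between the Interactions API and generateContent?

The Interactions API is designed for agents and supports complex history; generateContent suits simple generation.

How do I handle background tasks?

Start the task with background=true, then poll with get(id).

Is stored data safe?

Data is stored by default and handled according to the terms; you can opt out.

Which modalities are supported?

Image, audio, video, and document understanding, plus image generation.

How do I debug tool calls?

Inspect the function_call items in outputs, then handle the result.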

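With all of this, we have seen the potential of the Interactions API. It is not just a tool but a foundation for building the AI applications of the future. I hope this article helps you get started. If you have questions, go ahead and experiment!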
