How to Use Deep Research with the Gemini API: A Complete Developer Guide
When you are faced with a long-horizon research task—something that requires searching across dozens of sources, synthesizing conflicting data, and compiling a detailed, cited report—the standard back-and-forth interaction with a chat interface often falls short. You need an agent that can operate autonomously, plan its approach, execute searches in the background, and return with a finished product.
The Gemini Deep Research Agent is designed specifically for this workflow. Instead of waiting for a continuous stream of text, you hand off a complex research objective, and the system handles the heavy lifting asynchronously.
This guide breaks down exactly how to implement this agent using the Gemini API. We will walk through the initial setup, the critical distinction of using the Interactions API, and advanced features like collaborative planning, native chart generation, and real-time streaming with automatic reconnection.
What is the Gemini Deep Research Agent?
The Deep Research Agent autonomously plans, searches, and synthesizes extensive research tasks into detailed reports complete with citations. Because these tasks can take several minutes to complete, the agent is built to handle long-running operations by executing them in the background.
The most important technical detail to understand upfront is how you access it. The Deep Research Agent is exclusively available through the Interactions API. If you attempt to call it using the standard generate_content method, it will not work. The Interactions API is structured to manage stateful, long-running background tasks, which is fundamentally different from standard synchronous generation.
Currently, there are two distinct versions of the agent available, each tailored to a different operational context:
| Agent Version | Primary Design Goal | Ideal Use Case |
|---|---|---|
| deep-research-preview-04-2026 | Optimized for speed and efficiency | Best suited for streaming results back to a client user interface where quick turnaround is prioritized. |
| deep-research-max-preview-04-2026 | Maximum comprehensiveness | Designed for automated context gathering and deep synthesis where time is less of a constraint than thoroughness. |
What Capabilities Are Included?
Before diving into the code, it helps to understand the full scope of what this agent can do. The current release introduces several powerful tools for controlling the research process:
- Collaborative Planning: You can review and refine the research plan before the agent actually begins executing the searches and synthesis.
- Native Charts and Infographics: The agent can generate its own charts, graphs, and infographics to visualize data within the report.
- Remote MCP Servers: You can connect external tools to the agent using the Model Context Protocol, vastly expanding its capabilities.
- Extended Tooling: The agent supports Google Search, URL Context, Code Execution, MCP, and File Search. The first three are enabled by default; MCP servers and File Search are opt-in.
- Multimodal Research Grounding: You can provide images, PDFs, and audio files as the foundational context for the research task.
How to Set Up Your Environment
Getting started requires a minimal amount of preparation. You only need to install the Python SDK and configure your authentication.
Installing the Python SDK
Open your terminal and install the official Google GenAI package using pip:
pip install google-genai
Configuring Your API Key
The API authenticates requests with an API key. For local development, the usual practice is to set this key as an environment variable rather than hardcoding it into your scripts, which keeps the key out of source control.
export GEMINI_API_KEY="your-api-key"
Replace your-api-key with the actual key you have generated. By setting this environment variable, the client library will automatically pick it up when you initialize a session.
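A missing key is easier to diagnose before any request is sent. The following is a minimal sketch of a fail-fast check, assuming the key is stored in GEMINI_API_KEY as described above. The helper name require_api_key is hypothetical and not part of the google-genai SDK; it only reads the environment and never contacts the API.

```python
import os


def require_api_key(env_var: str = "GEMINI_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper (not part of google-genai).
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set. Export it before creating genai.Client()."
        )
    return key
```

Calling require_api_key() at the top of a script turns a confusing authentication failure later on into an immediate, readable error.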
How to Run Your First Deep Research Task
Because Deep Research tasks are asynchronous and can take several minutes, you do not wait for a direct response. Instead, you start a background task and then poll for the result.
Here is the foundational code to initiate a research task asking for the history of Google TPUs:
import time
from google import genai
client = genai.Client()
# Start the research task in the background
interaction = client.interactions.create(
    input="Research the history of Google TPUs.",
    agent="deep-research-preview-04-2026",
    background=True,
)
# Poll until the task reaches a terminal state
while True:
    interaction = client.interactions.get(interaction.id)
    if interaction.status == "completed":
        print(interaction.outputs[-1].text)
        break
    elif interaction.status == "failed":
        print(f"Research failed: {interaction.error}")
        break
    time.sleep(10)
Breaking Down the Logic
- Client Initialization: client = genai.Client() sets up your connection to the API, relying on the environment variable you configured earlier.
- Creating the Interaction: client.interactions.create() initiates the task. The background=True parameter is what tells the system to process this asynchronously. You also specify which agent version to use via the agent parameter.
- The Polling Loop: The while True: loop is necessary because the task is running on a remote server. Your script needs to periodically check the status.
- Status Checking: client.interactions.get(interaction.id) fetches the current state. If the status is completed, it prints the final text output. If it is failed, it prints the error and breaks the loop to prevent infinite hanging.
- Throttling: time.sleep(10) pauses the loop for 10 seconds between checks. Polling a remote API continuously without a delay is an inefficient practice that can lead to rate limiting.
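The polling steps above can be folded into a reusable helper. This is a sketch rather than an SDK feature: wait_for_interaction is a hypothetical name, the one-hour default timeout is an arbitrary assumption, and the only statuses it treats as terminal are the two this guide mentions (completed and failed). It takes a zero-argument fetch callable, so it does not depend on the client directly.

```python
import time
from typing import Any, Callable


def wait_for_interaction(
    fetch: Callable[[], Any],
    poll_seconds: float = 10.0,
    timeout_seconds: float = 3600.0,  # assumption: arbitrary upper bound
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Poll `fetch` until the interaction completes, fails, or times out.

    `fetch` returns an object with a `status` attribute, for example
    lambda: client.interactions.get(interaction.id).
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        interaction = fetch()
        if interaction.status == "completed":
            return interaction
        if interaction.status == "failed":
            error = getattr(interaction, "error", None)
            raise RuntimeError(f"Research failed: {error}")
        if time.monotonic() >= deadline:
            raise TimeoutError("Gave up waiting for the research task.")
        # Throttle between checks to avoid hammering the API
        sleep(poll_seconds)
```

With a real client, usage would look like result = wait_for_interaction(lambda: client.interactions.get(interaction.id)). Raising on failure, instead of printing, lets the caller decide how to recover.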
How to Use Collaborative Planning
For critical research tasks, you might not want the agent to immediately start searching. You might want to review its proposed approach first. Collaborative planning allows you to request a plan, refine it iteratively, and then issue the final command to execute.
This process involves three distinct steps.
Step 1: Request a Plan
To get a plan instead of a final report, you must pass a specific configuration in the agent_config dictionary, setting collaborative_planning to True.
import time
from google import genai
client = genai.Client()
plan = client.interactions.create(
    agent="deep-research-preview-04-2026",
    input="Research Google TPUs vs competitor hardware.",
    agent_config={"type": "deep-research", "collaborative_planning": True},
    background=True,
)
# Stop polling on failure too, so the loop cannot hang forever
while (result := client.interactions.get(id=plan.id)).status not in ("completed", "failed"):
    time.sleep(5)
if result.status == "failed":
    raise RuntimeError(f"Planning failed: {result.error}")
print(result.outputs[-1].text)
When this completes, the output will be a structured plan outlining how the agent intends to tackle the research.
Step 2: Refine the Plan
If the plan is missing a crucial element—for example, a comparison of power efficiency—you can ask for a revision. To maintain the context of the conversation, you must pass the previous_interaction_id. You must also keep collaborative_planning set to True so the system knows you are still in the planning phase.
refined = client.interactions.create(
    agent="deep-research-preview-04-2026",
    input="Add a section comparing power efficiency.",
    agent_config={"type": "deep-research", "collaborative_planning": True},
    previous_interaction_id=plan.id,  # links this turn to the original plan
    background=True,
)
while (result := client.interactions.get(id=refined.id)).status not in ("completed", "failed"):
    time.sleep(5)
if result.status == "failed":
    raise RuntimeError(f"Plan refinement failed: {result.error}")
print(result.outputs[-1].text)
You can repeat this refinement step as many times as necessary.
Step 3: Approve and Execute
This is the step where most implementation errors occur. When you are satisfied with the plan, you cannot simply send a text message like “go ahead.” If you keep collaborative_planning set to True, the agent will simply generate another revised plan.
To trigger the actual report generation, you must explicitly set collaborative_planning=False on the final turn.
report = client.interactions.create(
    agent="deep-research-preview-04-2026",
    input="Plan looks good!",
    # False ends the planning phase and starts report generation
    agent_config={"type": "deep-research", "collaborative_planning": False},
    previous_interaction_id=refined.id,
    background=True,
)
while (result := client.interactions.get(id=report.id)).status not in ("completed", "failed"):
    time.sleep(5)
if result.status == "failed":
    raise RuntimeError(f"Research failed: {result.error}")
print(result.outputs[-1].text)
Flipping this boolean flag is the explicit signal to the API that the planning phase is over and the background synthesis should begin.
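Because the three steps differ only in the prompt, the previous interaction ID, and the planning flag, the request arguments can be built in one place. The sketch below is one way to do that; planning_turn is a hypothetical helper, not an SDK function, and it only assembles the same keyword arguments used in the steps above.

```python
from typing import Any, Dict, Optional


def planning_turn(
    prompt: str,
    previous_interaction_id: Optional[str] = None,
    execute: bool = False,
    agent: str = "deep-research-preview-04-2026",
) -> Dict[str, Any]:
    """Build keyword arguments for one turn of the plan/refine/execute flow.

    execute=False keeps the agent in planning mode; execute=True sets
    collaborative_planning to False so the report is generated.
    """
    kwargs: Dict[str, Any] = {
        "agent": agent,
        "input": prompt,
        "agent_config": {
            "type": "deep-research",
            "collaborative_planning": not execute,
        },
        "background": True,
    }
    # Only follow-up turns carry the ID of the turn they continue.
    if previous_interaction_id is not None:
        kwargs["previous_interaction_id"] = previous_interaction_id
    return kwargs
```

The final turn would then read client.interactions.create(**planning_turn("Plan looks good!", refined.id, execute=True)), which makes the switch out of planning mode hard to forget.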
How to Generate Native Charts and Infographics
A text-heavy report can be difficult to parse. The Deep Research Agent can generate its own data visualizations and return them as part of the output.
To enable this, you set visualization="auto" in the agent_config. However, enabling the capability is only half the equation; you must also explicitly ask for visuals in your text prompt for the best results.
import base64
import time
from google import genai
client = genai.Client()
interaction = client.interactions.create(
    agent="deep-research-preview-04-2026",
    input="Analyze global semiconductor market trends. Include charts showing market share changes.",
    agent_config={"type": "deep-research", "visualization": "auto"},
    background=True,
)
while (result := client.interactions.get(id=interaction.id)).status not in ("completed", "failed"):
    time.sleep(5)
if result.status == "failed":
    raise RuntimeError(f"Research failed: {result.error}")
# The response is a list of outputs that mixes text and images
for output in result.outputs:
    if output.type == "text":
        print(output.text)
    elif output.type == "image" and output.data:
        image_bytes = base64.b64decode(output.data)
        # display(Image(data=image_bytes))  # Jupyter
Handling Image Outputs
Notice how the output loop is structured. The agent’s response is no longer just a single text block. It is a collection of outputs that can be of different types.
When the output type is image, the data is not returned as a URL. It is returned as a base64-encoded string. To view or save the image, you must use base64.b64decode() to convert the string back into raw image bytes. In a Jupyter Notebook environment, you could then pass those bytes directly to the Image display function.
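Outside a notebook, the decoded bytes usually need to be written to disk. Here is a minimal sketch under a few assumptions: save_image_outputs and the chart_N file naming are hypothetical, and because this guide does not state which image format the agent returns, the extension is guessed from the standard PNG and JPEG file signatures, falling back to .bin.

```python
import base64
from pathlib import Path
from typing import Any, Iterable, List


def _guess_extension(data: bytes) -> str:
    """Pick a file extension from well-known magic bytes."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return ".png"
    if data.startswith(b"\xff\xd8\xff"):
        return ".jpg"
    return ".bin"  # unknown format: keep the raw bytes anyway


def save_image_outputs(outputs: Iterable[Any], out_dir: str = "charts") -> List[Path]:
    """Decode base64 image outputs and write them to numbered files."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    saved: List[Path] = []
    for output in outputs:
        # Skip text outputs and image outputs with an empty payload
        if getattr(output, "type", None) != "image" or not getattr(output, "data", None):
            continue
        image_bytes = base64.b64decode(output.data)
        path = target / f"chart_{len(saved) + 1}{_guess_extension(image_bytes)}"
        path.write_bytes(image_bytes)
        saved.append(path)
    return saved
```

Given a completed interaction, save_image_outputs(result.outputs) would return the list of paths it wrote, leaving text outputs untouched.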
How to Connect Remote MCP Servers
To push the boundaries of what the agent can research, you can connect it to external tools using the Model Context Protocol (MCP). This allows the agent to query private databases, internal APIs, or specialized third-party services.
You connect to a remote MCP server by adding it to the tools array, providing the server name, URL, and any required authentication headers.
interaction = client.interactions.create(
    agent="deep-research-preview-04-2026",
    input="Research how recent geopolitical events influenced USD interest rates",
    tools=[
        {
            "type": "mcp_server",
            "name": "Finance Data Provider",
            "url": "https://finance.example.com/mcp",
            "headers": {"Authorization": "Bearer my-token"},
        }
    ],
    background=True,
)
Authentication Options
The API supports three distinct ways to authenticate with an MCP server:
- No-auth: Used for open, public endpoints that require no headers.
- Bearer Token: The most common method, shown in the example above, where a static token is passed in the Authorization header.
- OAuth: For more secure, token-refreshing workflows. You would need to use an external authentication library, such as google-auth, to fetch the OAuth token first, and then pass that dynamically fetched token into the headers dictionary.
If your MCP server exposes many tools but you only want the agent to use specific ones, you can use the allowed_tools parameter to restrict its access.
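The no-auth and bearer-token cases can share one small builder that reads the token from the environment instead of hardcoding it. Treat this as a sketch: mcp_server_tool is a hypothetical helper, and the placement of allowed_tools as a plain list of tool names on the tool entry is an assumption, since this guide names the parameter without showing its shape. Check the API reference before relying on it.

```python
import os
from typing import Any, Dict, List, Optional


def mcp_server_tool(
    name: str,
    url: str,
    token_env_var: Optional[str] = None,
    allowed_tools: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Build one mcp_server entry for the tools list."""
    tool: Dict[str, Any] = {"type": "mcp_server", "name": name, "url": url}
    if token_env_var:
        token = os.environ.get(token_env_var)
        if not token:
            raise RuntimeError(f"{token_env_var} is not set")
        # Bearer-token auth, as in the example above
        tool["headers"] = {"Authorization": f"Bearer {token}"}
    if allowed_tools:
        # Assumption: allowed_tools is a list of names on the tool entry
        tool["allowed_tools"] = list(allowed_tools)
    return tool
```

For OAuth, the same headers field would carry a token fetched at runtime by your authentication library rather than one read from a static environment variable.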
How to Configure the Agent’s Toolset
By default, the Deep Research Agent comes equipped with a specific set of tools to do its job: Google Search, URL Context, and Code Execution. However, there are scenarios where you want to restrict or change its capabilities.
Perhaps you want to prevent it from executing code, or maybe you only want it to search a private corpus of documents you have uploaded. You can customize the available tools by passing a list to the tools parameter.
The full list of configurable tool types includes:
- google_search (Default): Searches the public web.
- url_context (Default): Reads and summarizes the content of specific web pages.
- code_execution (Default): Runs code for calculations and data analysis.
- mcp_server (Optional): Connects to remote MCP servers.
- file_search (Optional): Searches through uploaded document corpora.
Example: Restricting to Web Search Only
If you want the agent to search the web but explicitly prevent it from running any code execution, you can pass only the google_search tool:
# Only web search allowed
interaction = client.interactions.create(
    agent="deep-research-preview-04-2026",
    input="Latest developments in quantum computing.",
    tools=[{"type": "google_search"}],
    background=True,
)
Important Note on Defaults: If you completely omit the tools parameter from your API call, the system does not assume the agent has no tools. Instead, it defaults to enabling Google Search, URL Context, and Code Execution.
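Since a misspelled tool type is an easy mistake in a hand-written list, a small builder can validate names before the request is sent. This is a sketch with a hypothetical helper name, simple_tools. It covers only the three default tool types, which appear in this guide without extra fields; mcp_server needs a URL and file_search presumably needs a corpus reference, so both are left out.

```python
from typing import Dict, Iterable, List

# Tool types this guide shows with no fields beyond "type"
SIMPLE_TOOL_TYPES = {"google_search", "url_context", "code_execution"}


def simple_tools(names: Iterable[str]) -> List[Dict[str, str]]:
    """Turn tool type names into the dictionaries the tools parameter expects."""
    tools: List[Dict[str, str]] = []
    for name in names:
        if name not in SIMPLE_TOOL_TYPES:
            raise ValueError(
                f"Unknown or non-simple tool type: {name!r}. "
                f"Expected one of {sorted(SIMPLE_TOOL_TYPES)}."
            )
        tools.append({"type": name})
    return tools
```

For example, tools=simple_tools(["google_search", "url_context"]) would allow web research while leaving code execution off.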
How to Use Multimodal Research Grounding
Sometimes your research is based on a specific document or image you already possess. Rather than just describing the document in your prompt, you can pass the actual file to the agent to ground its research.
The agent accepts images, PDFs, and audio files. To do this, the input parameter changes from a simple string to a list of dictionaries, where each dictionary defines the type of content and the data itself.
interaction = client.interactions.create(
    agent="deep-research-preview-04-2026",
    input=[
        {"type": "text", "text": "What has been the impact of this research paper?"},
        {"type": "document", "uri": "https://arxiv.org/pdf/1706.03762", "mime_type": "application/pdf"},
    ],
    background=True,
)
In this example, we are passing a direct URL to a PDF file and explicitly declaring its mime_type as application/pdf. The agent will retrieve this document, analyze its contents, and use it as the foundational context for researching its subsequent impact.
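When several documents ground the same question, the input list can be assembled programmatically. The sketch below uses a hypothetical helper, grounded_input, and sticks to the document part shown above; the part types for images and audio are not shown in this guide, so they are deliberately left out.

```python
from typing import Dict, Iterable, List, Tuple


def grounded_input(
    prompt: str,
    documents: Iterable[Tuple[str, str]],
) -> List[Dict[str, str]]:
    """Build a multimodal input list: one text part, then one part per document.

    `documents` is an iterable of (uri, mime_type) pairs.
    """
    parts: List[Dict[str, str]] = [{"type": "text", "text": prompt}]
    for uri, mime_type in documents:
        parts.append({"type": "document", "uri": uri, "mime_type": mime_type})
    return parts
```

The example above is then equivalent to input=grounded_input("What has been the impact of this research paper?", [("https://arxiv.org/pdf/1706.03762", "application/pdf")]).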
How to Implement Real-Time Streaming with Reconnection
Polling every few seconds is effective for backend scripts, but it results in a poor user experience for frontend applications. Users do not want to stare at a loading screen for five minutes.
To solve this, the Interactions API supports real-time streaming. By setting stream=True, the server pushes updates to your client as they happen.
Furthermore, you can enable thinking_summaries="auto". This is a powerful feature that causes the agent to stream its intermediate reasoning—its “thoughts” on what it is searching for and why—alongside the final text and generated images.
Here is a comprehensive implementation that includes a mechanism to reconnect if the network connection drops:
import base64
import time
from google import genai
from IPython.display import Image, display
client = genai.Client()
# State shared between the initial stream and any resumed stream
interaction_id = None
last_event_id = None
is_complete = False
def process_stream(stream):
    global interaction_id, last_event_id, is_complete
    for chunk in stream:
        # The first event carries the interaction ID needed to reconnect
        if chunk.event_type == "interaction.start":
            interaction_id = chunk.interaction.id
        # Remember the latest event so a resumed stream continues after it
        if chunk.event_id:
            last_event_id = chunk.event_id
        if chunk.event_type == "content.delta":
            if chunk.delta.type == "text":
                print(chunk.delta.text, end="", flush=True)
            elif chunk.delta.type == "thought_summary":
                print(f"\n💭 {chunk.delta.content.text}", flush=True)
            elif chunk.delta.type == "image" and chunk.delta.data:
                image_bytes = base64.b64decode(chunk.delta.data)
                display(Image(data=image_bytes))
        elif chunk.event_type in ("interaction.complete", "error"):
            is_complete = True
            if chunk.event_type == "interaction.complete":
                print("\n✅ Research Complete")
try:
    stream = client.interactions.create(
        input="Research AI chip market trends. Include charts comparing vendors.",
        agent="deep-research-preview-04-2026",
        background=True,
        stream=True,
        agent_config={
            "type": "deep-research",
            "thinking_summaries": "auto",
            "visualization": "auto",
        },
    )
    process_stream(stream)
except Exception as exc:
    # A dropped connection surfaces as an exception; catch it so the
    # reconnect loop below can run (the exact exception type may vary)
    print(f"\nConnection dropped: {exc}")
# Reconnect if the connection drops
while not is_complete and interaction_id:
    status = client.interactions.get(interaction_id)
    if status.status != "in_progress":
        break
    try:
        stream = client.interactions.get(
            id=interaction_id,
            stream=True,
            last_event_id=last_event_id,
        )
        process_stream(stream)
    except Exception as exc:
        print(f"\nReconnect failed, retrying: {exc}")
        time.sleep(2)
Understanding the Streaming Logic
- Tracking State: The variables interaction_id and last_event_id are crucial. interaction_id is captured from the interaction.start event, and last_event_id is updated each time a chunk carrying an event ID arrives.
- Processing Deltas: The script checks chunk.event_type. If it is content.delta, it looks at the delta.type to decide whether to print standard text, print a thought summary, or decode and display a base64 image.
- Handling Completion: The loop terminates its processing when it receives an interaction.complete or error event.
- The Reconnection Mechanism: Network connections drop. If the stream fails, the while not is_complete loop activates. It checks the server to ensure the background task is still running. If it is, it calls client.interactions.get() but passes the last_event_id. This tells the server exactly where the client left off, preventing the user from missing any data or seeing duplicate information.
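The state tracking described above can also be kept in an object instead of module-level globals, which makes it easier to run several streams at once and to test the logic without a network. The following is a sketch, not SDK code: StreamState is a hypothetical class, it assumes chunks expose the same event_type, event_id, interaction, and delta attributes used in the script above, and it collects only text deltas for brevity.

```python
from dataclasses import dataclass, field
from typing import Any, Iterable, List, Optional


@dataclass
class StreamState:
    """Reconnection state for one streamed interaction."""

    interaction_id: Optional[str] = None
    last_event_id: Optional[str] = None
    is_complete: bool = False
    text_parts: List[str] = field(default_factory=list)

    def consume(self, stream: Iterable[Any]) -> None:
        """Process chunks, recording where to resume if the stream breaks."""
        for chunk in stream:
            if chunk.event_type == "interaction.start":
                self.interaction_id = chunk.interaction.id
            # Track the newest event ID so a resumed stream starts after it
            if getattr(chunk, "event_id", None):
                self.last_event_id = chunk.event_id
            if chunk.event_type == "content.delta" and chunk.delta.type == "text":
                self.text_parts.append(chunk.delta.text)
            elif chunk.event_type in ("interaction.complete", "error"):
                self.is_complete = True
```

If consume() raises partway through, the object still holds the interaction_id and last_event_id needed to request a resumed stream and call consume() again on it.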
Frequently Asked Questions
Why am I getting an error when trying to use Deep Research?
The most common reason is using the wrong API endpoint. The Deep Research Agent cannot be accessed via the standard generate_content interface. You must use client.interactions.create(). If you use the wrong method, the API will not recognize the agent configuration.
I told the agent to start the research, but it just keeps generating plans. What went wrong?
This happens when you do not explicitly change the collaborative_planning flag. The agent relies entirely on the collaborative_planning=False parameter to know that the planning phase is over. If you send a message like “go ahead” or “looks good” but leave collaborative_planning=True in the agent_config, the system assumes you are requesting another revision of the plan.
I enabled the visualization feature, but my output only contains text. Why?
Setting visualization="auto" grants the agent the permission and capability to generate images, but it does not force it to do so. The agent relies heavily on your prompt. If you want charts, you must explicitly request them in your input text (e.g., “Include charts showing market share changes”). Without an explicit request in the prompt, the agent will typically default to generating a purely text-based synthesis.
Can I stop the agent from searching the public internet?
Yes. By default, the agent uses Google Search, URL Context, and Code Execution. If you want to restrict it to only searching private documents (like those enabled by file_search or a custom mcp_server), you must explicitly define the tools parameter in your API call and omit google_search and url_context.
What happens to my research task if my internet disconnects during real-time streaming?
Because you initialized the task with background=True, the research continues to run on the server regardless of your client’s connection status. Your frontend simply stops receiving updates. Using the last_event_id tracking method shown in the streaming code above, your application can re-establish a connection and seamlessly resume receiving the stream right from the moment the connection was lost.
How do I pass a PDF document to the agent for analysis?
You do not paste the text of the PDF into the prompt. Instead, you format the input parameter as a list. You include a dictionary for your text prompt (with type: "text") and a second dictionary for the document (with type: "document"). You then provide the URL to the PDF in the uri field and specify the mime_type as application/pdf. The agent will fetch and parse the document natively.

