How to Build with Nano Banana: The Complete Developer Guide
Google recently released Gemini 2.5 Flash Image, a powerful new model for image generation and editing, also known by its codename, Nano Banana. This model introduces state-of-the-art capabilities for creating and manipulating images, unlocking a wide range of new applications for developers.
This comprehensive guide provides everything you need to integrate Gemini 2.5 Flash Image (Nano Banana) into your applications using the Gemini Developer API. Whether you’re looking to add creative image generation to your product or need to automate image editing workflows, this tutorial will walk you through every step of the implementation process.
The Core Question This Article Answers
How can developers leverage the Gemini 2.5 Flash Image (Nano Banana) multimodal model through API integration to implement advanced image generation, editing, and restoration capabilities in their applications?
Getting Started with Nano Banana in Google AI Studio
What’s the best way for developers to experiment with Nano Banana without writing any code? Google AI Studio provides the ideal environment for prototyping and testing prompts before implementation.
While end-users can access Nano Banana through the Gemini app, developers will find Google AI Studio to be the perfect playground for experimenting with all available AI models. This web-based environment allows you to test prompts and explore capabilities without writing any code, serving as the entry point for building with the Gemini API.
You can use Nano Banana free of charge within AI Studio. To get started, visit aistudio.google.com, sign in with your Google account, and select Nano Banana from the model picker. Alternatively, go straight to ai.studio/banana to begin a new session with the model immediately.
Beyond basic prototyping, you can also build low-code Nano Banana web applications directly in AI Studio at ai.studio/apps, or explore and remix existing applications like the PixShop app.
Author’s Reflection: The availability of a risk-free sandbox environment like AI Studio significantly lowers the barrier to entry for developers. Spending time here to conceptualize and test ideas before writing code is crucial for avoiding rework later in the development process. This approach mirrors best practices in software development where prototyping precedes implementation.
Project Setup and Configuration
What are the essential prerequisites for starting to use Nano Banana through its API? You need an API key, billing setup, and the appropriate SDK for your programming language.
To follow this guide and integrate Nano Banana into your applications, you’ll need to complete three essential setup steps: obtaining an API key, configuring billing for your project, and installing the appropriate software development kit.
Generating Your API Key
The process for obtaining your API credentials is straightforward:
1. Navigate to Google AI Studio and click Get API key in the left navigation panel.
2. On the subsequent page, click Create API key.
3. Select an existing Google Cloud project or create a new one; this project will manage billing for your API usage.
Once completed, your API key will be displayed. Make sure to copy and store it securely, as you’ll need it to authenticate all API requests.
Setting Up Billing
While prototyping in AI Studio is free, using the model via the API is a paid service that requires billing setup. In the API key management screen, click Set up billing next to your project and follow the on-screen instructions to configure payment.
Understanding Nano Banana Pricing
Image generation with Nano Banana costs $0.30 per 1M input tokens and $30.00 per 1M output tokens. Each generated image consumes 1,290 output tokens, so a single 1024×1024 image works out to roughly $0.039 (1,290 × $30 / 1,000,000). Always refer to the official Gemini 2.5 Flash Image pricing table for the most current information.
Installing the SDK
Choose the appropriate SDK for your programming language environment:
Python Installation:
pip install -U google-genai
# Install the Pillow library for image manipulation
pip install Pillow
JavaScript/TypeScript Installation:
npm install @google/genai
The following examples use the Python SDK for demonstration, but equivalent JavaScript/TypeScript code snippets are available through the provided GitHub Gist.
Author’s Reflection: Clear preparation is foundational to project success. The billing setup step is particularly important—understanding the cost structure upfront helps with budget planning and prevents unexpected expenses down the line. I’ve found that projects with clear financial boundaries from the beginning tend to have more sustainable implementation plans.
Image Generation from Text Prompts
How can developers create entirely new images from simple text descriptions using Nano Banana? By calling the API with a descriptive prompt and the correct model ID.
The most fundamental capability of Nano Banana is generating one or more images from descriptive text prompts. This functionality opens up numerous applications from blog post illustration generation to UI placeholder creation and marketing visual production.
To use this feature, you'll need to use the model ID gemini-2.5-flash-image-preview for all API requests. The response structure is multimodal, meaning it can contain both text and image data that you'll need to handle appropriately in your code.
Application Scenario: Imagine you’re building a content management system that automatically generates featured images for blog posts. Instead of relying on stock photography or manual design, you could use Nano Banana to create unique, relevant imagery based on the article’s content.
from google import genai
from PIL import Image
from io import BytesIO

# Configure the client with your API key
client = genai.Client(api_key="YOUR_API_KEY")

prompt = """Create a photorealistic image of an orange cat
with green eyes, sitting on a couch."""

# Call the API to generate content
response = client.models.generate_content(
    model="gemini-2.5-flash-image-preview",
    contents=prompt,
)

# The response can contain both text and image data.
# Iterate through the parts to find and save the image.
for part in response.candidates[0].content.parts:
    if part.text is not None:
        print(part.text)
    elif part.inline_data is not None:
        image = Image.open(BytesIO(part.inline_data.data))
        image.save("cat.png")
Output:
The multimodal response structure means you receive a list of parts that can contain interleaved text and image data (inline_data). The code above demonstrates how to iterate through these parts to extract and save the generated image while also handling any textual response that might be included.
Image Editing with Text and Image Inputs
How can developers modify existing images based on textual instructions? By providing both an image and text prompt to the API.
Nano Banana excels at maintaining character and content consistency when editing existing images, making it ideal for tasks that require modifying specific elements while preserving overall composition and style.
Application Scenario: Consider an e-commerce platform that needs to showcase products in different environments. Instead of manually photoshopping each product into various scenes, developers could build an automated system that places products into contextually appropriate backgrounds based on simple text instructions.
from google import genai
from PIL import Image
from io import BytesIO

client = genai.Client(api_key="YOUR_API_KEY")

prompt = """Using the image of the cat, create a photorealistic,
street-level view of the cat walking along a sidewalk in a
New York City neighborhood, with the blurred legs of pedestrians
and yellow cabs passing by in the background."""

image = Image.open("cat.png")

# Pass both the text prompt and the image in the 'contents' list
response = client.models.generate_content(
    model="gemini-2.5-flash-image-preview",
    contents=[prompt, image],
)

for part in response.candidates[0].content.parts:
    if part.text is not None:
        print(part.text)
    elif part.inline_data is not None:
        image = Image.open(BytesIO(part.inline_data.data))
        image.save("cat2.png")
Input and Output:
Author’s Reflection: The model’s ability to maintain subject consistency while completely transforming the environment is particularly impressive. This isn’t simple image compositing—it’s a semantic understanding of the prompt that recontextualizes the subject appropriately. This capability opens up possibilities for coherent visual storytelling that maintains character continuity across different scenes.
Photo Restoration with Nano Banana
Can Nano Banana help restore and enhance old or damaged photographs? Yes, with simple prompts, it can effectively restore and colorize historical images.
One of the most powerful applications of Nano Banana is photo restoration. The model can breathe new life into old photographs by repairing damage, enhancing details, and adding realistic colorization based on its training data.
Application Scenario: Historical archives, museums, and genealogy services could use this technology to preserve and enhance historical images. Family historians could restore damaged family photos, and media companies could colorize historical footage for documentaries.
from google import genai
from PIL import Image
from io import BytesIO

client = genai.Client(api_key="YOUR_API_KEY")

prompt = "Restore and colorize this image from 1932"
image = Image.open("lunch.jpg")  # "Lunch atop a Skyscraper, 1932"

response = client.models.generate_content(
    model="gemini-2.5-flash-image-preview",
    contents=[prompt, image],
)

for part in response.candidates[0].content.parts:
    if part.text is not None:
        print(part.text)
    elif part.inline_data is not None:
        image = Image.open(BytesIO(part.inline_data.data))
        image.save("lunch-restored.png")
Original and Output:
The restoration process demonstrates the model’s understanding of historical context, materials, and plausible coloring based on the time period. Unlike simple filter-based colorization, Nano Banana uses semantic understanding to make educated decisions about appropriate colors for different elements in the image.
Working with Multiple Input Images
Can Nano Banana process multiple images together to accomplish more complex tasks? Yes, providing multiple images enables sophisticated editing workflows.
For more complex editing tasks, you can provide multiple images as input to the model. This capability enables advanced applications like virtual try-on, style transfer, and multi-element compositing.
Application Scenario: Fashion retailers could implement virtual try-on features where customers upload a photo of themselves and see how clothing items would look without physically trying them on. Design tools could allow users to combine elements from multiple reference images into a cohesive composition.
from google import genai
from PIL import Image
from io import BytesIO

client = genai.Client(api_key="YOUR_API_KEY")

prompt = "Make the girl wear this t-shirt. Leave the background unchanged."
image1 = Image.open("girl.png")
image2 = Image.open("tshirt.png")

response = client.models.generate_content(
    model="gemini-2.5-flash-image-preview",
    contents=[prompt, image1, image2],
)

for part in response.candidates[0].content.parts:
    if part.text is not None:
        print(part.text)
    elif part.inline_data is not None:
        image = Image.open(BytesIO(part.inline_data.data))
        image.save("girl-with-tshirt.png")
Inputs 1 & 2 and Output:
The ability to process multiple images demonstrates Nano Banana’s sophisticated understanding of spatial relationships, materials, and how different elements should interact when combined. The model doesn’t just overlay images—it understands how clothing should naturally drape and fit on a human body.
Conversational Image Editing
Can developers engage in multi-turn conversations to iteratively refine images? Yes, chat sessions maintain context across requests, enabling iterative refinement.
For iterative refinement of images, Nano Banana supports conversational editing through chat sessions that maintain context across multiple requests. This allows for a more natural, interactive editing process similar to working with a human designer.
Application Scenario: Design collaboration tools could implement a conversational interface where stakeholders can gradually refine visual concepts through natural language instructions. Content creators could iteratively adjust images until they match their vision without needing to start from scratch each time.
from google import genai
from PIL import Image
from io import BytesIO

client = genai.Client(api_key="YOUR_API_KEY")

def save_image(response, filename):
    # Extract and save the image part of a multimodal response,
    # printing any interleaved text.
    for part in response.candidates[0].content.parts:
        if part.text is not None:
            print(part.text)
        elif part.inline_data is not None:
            Image.open(BytesIO(part.inline_data.data)).save(filename)

# Create a chat session that maintains context across turns
chat = client.chats.create(
    model="gemini-2.5-flash-image-preview"
)

# Make the first image edit
response1 = chat.send_message(
    [
        "Change the cat to a bengal cat, leave everything else the same",
        Image.open("cat.png"),
    ]
)
save_image(response1, "cat-bengal.png")

# Continue chatting and editing
response2 = chat.send_message("The cat should wear a funny party hat")
save_image(response2, "cat-party-hat.png")
Input and Outputs 1 & 2:
It’s important to note that after many conversational edits, image features may begin to degrade or “drift” from the original. When this happens, it’s best practice to start a new session with the latest image and a more detailed, consolidated prompt to maintain high fidelity.
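A minimal sketch of this session-refresh pattern, reusing the chat API from the example above (the file name and the consolidated prompt wording are illustrative assumptions, not part of the official guidance):

from google import genai
from PIL import Image

client = genai.Client(api_key="YOUR_API_KEY")

# Start a fresh session seeded with the latest saved output and one
# consolidated prompt, instead of continuing a long, drifting chat.
fresh_chat = client.chats.create(model="gemini-2.5-flash-image-preview")
response = fresh_chat.send_message(
    [
        "A photorealistic bengal cat with green eyes sitting on a couch, "
        "wearing a funny party hat. Keep the composition unchanged.",
        Image.open("cat-party-hat.png"),  # latest image from the old session
    ]
)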
Author’s Reflection: The conversational interface transforms image editing from a one-off instruction execution to an interactive exploration process. This approach more closely mirrors how human creators work, but with potentially far greater efficiency. The “drift” phenomenon observed in extended sessions suggests that complex modifications might be better handled through carefully crafted one-time prompts or by refreshing sessions at key points to maintain quality.
Best Practices and Prompting Tips for Nano Banana
What techniques produce the best results when prompting Nano Banana? Specificity, context, iterative refinement, and positive framing significantly improve output quality.
Achieving optimal results with Nano Banana requires thoughtful prompt construction. Based on the model’s capabilities and limitations, several best practices have emerged for creating effective prompts that produce the desired outputs.
Be Hyper-Specific: The more detail you provide about subjects, colors, lighting, and composition, the more control you have over the output. Instead of “a dog,” try “a golden retriever puppy with fluffy fur, sitting in a sunbeam near a window, with soft morning light creating gentle shadows.”
Provide Context and Intent: Explain the purpose or desired mood of the image. The model’s understanding of context will influence its creative choices. For example, “create a serene landscape for a meditation app background” provides more guidance than simply “a peaceful landscape.”
Iterate and Refine: Don’t expect perfection on the first try. Use the model’s conversational ability to make incremental changes and refine your image through multiple generations.
Use Step-by-Step Instructions: For complex scenes, break your prompt into a series of clear, sequential instructions rather than a single complex sentence. This approach often yields more coherent results.
Use Positive Framing: Instead of negative prompts like “no cars,” describe the desired scene positively: “an empty, deserted street with no signs of traffic.” The model generally responds better to instructions about what to include rather than what to exclude.
Control the Camera: Use photographic and cinematic terms to direct the composition, such as “wide-angle shot,” “macro shot,” “low-angle perspective,” or “shallow depth of field with bokeh background.”
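To make these tips concrete, here is a short sketch that folds several of them (specificity, context and intent, camera control) into a single generation request; the prompt wording and scenario are illustrative, not taken from the official guide:

from google import genai

client = genai.Client(api_key="YOUR_API_KEY")

# One prompt combining specificity, stated intent, and camera direction
prompt = (
    "A wide-angle, street-level photo of a golden retriever puppy with "
    "fluffy fur, sitting in a sunbeam near a cafe window. Soft morning "
    "light, shallow depth of field with a bokeh background. The mood is "
    "calm and inviting, intended as a hero image for a pet-adoption page."
)

response = client.models.generate_content(
    model="gemini-2.5-flash-image-preview",
    contents=prompt,
)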
Author’s Reflection: Crafting effective prompts is more art than science. It requires developers to clearly conceptualize the final image and translate that vision into machine-understandable instructions. The most effective prompts often strike a balance between specificity and creative freedom—providing clear constraints while leaving room for the model’s imagination to enhance the result.
Community Examples and Inspiration
What are other developers building with Nano Banana? The community is exploring diverse applications from perspective shifting to 3D model generation.
The developer community has rapidly embraced Nano Banana, creating innovative applications across various domains. Exploring these examples can provide inspiration for your own projects and demonstrate the model’s versatile capabilities.
✦ Shifting camera perspective by @henrydaubrez demonstrates how to alter photographic viewpoints while maintaining scene consistency
✦ Few-shot learning for consistent character design by @multimodalart shows techniques for maintaining character consistency across multiple generated images
✦ “What does the red arrow see” Google Maps transforms by @tokumin explores creative transformations of map data into realistic scenes
✦ Generating images from stick figure annotations by @yachimat_manga illustrates how simple drawings can be converted into detailed images
✦ Creating 3D models from still images by @deedydas demonstrates potential bridges between 2D imagery and 3D assets
✦ Generating location-based AR experiences by @bilawalsidhu explores how Nano Banana can enhance augmented reality applications
✦ Converting 2D maps into 3D graphics by @demishassabis shows sophisticated spatial understanding and translation capabilities
These examples represent just a fraction of the innovative applications being developed with Nano Banana. The common thread across these projects is creative experimentation with the model’s capabilities to solve unique problems or create novel experiences.
Resources and Next Steps
Where should developers go after mastering the basics of Nano Banana? Official documentation, advanced prompting guides, and example applications provide pathways for deeper learning.
This guide has covered the fundamental building blocks for working with Gemini 2.5 Flash Image (Nano Banana). You’ve learned how to set up your development environment, generate and edit images, and apply advanced techniques like multi-image processing and conversational editing.
To continue your learning journey, explore these official resources:
✦ Google AI Studio for continued experimentation and prototyping
✦ The Gemini API documentation for comprehensive technical reference
✦ The Nano Banana image generation guide for specialized guidance
✦ The prompting best practices article for advanced techniques
✦ The PixShop app in AI Studio for a practical implementation example
As you build with Nano Banana, remember that the most successful applications often emerge from identifying specific user needs that align with the model’s capabilities rather than seeking generic use cases. The technology provides powerful tools, but their value is realized through thoughtful implementation focused on solving real problems.
Action Checklist / Implementation Steps
1. Experiment in AI Studio: Begin with free prototyping at ai.studio/banana to understand capabilities
2. Secure API Access: Obtain your API key from AI Studio and set up billing on your Google Cloud project
3. Set Up Development Environment: Install the appropriate SDK for your language (Python or JavaScript/TypeScript)
4. Implement Basic Image Generation: Start with text-to-image generation using the model ID gemini-2.5-flash-image-preview
5. Add Image Editing Capabilities: Extend your implementation to handle image+text inputs for editing tasks
6. Explore Advanced Features: Experiment with multi-image inputs and conversational editing sessions
7. Refine Your Prompts: Apply best practices for prompt engineering to improve output quality
8. Plan for Production: Implement appropriate error handling, rate limiting, and cost monitoring for production deployment (see the sketch after this list)
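For the production-hardening step, the sketch below shows one way to wrap generation calls defensively; the retry policy, status codes, and backoff values are illustrative assumptions rather than official SDK guidance:

import time
from google import genai
from google.genai import errors

client = genai.Client(api_key="YOUR_API_KEY")

def generate_with_retry(contents, retries=3, backoff=2.0):
    # Retry transient failures (rate limits, server errors) with
    # exponential backoff; surface everything else immediately.
    for attempt in range(retries):
        try:
            return client.models.generate_content(
                model="gemini-2.5-flash-image-preview",
                contents=contents,
            )
        except errors.APIError as e:
            if e.code in (429, 500, 503) and attempt < retries - 1:
                time.sleep(backoff * (2 ** attempt))
            else:
                raise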
One-Page Overview
Task | Core Method | Key Parameters | Notes
---|---|---|---
Text-to-Image Generation | client.models.generate_content() | contents=[text_prompt] | Generate new images from textual descriptions
Image Editing | client.models.generate_content() | contents=[text_prompt, image_object] | Modify existing images based on text instructions
Multi-Image Processing | client.models.generate_content() | contents=[text_prompt, img1, img2, ...] | Combine information from multiple input images
Conversational Editing | chat = client.chats.create() + chat.send_message() | Sequential messages with text/images | Maintain context across multiple editing steps
Photo Restoration | client.models.generate_content() | contents=[restore_prompt, old_image] | Repair and enhance damaged or historical photos
Model Specification | N/A | model="gemini-2.5-flash-image-preview" | Required for all API calls to Nano Banana
Frequently Asked Questions (FAQ)
How much does it cost to use Nano Banana via API?
While prototyping in AI Studio is free, API usage costs approximately $0.039 per 1024×1024 output image based on token consumption rates.
Can I use Nano Banana to edit existing images?
Yes, the model excels at image editing when you provide both an input image and textual instructions describing the desired modifications.
Does the model maintain consistency when editing images?
Nano Banana is particularly skilled at maintaining character and content consistency from input images during editing operations.
Is multi-turn conversational editing supported?
Yes, you can use chat sessions to maintain context across multiple requests for iterative image refinement.
What are some practical applications beyond basic image generation?
Common applications include photo restoration, style transfer, virtual try-on, multi-image compositing, and converting sketches to finished images.
Who owns the rights to images generated with Nano Banana?
According to Google’s terms, you typically own the output generated through their AI services, but you should always review the current terms of service for specific details.
How should I handle the API response structure?
The response contains multipart data that may include both text and images. Your code should iterate through the parts to process each type of content appropriately.
What’s the best approach for writing effective prompts?
Be specific, provide context, use positive framing, include photographic terminology, and don’t hesitate to iterate and refine your prompts based on results.