How to Build with Nano Banana: The Complete Developer Guide
Google recently released Gemini 2.5 Flash Image, a powerful new model for image generation and editing, also known by its codename, Nano Banana. This model introduces state-of-the-art capabilities for creating and manipulating images, unlocking a wide range of new applications for developers.
This comprehensive guide provides everything you need to integrate Gemini 2.5 Flash Image (Nano Banana) into your applications using the Gemini Developer API. Whether you’re looking to add creative image generation to your product or need to automate image editing workflows, this tutorial will walk you through every step of the implementation process.
The Core Question This Article Answers
How can developers leverage the Gemini 2.5 Flash Image (Nano Banana) multimodal model through API integration to implement advanced image generation, editing, and restoration capabilities in their applications?
Getting Started with Nano Banana in Google AI Studio
What’s the best way for developers to experiment with Nano Banana without writing any code? Google AI Studio provides the ideal environment for prototyping and testing prompts before implementation.
While end-users can access Nano Banana through the Gemini app, developers will find Google AI Studio to be the perfect playground for experimenting with all available AI models. This web-based environment allows you to test prompts and explore capabilities without writing any code, serving as the entry point for building with the Gemini API.
You can use Nano Banana free of charge within AI Studio. To get started, visit aistudio.google.com, sign in with your Google account, and select Nano Banana from the model picker. Alternatively, go straight to ai.studio/banana to begin a new session with the model immediately.
Beyond basic prototyping, you can also build low-code Nano Banana web applications directly in AI Studio at ai.studio/apps, or explore and remix existing applications like the PixShop app.
Author’s Reflection: The availability of a risk-free sandbox environment like AI Studio significantly lowers the barrier to entry for developers. Spending time here to conceptualize and test ideas before writing code is crucial for avoiding rework later in the development process. This approach mirrors best practices in software development where prototyping precedes implementation.
Project Setup and Configuration
What are the essential prerequisites for starting to use Nano Banana through its API? You need an API key, billing setup, and the appropriate SDK for your programming language.
To follow this guide and integrate Nano Banana into your applications, you’ll need to complete three essential setup steps: obtaining an API key, configuring billing for your project, and installing the appropriate software development kit.
Generating Your API Key
The process for obtaining your API credentials is straightforward:
1. Navigate to Google AI Studio and click Get API key in the left navigation panel.
2. On the subsequent page, click Create API key.
3. Select an existing Google Cloud project or create a new one; this project will manage billing for your API usage.
Once completed, your API key will be displayed. Make sure to copy and store it securely, as you’ll need it to authenticate all API requests.
Setting Up Billing
While prototyping in AI Studio is free, using the model via the API is a paid service that requires billing setup. In the API key management screen, click Set up billing next to your project and follow the on-screen instructions to configure payment.
Understanding Nano Banana Pricing
Image generation with Nano Banana costs $0.30 per 1M input tokens and $30.00 per 1M output tokens. Each generated image consumes 1,290 output tokens, so a single 1024×1024 image works out to roughly $0.039 (1,290 × $30 / 1,000,000). Always refer to the official Gemini 2.5 Flash Image pricing table for the most current information.
Installing the SDK
Choose the appropriate SDK for your programming language environment:
Python Installation:
pip install -U google-genai
# Install the Pillow library for image manipulation
pip install Pillow
JavaScript/TypeScript Installation:
npm install @google/genai
The following examples use the Python SDK for demonstration, but equivalent JavaScript/TypeScript code snippets are available through the provided GitHub Gist.
Author’s Reflection: Clear preparation is foundational to project success. The billing setup step is particularly important—understanding the cost structure upfront helps with budget planning and prevents unexpected expenses down the line. I’ve found that projects with clear financial boundaries from the beginning tend to have more sustainable implementation plans.
Image Generation from Text Prompts
How can developers create entirely new images from simple text descriptions using Nano Banana? By calling the API with a descriptive prompt and the correct model ID.
The most fundamental capability of Nano Banana is generating one or more images from descriptive text prompts. This functionality opens up numerous applications from blog post illustration generation to UI placeholder creation and marketing visual production.
To use this feature, you'll need to use the model ID gemini-2.5-flash-image-preview for all API requests. The response structure is multimodal, meaning it can contain both text and image data that you'll need to handle appropriately in your code.
Application Scenario: Imagine you’re building a content management system that automatically generates featured images for blog posts. Instead of relying on stock photography or manual design, you could use Nano Banana to create unique, relevant imagery based on the article’s content.
from google import genai
from PIL import Image
from io import BytesIO

# Configure the client with your API key
client = genai.Client(api_key="YOUR_API_KEY")

prompt = """Create a photorealistic image of an orange cat
with green eyes, sitting on a couch."""

# Call the API to generate content
response = client.models.generate_content(
    model="gemini-2.5-flash-image-preview",
    contents=prompt,
)

# The response can contain both text and image data.
# Iterate through the parts to find and save the image.
for part in response.candidates[0].content.parts:
    if part.text is not None:
        print(part.text)
    elif part.inline_data is not None:
        image = Image.open(BytesIO(part.inline_data.data))
        image.save("cat.png")
Output:
The multimodal response structure means you receive a list of parts that can contain interleaved text and image data (inline_data). The code above demonstrates how to iterate through these parts to extract and save the generated image while also handling any textual response that might be included.
Image Editing with Text and Image Inputs
How can developers modify existing images based on textual instructions? By providing both an image and text prompt to the API.
Nano Banana excels at maintaining character and content consistency when editing existing images, making it ideal for tasks that require modifying specific elements while preserving overall composition and style.
Application Scenario: Consider an e-commerce platform that needs to showcase products in different environments. Instead of manually photoshopping each product into various scenes, developers could build an automated system that places products into contextually appropriate backgrounds based on simple text instructions.
from google import genai
from PIL import Image
from io import BytesIO

client = genai.Client(api_key="YOUR_API_KEY")

prompt = """Using the image of the cat, create a photorealistic,
street-level view of the cat walking along a sidewalk in a
New York City neighborhood, with the blurred legs of pedestrians
and yellow cabs passing by in the background."""

image = Image.open("cat.png")

# Pass both the text prompt and the image in the 'contents' list
response = client.models.generate_content(
    model="gemini-2.5-flash-image-preview",
    contents=[prompt, image],
)

for part in response.candidates[0].content.parts:
    if part.text is not None:
        print(part.text)
    elif part.inline_data is not None:
        image = Image.open(BytesIO(part.inline_data.data))
        image.save("cat2.png")
Input and Output:
Author’s Reflection: The model’s ability to maintain subject consistency while completely transforming the environment is particularly impressive. This isn’t simple image compositing—it’s a semantic understanding of the prompt that recontextualizes the subject appropriately. This capability opens up possibilities for coherent visual storytelling that maintains character continuity across different scenes.
Photo Restoration with Nano Banana
Can Nano Banana help restore and enhance old or damaged photographs? Yes, with simple prompts, it can effectively restore and colorize historical images.
One of the most powerful applications of Nano Banana is photo restoration. The model can breathe new life into old photographs by repairing damage, enhancing details, and adding realistic colorization based on its training data.
Application Scenario: Historical archives, museums, and genealogy services could use this technology to preserve and enhance historical images. Family historians could restore damaged family photos, and media companies could colorize historical footage for documentaries.
from google import genai
from PIL import Image
from io import BytesIO

client = genai.Client(api_key="YOUR_API_KEY")

prompt = "Restore and colorize this image from 1932"
image = Image.open("lunch.jpg")  # "Lunch atop a Skyscraper, 1932"

response = client.models.generate_content(
    model="gemini-2.5-flash-image-preview",
    contents=[prompt, image],
)

for part in response.candidates[0].content.parts:
    if part.text is not None:
        print(part.text)
    elif part.inline_data is not None:
        image = Image.open(BytesIO(part.inline_data.data))
        image.save("lunch-restored.png")
Original and Output:
The restoration process demonstrates the model’s understanding of historical context, materials, and plausible coloring based on the time period. Unlike simple filter-based colorization, Nano Banana uses semantic understanding to make educated decisions about appropriate colors for different elements in the image.
Working with Multiple Input Images
Can Nano Banana process multiple images together to accomplish more complex tasks? Yes, providing multiple images enables sophisticated editing workflows.
For more complex editing tasks, you can provide multiple images as input to the model. This capability enables advanced applications like virtual try-on, style transfer, and multi-element compositing.
Application Scenario: Fashion retailers could implement virtual try-on features where customers upload a photo of themselves and see how clothing items would look without physically trying them on. Design tools could allow users to combine elements from multiple reference images into a cohesive composition.
from google import genai
from PIL import Image
from io import BytesIO

client = genai.Client(api_key="YOUR_API_KEY")

prompt = "Make the girl wear this t-shirt. Leave the background unchanged."
image1 = Image.open("girl.png")
image2 = Image.open("tshirt.png")

response = client.models.generate_content(
    model="gemini-2.5-flash-image-preview",
    contents=[prompt, image1, image2],
)

for part in response.candidates[0].content.parts:
    if part.text is not None:
        print(part.text)
    elif part.inline_data is not None:
        image = Image.open(BytesIO(part.inline_data.data))
        image.save("girl-with-tshirt.png")
Inputs 1 & 2 and Output:
The ability to process multiple images demonstrates Nano Banana’s sophisticated understanding of spatial relationships, materials, and how different elements should interact when combined. The model doesn’t just overlay images—it understands how clothing should naturally drape and fit on a human body.
Conversational Image Editing
Can developers engage in multi-turn conversations to iteratively refine images? Yes, chat sessions maintain context across requests, enabling iterative refinement.
For iterative refinement of images, Nano Banana supports conversational editing through chat sessions that maintain context across multiple requests. This allows for a more natural, interactive editing process similar to working with a human designer.
Application Scenario: Design collaboration tools could implement a conversational interface where stakeholders can gradually refine visual concepts through natural language instructions. Content creators could iteratively adjust images until they match their vision without needing to start from scratch each time.
from google import genai
from PIL import Image
from io import BytesIO

client = genai.Client(api_key="YOUR_API_KEY")

def save_image(response, filename):
    # Extract and save the image part of a multimodal response,
    # printing any interleaved text.
    for part in response.candidates[0].content.parts:
        if part.text is not None:
            print(part.text)
        elif part.inline_data is not None:
            Image.open(BytesIO(part.inline_data.data)).save(filename)

# Create a chat session that maintains context across turns
chat = client.chats.create(
    model="gemini-2.5-flash-image-preview"
)

# Make the first image edit
response1 = chat.send_message(
    [
        "Change the cat to a bengal cat, leave everything else the same",
        Image.open("cat.png"),
    ]
)
save_image(response1, "cat-bengal.png")

# Continue chatting and editing
response2 = chat.send_message("The cat should wear a funny party hat")
save_image(response2, "cat-party-hat.png")
Input and Outputs 1 & 2:
It’s important to note that after many conversational edits, image features may begin to degrade or “drift” from the original. When this happens, it’s best practice to start a new session with the latest image and a more detailed, consolidated prompt to maintain high fidelity.
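A minimal sketch of this session-refresh pattern, reusing the chat API from the example above (the file name and the consolidated prompt wording are illustrative assumptions, not part of the official guidance):

from google import genai
from PIL import Image

client = genai.Client(api_key="YOUR_API_KEY")

# Start a fresh session seeded with the latest saved output and one
# consolidated prompt, instead of continuing a long, drifting chat.
fresh_chat = client.chats.create(model="gemini-2.5-flash-image-preview")
response = fresh_chat.send_message(
    [
        "A photorealistic bengal cat with green eyes sitting on a couch, "
        "wearing a funny party hat. Keep the composition unchanged.",
        Image.open("cat-party-hat.png"),  # latest image from the old session
    ]
)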
Author’s Reflection: The conversational interface transforms image editing from a one-off instruction execution to an interactive exploration process. This approach more closely mirrors how human creators work, but with potentially far greater efficiency. The “drift” phenomenon observed in extended sessions suggests that complex modifications might be better handled through carefully crafted one-time prompts or by refreshing sessions at key points to maintain quality.
Best Practices and Prompting Tips for Nano Banana
What techniques produce the best results when prompting Nano Banana? Specificity, context, iterative refinement, and positive framing significantly improve output quality.
Achieving optimal results with Nano Banana requires thoughtful prompt construction. Based on the model’s capabilities and limitations, several best practices have emerged for creating effective prompts that produce the desired outputs.
Be Hyper-Specific: The more detail you provide about subjects, colors, lighting, and composition, the more control you have over the output. Instead of “a dog,” try “a golden retriever puppy with fluffy fur, sitting in a sunbeam near a window, with soft morning light creating gentle shadows.”
Provide Context and Intent: Explain the purpose or desired mood of the image. The model’s understanding of context will influence its creative choices. For example, “create a serene landscape for a meditation app background” provides more guidance than simply “a peaceful landscape.”
Iterate and Refine: Don’t expect perfection on the first try. Use the model’s conversational ability to make incremental changes and refine your image through multiple generations.
Use Step-by-Step Instructions: For complex scenes, break your prompt into a series of clear, sequential instructions rather than a single complex sentence. This approach often yields more coherent results.
Use Positive Framing: Instead of negative prompts like “no cars,” describe the desired scene positively: “an empty, deserted street with no signs of traffic.” The model generally responds better to instructions about what to include rather than what to exclude.
Control the Camera: Use photographic and cinematic terms to direct the composition, such as “wide-angle shot,” “macro shot,” “low-angle perspective,” or “shallow depth of field with bokeh background.”
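To make these tips concrete, here is a short sketch that folds several of them (specificity, context and intent, camera control) into a single generation request; the prompt wording and scenario are illustrative, not taken from the official guide:

from google import genai

client = genai.Client(api_key="YOUR_API_KEY")

# One prompt combining specificity, stated intent, and camera direction
prompt = (
    "A wide-angle, street-level photo of a golden retriever puppy with "
    "fluffy fur, sitting in a sunbeam near a cafe window. Soft morning "
    "light, shallow depth of field with a bokeh background. The mood is "
    "calm and inviting, intended as a hero image for a pet-adoption page."
)

response = client.models.generate_content(
    model="gemini-2.5-flash-image-preview",
    contents=prompt,
)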
Author’s Reflection: Crafting effective prompts is more art than science. It requires developers to clearly conceptualize the final image and translate that vision into machine-understandable instructions. The most effective prompts often strike a balance between specificity and creative freedom—providing clear constraints while leaving room for the model’s imagination to enhance the result.
Community Examples and Inspiration
What are other developers building with Nano Banana? The community is exploring diverse applications from perspective shifting to 3D model generation.
The developer community has rapidly embraced Nano Banana, creating innovative applications across various domains. Exploring these examples can provide inspiration for your own projects and demonstrate the model’s versatile capabilities.
✦ Shifting camera perspective by @henrydaubrez demonstrates how to alter photographic viewpoints while maintaining scene consistency
✦ Few-shot learning for consistent character design by @multimodalart shows techniques for maintaining character consistency across multiple generated images
✦ “What does the red arrow see” Google Maps transforms by @tokumin explores creative transformations of map data into realistic scenes
✦ Generating images from stick figure annotations by @yachimat_manga illustrates how simple drawings can be converted into detailed images
✦ Creating 3D models from still images by @deedydas demonstrates potential bridges between 2D imagery and 3D assets
✦ Generating location-based AR experiences by @bilawalsidhu explores how Nano Banana can enhance augmented reality applications
✦ Converting 2D maps into 3D graphics by @demishassabis shows sophisticated spatial understanding and translation capabilities
These examples represent just a fraction of the innovative applications being developed with Nano Banana. The common thread across these projects is creative experimentation with the model’s capabilities to solve unique problems or create novel experiences.
Resources and Next Steps
Where should developers go after mastering the basics of Nano Banana? Official documentation, advanced prompting guides, and example applications provide pathways for deeper learning.
This guide has covered the fundamental building blocks for working with Gemini 2.5 Flash Image (Nano Banana). You’ve learned how to set up your development environment, generate and edit images, and apply advanced techniques like multi-image processing and conversational editing.
To continue your learning journey, explore these official resources:
✦ Google AI Studio for continued experimentation and prototyping
✦ The Gemini API documentation for comprehensive technical reference
✦ The Nano Banana image generation guide for specialized guidance
✦ The prompting best practices article for advanced techniques
✦ The PixShop app in AI Studio for a practical implementation example
As you build with Nano Banana, remember that the most successful applications often emerge from identifying specific user needs that align with the model’s capabilities rather than seeking generic use cases. The technology provides powerful tools, but their value is realized through thoughtful implementation focused on solving real problems.
Action Checklist / Implementation Steps
1. Experiment in AI Studio: Begin with free prototyping at ai.studio/banana to understand capabilities
2. Secure API Access: Obtain your API key from AI Studio and set up billing on your Google Cloud project
3. Set Up Development Environment: Install the appropriate SDK for your language (Python or JavaScript/TypeScript)
4. Implement Basic Image Generation: Start with text-to-image generation using the model ID gemini-2.5-flash-image-preview
5. Add Image Editing Capabilities: Extend your implementation to handle image+text inputs for editing tasks
6. Explore Advanced Features: Experiment with multi-image inputs and conversational editing sessions
7. Refine Your Prompts: Apply best practices for prompt engineering to improve output quality
8. Plan for Production: Implement appropriate error handling, rate limiting, and cost monitoring for production deployment (see the sketch after this list)
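For the production-hardening step, the sketch below shows one way to wrap generation calls defensively; the retry policy, status codes, and backoff values are illustrative assumptions rather than official SDK guidance:

import time
from google import genai
from google.genai import errors

client = genai.Client(api_key="YOUR_API_KEY")

def generate_with_retry(contents, retries=3, backoff=2.0):
    # Retry transient failures (rate limits, server errors) with
    # exponential backoff; surface everything else immediately.
    for attempt in range(retries):
        try:
            return client.models.generate_content(
                model="gemini-2.5-flash-image-preview",
                contents=contents,
            )
        except errors.APIError as e:
            if e.code in (429, 500, 503) and attempt < retries - 1:
                time.sleep(backoff * (2 ** attempt))
            else:
                raise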
One-Page Overview
Task | Core Method | Key Parameters | Notes
---|---|---|---
Text-to-Image Generation | client.models.generate_content() | contents=[text_prompt] | Generate new images from textual descriptions
Image Editing | client.models.generate_content() | contents=[text_prompt, image_object] | Modify existing images based on text instructions
Multi-Image Processing | client.models.generate_content() | contents=[text_prompt, img1, img2, ...] | Combine information from multiple input images
Conversational Editing | chat = client.chats.create() + chat.send_message() | Sequential messages with text/images | Maintain context across multiple editing steps
Photo Restoration | client.models.generate_content() | contents=[restore_prompt, old_image] | Repair and enhance damaged or historical photos
Model Specification | N/A | model="gemini-2.5-flash-image-preview" | Required for all API calls to Nano Banana
Frequently Asked Questions (FAQ)
How much does it cost to use Nano Banana via API?
While prototyping in AI Studio is free, API usage costs approximately $0.039 per 1024×1024 output image based on token consumption rates.
Can I use Nano Banana to edit existing images?
Yes, the model excels at image editing when you provide both an input image and textual instructions describing the desired modifications.
Does the model maintain consistency when editing images?
Nano Banana is particularly skilled at maintaining character and content consistency from input images during editing operations.
Is multi-turn conversational editing supported?
Yes, you can use chat sessions to maintain context across multiple requests for iterative image refinement.
What are some practical applications beyond basic image generation?
Common applications include photo restoration, style transfer, virtual try-on, multi-image compositing, and converting sketches to finished images.
Who owns the rights to images generated with Nano Banana?
According to Google’s terms, you typically own the output generated through their AI services, but you should always review the current terms of service for specific details.
How should I handle the API response structure?
The response contains multipart data that may include both text and images. Your code should iterate through the parts to process each type of content appropriately.
What’s the best approach for writing effective prompts?
Be specific, provide context, use positive framing, include photographic terminology, and don’t hesitate to iterate and refine your prompts based on results.