Prompt API: Chrome’s Built-in AI Powerhouse with Gemini Nano

What is Prompt API?

Prompt API is an experimental feature from Chrome (currently available in the Origin Trial for Chrome 138 and later versions) that allows developers to harness the power of the Gemini Nano model through API calls. This innovative tool enables processing of natural language, images, and audio inputs directly within the browser, generating text outputs. It opens up a world of possibilities for web applications, including:

  • AI-driven search: Answering user questions based on webpage content
  • Personalized content: Dynamically categorizing news articles for user filtering
  • Multimodal applications: Processing text, images, and audio to generate descriptions, transcriptions, or classification results

Hardware and Usage Requirements

To utilize Prompt API effectively, both developers and end-users must meet specific hardware and software requirements:

  • Operating System: Windows 10/11, macOS 13+ (Ventura or later), or Linux. Currently, Android, iOS, and ChromeOS are not supported.
  • Storage Space: At least 22GB of free storage (model size may change with updates). You can check the current size in chrome://on-device-internals. If storage drops below 10GB, the model will be automatically deleted and require redownloading.
  • GPU: Graphics card with more than 4GB of VRAM (video random access memory).
  • Network: An unlimited-data or unmetered connection (Wi-Fi or Ethernet) is required for the initial model download.

Getting Started with Prompt API

Before diving into implementation, familiarize yourself with Google’s Generative AI Use Policy. The core functionality of Prompt API revolves around two key functions in the LanguageModel namespace:

  1. LanguageModel.availability(): Checks whether the model can be used, returning one of the statuses "available", "downloadable", "downloading", or "unavailable."
  2. LanguageModel.create(): Creates a session and triggers model download if necessary. Developers can enhance user experience by monitoring download progress through events like downloadprogress.

Model Download Process

While Prompt API comes built into Chrome, the Gemini Nano model itself downloads separately when the API is first used by a website. To check if the model is ready, use the asynchronous LanguageModel.availability() function, which returns one of the following statuses:

  • "unavailable": The implementation doesn’t support the requested options, or doesn’t support prompting a language model at all.
  • "downloadable": The implementation supports the requested options, but some components (like the language model or fine-tuning data) need to be downloaded first.
  • "downloading": The implementation supports the requested options, but an ongoing download must complete before creating a session.
  • "available": The implementation supports the requested options with no new downloads required.
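
The four statuses map naturally onto UI states. As a sketch, a helper like the following could translate them into user-facing messages (the function name and message strings are illustrative, not part of the API):

```javascript
// Hypothetical helper mapping an availability status to a user-facing
// message; the status strings are the ones documented above.
function describeAvailability(status) {
  switch (status) {
    case "available":
      return "Model ready to use";
    case "downloadable":
      return "Model will be downloaded on first use";
    case "downloading":
      return "Model download in progress";
    default: // "unavailable"
      return "On-device model not supported on this device";
  }
}

// Usage (in a browser that supports the API):
// const status = await LanguageModel.availability();
// statusLabel.textContent = describeAvailability(status);
```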

To initiate the model download and create a language model session, use the asynchronous LanguageModel.create() function. When the availability status is "downloadable", it’s good practice to monitor download progress to keep users informed:

const session = await LanguageModel.create({
  monitor(m) {
    m.addEventListener("downloadprogress", (e) => {
      console.log(`Downloaded ${e.loaded * 100}%`);
    });
  },
});

Understanding Model Parameters

The params() function provides valuable information about the language model’s capabilities, returning an object with these fields:

  • defaultTopK: The default top-K value (3)
  • maxTopK: The maximum allowed top-K value (8)
  • defaultTemperature: The default temperature setting (1.0)
  • maxTemperature: The highest allowed temperature value (2.0); temperature ranges from 0.0 to 2.0 and controls output randomness

await LanguageModel.params();
// Returns: {defaultTopK: 3, maxTopK: 8, defaultTemperature: 1, maxTemperature: 2}

For those new to these terms, top-K controls how many candidate responses the model considers when generating output, while temperature affects randomness—lower values create more focused, predictable results, while higher values produce more varied, creative outputs.
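
To make these knobs concrete, here is a hedged sketch of deriving session options from the params() result; the "creativity" scale and helper name are invented for illustration, not part of the API:

```javascript
// Hypothetical helper: derive session options from LanguageModel.params().
// `creativity` runs from 0 (focused, predictable) to 1 (maximally varied).
function sessionOptions(params, creativity) {
  return {
    // Scale temperature up toward the allowed maximum.
    temperature: creativity * params.maxTemperature,
    // Widen top-K for more creative output, clamped to maxTopK.
    topK: Math.min(
      Math.round(params.defaultTopK + creativity * params.maxTopK),
      params.maxTopK,
    ),
  };
}

// With the defaults {defaultTopK: 3, maxTopK: 8, maxTemperature: 2}:
// sessionOptions(params, 0)   -> { temperature: 0, topK: 3 }
// sessionOptions(params, 0.5) -> { temperature: 1, topK: 7 }
```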

Creating and Managing Sessions

Once Prompt API is ready, you can create sessions using the create() function. These sessions allow interaction with the model through prompt() or promptStreaming() functions.

Customizing Your Session

You can customize each session using an optional options object with topK and temperature parameters. These default to the values returned by LanguageModel.params(). Important note: When initializing a new session, you must either specify both topK and temperature or neither.

const params = await LanguageModel.params();
const slightlyHighTemperatureSession = await LanguageModel.create({
  temperature: Math.min(params.defaultTemperature * 1.2, params.maxTemperature),
  topK: params.defaultTopK,
});

The create() function also accepts a signal field in its options object, allowing you to pass an AbortSignal to terminate the session:

const controller = new AbortController();
stopButton.onclick = () => controller.abort();

const session = await LanguageModel.create({
  signal: controller.signal,
});

Using Initial Prompts

Initial prompts provide context about previous interactions, enabling features like continuing conversations after a browser restart. Here’s how to set them up:

const session = await LanguageModel.create({
  initialPrompts: [
    { role: "system", content: "You are a helpful and friendly assistant." },
    { role: "user", content: "What is the capital of Italy?" },
    { role: "assistant", content: "The capital of Italy is Rome." },
    { role: "user", content: "What language is spoken there?" },
    {
      role: "assistant",
      content: "The official language of Italy is Italian. [...]",
    },
  ],
});

Guiding Responses with Prefixes

Beyond replaying earlier turns, you can end a prompt with an “assistant”-role message to steer the model’s next response. For example:

const followup = await session.prompt([
  {
    role: "user",
    content: "I'm nervous about my presentation tomorrow",
  },
  {
    role: "assistant",
    content: "Presentations are tough!",
  },
]);

In some cases, you might want to pre-fill part of the “assistant” response to guide the model toward a specific format. Add prefix: true to the trailing “assistant” message to achieve this:

const characterSheet = await session.prompt([
  {
    role: "user",
    content: "Create a TOML character sheet for a gnome barbarian",
  },
  {
    role: "assistant",
    content: "```toml\n",
    prefix: true,
  },
]);

Appending Messages Without Prompting

Processing multimodal inputs can take time, so pre-sending planned prompts to populate the session can help the model start processing earlier. While initialPrompts works during session creation, the append() method lets you add context after creation:

const session = await LanguageModel.create({
  initialPrompts: [
    {
      role: "system",
      content:
        "You are a skilled analyst who correlates patterns across multiple images.",
    },
  ],
  expectedInputs: [{ type: "image" }],
});

fileUpload.onchange = async () => {
  await session.append([
    {
      role: "user",
      content: [
        {
          type: "text",
          value: `Here's one image. Notes: ${fileNotesInput.value}`,
        },
        { type: "image", value: fileUpload.files[0] },
      ],
    },
  ]);
};

analyzeButton.onclick = async (e) => {
  analysisResult.textContent = await session.prompt(userQuestionInput.value);
};

The promise returned by append() resolves when the prompt is validated, processed, and added to the session. It rejects if the prompt can’t be appended.
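
Because append() can reject, it is worth wrapping calls so a failure (for example, content that overflows the quota) degrades gracefully. The wrapper below and its boolean return convention are illustrative, assuming only that `session` exposes an append() method as described above:

```javascript
// Hypothetical wrapper: append messages to a session, reporting failure
// instead of letting the rejection propagate to the caller.
async function safeAppend(session, messages) {
  try {
    await session.append(messages);
    return true;
  } catch (err) {
    console.warn("Could not append prompt:", err);
    return false;
  }
}
```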

Session Limits and Quotas

Each language model session has a maximum token capacity. You can track usage and remaining capacity using these session properties:

console.log(`${session.inputUsage}/${session.inputQuota}`);

This helps prevent hitting limits during long conversations or complex tasks.
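
A small guard built on these two properties might look like the following; the 90% threshold and the helper names are arbitrary choices, and `session` is assumed only to expose inputUsage and inputQuota:

```javascript
// Hypothetical guard: track how close a session is to its input quota.
function quotaRemaining(session) {
  return session.inputQuota - session.inputUsage;
}

function nearQuota(session, thresholdRatio = 0.9) {
  // True once usage crosses the given fraction of the quota.
  return session.inputUsage / session.inputQuota >= thresholdRatio;
}

// Usage:
// if (nearQuota(session)) showWarning(`Only ${quotaRemaining(session)} tokens left`);
```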

Maintaining Conversation Context

Each session tracks conversation context, considering previous interactions in future responses until the context window is full:

const session = await LanguageModel.create({
  initialPrompts: [
    {
      role: "system",
      content:
        "You are a friendly, helpful assistant specialized in clothing choices.",
    },
  ],
});

const result1 = await session.prompt(
  "What should I wear today? It is sunny. I am unsure between a t-shirt and a polo.",
);
console.log(result1);

const result2 = await session.prompt(
  "That sounds great, but oh no, it is actually going to rain! New advice?",
);
console.log(result2);

In this example, the model remembers the initial clothing question when providing updated advice for rainy weather.

Enforcing JSON Output Format

To ensure the model follows a specific JSON structure, pass a JSON schema in the options object’s responseConstraint field to prompt() or promptStreaming():

const session = await LanguageModel.create();

const schema = {
  "type": "boolean"
};

const post = "Mugs and ramen bowls, both a bit smaller than intended- but that's how it goes with reclaim. Glaze crawled the first time around, but pretty happy with it after refiring.";

const result = await session.prompt(
  `Is this post about pottery?\n\n${post}`,
  {
    responseConstraint: schema,
  }
);
console.log(JSON.parse(result));
// Returns: true

By default, the implementation may include the schema in messages to the language model, which uses part of your input quota. You can measure this usage with session.measureInputUsage() by passing the responseConstraint option.
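
One way to estimate what the injected schema costs is to measure usage with and without the constraint and compare. This sketch assumes measureInputUsage() accepts the same options object as prompt(); the helper name is invented:

```javascript
// Hypothetical sketch: estimate how many tokens of quota the injected
// schema consumes, by measuring input usage with and without it.
async function constraintCost(session, promptText, schema) {
  const withSchema = await session.measureInputUsage(promptText, {
    responseConstraint: schema,
  });
  const withoutSchema = await session.measureInputUsage(promptText);
  return withSchema - withoutSchema;
}
```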

To avoid this behavior, use the omitResponseConstraintInput option, but be sure to include formatting guidance in your prompt:

const result = await session.prompt(
  `
  Summarize this feedback into a rating between 0-5, only outputting a JSON
  object { rating }, with a single property whose value is a number:
  The food was delicious, service was excellent, will recommend.
`,
  { responseConstraint: schema, omitResponseConstraintInput: true },
);

Cloning Sessions

To conserve resources, you can clone existing sessions using the clone() function. This resets conversation context but preserves initial prompts. The function accepts an optional options object with a signal field for termination:

const controller = new AbortController();
stopButton.onclick = () => controller.abort();

const clonedSession = await session.clone({
  signal: controller.signal,
});

Cloning is useful when you want to explore different directions in a conversation without starting from scratch.
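
For example, you could fan a conversation out into parallel branches, each clone starting from the same initial prompts. The helper below is a sketch, assuming only the clone() and prompt() behavior described above:

```javascript
// Hypothetical sketch: explore several candidate prompts in parallel,
// each against its own clone of the base session.
async function branchConversation(session, candidatePrompts) {
  return Promise.all(
    candidatePrompts.map(async (text) => {
      const branch = await session.clone();
      return branch.prompt(text);
    }),
  );
}
```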

Interacting with the Model

You can prompt the model using either prompt() for complete responses or promptStreaming() for incremental results.

Non-Streaming Output

For shorter responses, use prompt(), which returns the full result once generated:

// First check if a session can be created based on model availability and device capabilities
const { defaultTemperature, maxTemperature, defaultTopK, maxTopK } =
  await LanguageModel.params();

const available = await LanguageModel.availability();

if (available !== "unavailable") {
  const session = await LanguageModel.create();

  // Prompt the model and wait for the complete result
  const result = await session.prompt("Write me a poem!");
  console.log(result);
}

Streaming Output

For longer responses, promptStreaming() provides a ReadableStream that delivers partial results as they’re generated:

const { defaultTemperature, maxTemperature, defaultTopK, maxTopK } =
  await LanguageModel.params();

const available = await LanguageModel.availability();
if (available !== "unavailable") {
  const session = await LanguageModel.create();

  // Prompt the model and stream the result
  const stream = session.promptStreaming("Write me an extra-long poem!");
  for await (const chunk of stream) {
    console.log(chunk);
  }
}

Streaming creates a more responsive user experience for lengthier outputs like articles, stories, or detailed explanations.
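
Since the stream is an async iterable, a small consumer can both accumulate the full text and surface each chunk to the UI as it arrives. This helper is illustrative and works with any async iterable of strings:

```javascript
// Hypothetical consumer: accumulate streamed chunks while reporting each
// one, e.g. to update the UI incrementally.
async function collectStream(stream, onChunk = () => {}) {
  let full = "";
  for await (const chunk of stream) {
    full += chunk;
    onChunk(chunk);
  }
  return full;
}

// Usage:
// const stream = session.promptStreaming("Write me an extra-long poem!");
// const text = await collectStream(stream, (c) => (output.textContent += c));
```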

Stopping Prompts

Both prompt() and promptStreaming() accept an optional second parameter with a signal field, allowing you to stop processing:

const controller = new AbortController();
stopButton.onclick = () => controller.abort();

const result = await session.prompt("Write me a poem!", {
  signal: controller.signal,
});

This is particularly useful for implementing user-initiated cancellation of long-running requests.
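
User cancellation can also be combined with a hard timeout using the standard AbortSignal.any() and AbortSignal.timeout() platform APIs. The wrapper below is a sketch; the function name and timeout policy are illustrative:

```javascript
// Hypothetical sketch: abort a prompt either when the user cancels or
// after a hard timeout, whichever comes first.
function promptWithTimeout(session, text, userSignal, ms) {
  const signal = AbortSignal.any([userSignal, AbortSignal.timeout(ms)]);
  return session.prompt(text, { signal });
}

// Usage:
// const controller = new AbortController();
// stopButton.onclick = () => controller.abort();
// const result = await promptWithTimeout(session, "Write me a poem!", controller.signal, 30_000);
```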

Terminating Sessions

When you no longer need a session, call destroy() to free resources. Destroyed sessions can’t be reused, and any ongoing operations will be aborted:

await session.prompt(
  "You are a friendly, helpful assistant specialized in clothing choices."
);

session.destroy();

// This promise will reject with an error indicating the session is destroyed
await session.prompt(
  "What should I wear today? It is sunny, and I am unsure between a t-shirt and a polo."
);

It’s good practice to destroy sessions when they’re no longer needed, especially in single-page applications that might remain open for extended periods.
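
One pattern is to tie destruction to a page lifecycle event so the session is freed even if the user navigates away. This sketch is hypothetical: `target` would be `window` in practice, and the disposer convention is an invented convenience:

```javascript
// Hypothetical pattern: destroy the session when the page is hidden, and
// return a disposer for manual teardown in a single-page app.
function bindSessionCleanup(session, target) {
  const cleanup = () => session.destroy();
  target.addEventListener("pagehide", cleanup);
  return () => {
    target.removeEventListener("pagehide", cleanup);
    session.destroy();
  };
}

// Usage:
// const dispose = bindSessionCleanup(session, window);
// ...later, when the conversation ends: dispose();
```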

Multimodal Capabilities

Starting with Chrome 138 Canary, Prompt API supports audio and image inputs for local experimentation, with text output. These capabilities enable exciting new features:

  • Transcribing audio messages in chat applications
  • Generating descriptions for uploaded images to use in captions or alt text

const session = await LanguageModel.create({
  // { type: "text" } is optional unless specifying expected input languages
  expectedInputs: [{ type: "audio" }, { type: "image" }],
});

const referenceImage = await (await fetch("/reference-image.jpeg")).blob();
const userDrawnImage = document.querySelector("canvas");

const response1 = await session.prompt([
  {
    role: "user",
    content: [
      {
        type: "text",
        value:
          "Give a helpful artistic critique of how well the second image matches the first:",
      },
      { type: "image", value: referenceImage },
      { type: "image", value: userDrawnImage },
    ],
  },
]);

console.log(response1);

const audioBlob = await captureMicrophoneInput({ seconds: 10 });

const response2 = await session.prompt([
  {
    role: "user",
    content: [
      { type: "text", value: "My response to your critique:" },
      { type: "audio", value: audioBlob },
    ],
  },
]);

Multimodal Demonstrations

For practical examples of Prompt API with audio input, check out the Mediarecorder Audio Prompt demo. For image input examples, see the Canvas Image Prompt demo.

Performance Best Practices

Prompt API for the web is still under development. For optimal performance, follow these best practices for session management:

  • Reuse sessions when possible rather than creating new ones for each interaction
  • Monitor input usage to avoid hitting quota limits
  • Implement proper session destruction when conversations end
  • Use streaming for longer responses to improve perceived performance
  • Provide clear user feedback during model downloads and processing
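
The first point, session reuse, can be implemented as a simple lazy singleton. The helper below is one possible shape, not part of the API:

```javascript
// Hypothetical pattern: lazily create one shared session and reuse it
// across interactions instead of creating a new session each time.
let sessionPromise = null;

function getSession(createFn) {
  if (!sessionPromise) sessionPromise = createFn();
  return sessionPromise;
}

// Usage:
// const session = await getSession(() => LanguageModel.create());
```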

Application Scenarios

Prompt API enables a wide range of practical applications across different content types:

Text Applications

  • Summarizing hotel reviews
  • Generating structured data like star ratings
  • Creating product descriptions from specifications
  • Answering questions based on webpage content
  • Categorizing news articles for personalized feeds

Image Applications

  • Classifying images (e.g., detecting identification documents)
  • Generating alt text for accessibility
  • Comparing product images for similarities
  • Analyzing visual content for specific features
  • Creating captions for photos

Audio Applications

  • Transcribing audio messages in encrypted chats
  • Filtering live recordings in music collections
  • Converting voice notes to text
  • Analyzing audio content for specific patterns
  • Generating descriptions of audio clips

Important Considerations

Permission Policies

By default, only top-level windows and same-origin iframes can use Prompt API. Cross-origin iframes require the allow="language-model" attribute.

Web Workers Limitation

Currently, Prompt API doesn’t support Web Workers and must run in the main document or an iframe.

Privacy and Security

Always adhere to Google’s AI usage policies and ensure user data remains secure. Since processing happens locally in the browser, sensitive information doesn’t leave the user’s device, but you should still implement appropriate data handling practices.

Storage Management

The model may be automatically deleted if storage space drops below 10GB, requiring redownload when space becomes available. Inform users about this possibility to manage expectations.

Providing Feedback

Your feedback helps shape the future of Prompt API and Gemini Nano: your input directly influences the development of this API and all built-in AI APIs, potentially leading to specialized task APIs for specific use cases such as audio transcription or image description.

As Prompt API continues to evolve, it promises to unlock new possibilities for web developers, bringing powerful AI capabilities directly to browsers while maintaining user privacy through local processing. By integrating these tools, developers can create more intelligent, responsive, and accessible web applications that work seamlessly across devices.