
Mobile-Use: Revolutionizing AI-Powered Mobile Automation with Natural Language Control

Mobile-Use: Let Your Phone Work for You—A Plain-English Global Guide

“Open Gmail, find the first three unread messages, and list the sender and subject line in JSON.”
Say it. Watch it happen.


1. What Exactly Is Mobile-Use?

Mobile-use is an open-source AI agent that drives your Android or iOS device with nothing more than natural language. You speak or type a request, and the program:

  • understands what you want
  • interacts with the user interface exactly like a human would
  • returns the result in the exact format you asked for—JSON, plain text, CSV, or even Markdown

No code, no macros, no complex scripting. Just words.


2. Why You Might Care

| Everyday Pain | Old Way | Mobile-Use Way |
|---|---|---|
| Exporting unread emails daily | Screenshot → OCR → spreadsheet | One sentence → JSON file |
| Tracking 5 apps’ daily active users | Open each app → scroll → copy numbers | One sentence → consolidated table |
| Helping parents use smartphones | Video call instructions | Read the sentence aloud → phone does it |

3. Core Capabilities in Plain English

  1. Natural Language Control
    Speak or type in any major language. The agent figures out the rest.

  2. UI-Aware Automation
    It sees buttons, icons, and text much as you do, so minor layout changes are less likely to break the flow.

  3. Structured Data Extraction
    Anything visible on screen can be scraped and delivered as JSON, CSV, Markdown, or plain text.

  4. Swappable AI Brain
    Use OpenAI by default, or swap in Claude, Gemini, or a local model by editing a single JSONC config file.
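Because the result comes back as text in whatever format you described, it is worth checking it before handing it to another tool. Here is a minimal Python sketch, assuming you already have the agent's final answer as a string and that it follows the 'sender'/'subject' shape from the Gmail example. The helper name `parse_email_list` is hypothetical and not part of mobile-use.

```python
import json


def parse_email_list(raw: str) -> list[dict[str, str]]:
    """Parse agent output expected to be a JSON list of {sender, subject} objects."""
    data = json.loads(raw)
    if not isinstance(data, list):
        raise ValueError("expected a JSON list")
    emails = []
    for item in data:
        # Reject entries that are not objects or that miss a requested key.
        if not isinstance(item, dict) or not {"sender", "subject"}.issubset(item):
            raise ValueError(f"unexpected entry: {item!r}")
        emails.append({"sender": str(item["sender"]), "subject": str(item["subject"])})
    return emails
```

If the model returns something that does not match the description, the function raises instead of silently passing bad data along.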


4. Benchmark Snapshot

The project reports that mobile-use ranks #1 among open-source agents on the AndroidWorld benchmark's pass@1 leaderboard.
Full leaderboard: Google Sheets link


5. Quick-Start Checklist

| Task | Android (physical device) | Android Emulator | iOS Simulator |
|---|---|---|---|
| Enable debugging | Settings → Developer Options → USB Debugging | Built-in | macOS + Xcode |
| Required tool | ADB | ADB | Xcode |
| First-time connection | USB cable + on-device prompt | None | None |
| Network | Same Wi-Fi as computer | Same subnet | Localhost |
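Before the first run, it can help to confirm that your computer actually sees the Android device. The sketch below is not part of mobile-use. It only parses the output of the standard `adb devices` command, and the function names are made up for illustration. A device listed as `unauthorized` usually means the on-device prompt has not been accepted yet. iOS Simulator setups are not covered here.

```python
import subprocess


def parse_adb_devices(output: str) -> dict[str, str]:
    """Map each device serial to its state from `adb devices` output."""
    devices = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blanks, the header line, and adb daemon status messages.
        if not line or line.startswith("*") or line.startswith("List of devices"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices[parts[0]] = parts[1]
    return devices


def ready_devices() -> list[str]:
    """Return serials in the `device` state; requires adb on PATH."""
    result = subprocess.run(
        ["adb", "devices"], capture_output=True, text=True, check=True
    )
    states = parse_adb_devices(result.stdout)
    return [serial for serial, state in states.items() if state == "device"]
```

An empty list from `ready_devices()` means nothing is connected and authorized yet, so the checklist above is the place to look.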

6. Two Ways to Install

Route A: One-Line Docker (Beginner-Friendly)

Prerequisites

  • Docker installed
  • Physical Android device or emulator on the same Wi-Fi network as your computer

Run the Script

macOS / Linux

chmod +x mobile-use.sh
./mobile-use.sh \
  "Open Gmail, find the first three unread emails, and list their sender and subject line" \
  --output-description "A JSON list of objects, each with 'sender' and 'subject' keys"

Windows (PowerShell)

powershell.exe -ExecutionPolicy Bypass -File mobile-use.ps1 `
  "Open Gmail, find the first three unread emails, and list their sender and subject line" `
  --output-description "A JSON list of objects, each with 'sender' and 'subject' keys"

The terminal may pause and ask whether Maestro can collect anonymous usage data. Type Y or n and press Enter.

Common Hiccups

| Error | Meaning | Quick Fix |
|---|---|---|
| Could not get device IP | Wi-Fi interface name is unusual | Run adb shell ip addr show up, find the interface, then add --interface <NAME> |
| Failed to connect to <DEVICE_IP>:5555 | Firewall blocked the port | Temporarily disable the firewall or open port 5555 |
| unauthorized: authentication required (Docker) | Old ghcr.io token | docker logout ghcr.io then rerun |
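If you wrap the script in your own tooling, the same lookup can be done automatically. This is a small Python sketch that matches the three messages listed above against captured terminal output. The `suggest_fix` helper is hypothetical, and the exact wording of real error messages may differ from what is shown in this guide.

```python
from typing import Optional

# (substring to look for, suggested fix), taken from the hiccup list in this guide.
KNOWN_ERRORS = [
    (
        "Could not get device IP",
        "Run `adb shell ip addr show up`, find the interface, then add --interface <NAME>",
    ),
    ("Failed to connect to", "Temporarily disable the firewall or open port 5555"),
    (
        "unauthorized: authentication required",
        "Run `docker logout ghcr.io`, then rerun the script",
    ),
]


def suggest_fix(log_text: str) -> Optional[str]:
    """Return the first matching quick fix, or None if nothing is recognized."""
    for needle, fix in KNOWN_ERRORS:
        if needle in log_text:
            return fix
    return None
```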

Route B: Manual Dev Setup (Full Control)

1. Clone

git clone https://github.com/minitap-ai/mobile-use.git
cd mobile-use

2. Environment Variables

cp .env.example .env
# Edit .env and add at least OPENAI_API_KEY
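A missing key is an easy mistake to make at this step, so a quick check before the first run can save a confusing error later. The Python sketch below reads simple KEY=VALUE lines and reports which required keys are empty. It is an illustration only: the helper names are made up, and mobile-use may load `.env` with different rules (for example around quoting).

```python
from pathlib import Path


def read_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(
    env_path: str = ".env", required: tuple[str, ...] = ("OPENAI_API_KEY",)
) -> list[str]:
    """Return the required keys that are absent or empty in the .env file."""
    values = read_env_file(Path(env_path).read_text(encoding="utf-8"))
    return [key for key in required if not values.get(key)]
```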

3. Virtual Environment

uv venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
uv sync

4. First Command

python ./src/mobile_use/main.py "Open Settings and tell me my current battery level"

7. Practical Walk-Throughs

Walk-Through 1: Battery Check

python ./src/mobile_use/main.py "Show me my battery percentage" \
  --output-description "Plain text percentage only"

Sample output:

85%
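To use that answer in a script, you can wrap the same command in Python and turn the text into a number. This sketch assumes the final answer appears on standard output, which this guide does not confirm. The helper names are hypothetical; only the entry-point path and the `--output-description` flag come from the commands above.

```python
import re
import subprocess
from typing import Optional


def build_command(goal: str, output_description: Optional[str] = None) -> list[str]:
    """Build the argument list for the manual-setup entry point."""
    cmd = ["python", "./src/mobile_use/main.py", goal]
    if output_description:
        cmd += ["--output-description", output_description]
    return cmd


def parse_percentage(text: str) -> int:
    """Return the last percentage (0-100) found in the text."""
    matches = re.findall(r"(?<!\d)(\d{1,3})\s*%", text)
    if not matches:
        raise ValueError(f"no percentage found in {text!r}")
    value = int(matches[-1])
    if not 0 <= value <= 100:
        raise ValueError(f"percentage out of range: {value}")
    return value


def battery_level() -> int:
    """Run the battery goal and parse the result (assumes answer is on stdout)."""
    cmd = build_command("Show me my battery percentage", "Plain text percentage only")
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_percentage(result.stdout)
```

Passing the goal as a list element rather than a shell string avoids quoting problems when the sentence contains apostrophes or quotes.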

Walk-Through 2: Email Export

Goal: create a nightly CSV of unread Gmail messages.

python ./src/mobile_use/main.py \
  "Open Gmail, collect all unread emails, extract sender and subject" \
  --output-description "CSV with columns sender,subject"
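For a nightly job you will probably want to check the CSV and save it under a dated name. Here is a minimal Python sketch, assuming the agent's answer is available as a string with the header `sender,subject` as requested above. The function names and the `unread-YYYY-MM-DD.csv` naming scheme are illustrative choices, not something mobile-use provides.

```python
import csv
import io
from datetime import date
from pathlib import Path


def parse_email_csv(raw: str) -> list[dict[str, str]]:
    """Parse CSV text with a sender,subject header into a list of rows."""
    reader = csv.DictReader(io.StringIO(raw.strip()))
    if reader.fieldnames != ["sender", "subject"]:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    return list(reader)


def nightly_path(folder: str, day: date) -> Path:
    """Build a dated file name such as exports/unread-2025-01-02.csv."""
    return Path(folder) / f"unread-{day.isoformat()}.csv"


def save_rows(rows: list[dict[str, str]], path: Path) -> None:
    """Write the validated rows back out as a CSV file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["sender", "subject"])
        writer.writeheader()
        writer.writerows(rows)
```

Using the `csv` module rather than splitting on commas keeps subjects that contain commas intact, as long as the model quotes them.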

Walk-Through 3: Multi-App Workflow

Scenario: Daily report that pulls yesterday’s steps from Google Fit and sleep hours from Samsung Health.

python ./src/mobile_use/main.py \
  "Open Google Fit, note yesterday’s steps; then open Samsung Health, note yesterday’s sleep hours; return JSON with keys steps,sleep_hours"

8. Swapping the AI Brain (LLM)

  1. Copy template

    cp llm-config.override.template.jsonc llm-config.override.jsonc
    
  2. Edit llm-config.override.jsonc

    • Change "provider":"openai" to "provider":"claude" or any other supported backend
    • Add the new API key
    • Save and exit—no restart required
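If you would rather inspect the override file from a script than by eye, note that Python's `json` module does not accept the comments a `.jsonc` file may contain. The sketch below strips `//` and `/* */` comments before parsing. It is a generic helper under stated assumptions: this guide only mentions a "provider" field, so the rest of the file's layout is not assumed here, and trailing commas are not handled.

```python
import json


def strip_jsonc_comments(text: str) -> str:
    """Remove // and /* */ comments while leaving string contents untouched."""
    out = []
    i, n = 0, len(text)
    in_string = False
    while i < n:
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                out.append(text[i + 1])  # keep the escaped character as-is
                i += 2
                continue
            if ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            # Line comment: skip up to (not including) the newline.
            end = text.find("\n", i)
            i = n if end == -1 else end
        elif text.startswith("/*", i):
            # Block comment: skip past the closing marker.
            end = text.find("*/", i + 2)
            i = n if end == -1 else end + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def load_jsonc(text: str):
    """Parse JSONC text after removing comments."""
    return json.loads(strip_jsonc_comments(text))
```

Tracking whether the scanner is inside a string matters because values such as URLs contain `//` and must not be cut off.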

9. Global FAQ

Q1: Does it work on a physical iPhone?
A: Not today. The README clearly states “Physical iOS devices are not yet supported.” iOS Simulator on macOS is the only option.

Q2: Is my data safe?
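A: The codebase is MIT-licensed and open source, so you can audit it. The agent itself runs on your machine, but each request, along with the screen content the model needs, goes to the LLM provider you configure (OpenAI by default) unless you point the config at a local model.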

Q3: How accurate is it with Chinese or other non-English text?
A: Recognition is generally reliable as long as the app uses standard system fonts. Icons without text are handled by reasoning from the surrounding context.

Q4: Can I run it offline?
A: Not by default. The agent calls a large language model over the internet, so it needs a connection unless you host a local LLM and point the config file at it.

Q5: Multiple commands in one go?
A: One sentence per run. For batch jobs, wrap the calls in a shell script or cron job.
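A shell loop works, and so does a few lines of Python if you want to record which runs succeeded. This is a sketch that runs goals one at a time through the manual-setup entry point. The helper names are hypothetical, and it assumes a non-zero exit code signals failure, which this guide does not confirm.

```python
import subprocess
from typing import Callable, Iterable


def run_goal(goal: str) -> int:
    """Run one goal through the manual-setup entry point and return its exit code."""
    return subprocess.run(["python", "./src/mobile_use/main.py", goal]).returncode


def run_batch(
    goals: Iterable[str], runner: Callable[[str], int] = run_goal
) -> dict[str, int]:
    """Run goals sequentially, in order, and record each exit code."""
    results = {}
    for goal in goals:
        results[goal] = runner(goal)
    return results
```

Calling `run_batch([...])` from a script that cron launches each night gives you the batch behavior while still sending one sentence per run. The `runner` parameter exists so the loop can be tried out with a stand-in function before touching a real device.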


10. Contributing in Three Steps

  1. Open an Issue describing the bug or feature.
  2. Fork the repo and follow the guidelines in CONTRIBUTING.md.
  3. Submit a Pull Request—the maintainers review quickly.

11. Next Moves

By now you should be able to:

  • Explain what mobile-use does in one sentence
  • Choose Docker or manual setup
  • Run your first natural-language command
  • Swap the underlying AI model
  • Contribute back to the project

Pick one repetitive task you do on your phone every day, write it as a plain-English sentence, and let mobile-use handle it. You may never tap through the same sequence again.


References

  • Project repository: https://github.com/minitap-ai/mobile-use
  • AndroidWorld benchmark leaderboard: Google Sheets
