Give Every Post a Voice: A Step-by-Step Guide to bskyScribe, the Open-Source Media-Description Bot for Bluesky
Imagine scrolling Bluesky on the train.
You see a 45-second video, but the creator left no caption.
A friend shares an infographic, yet the text is too small to read.
For users with low vision, hearing loss, or simply a broken headphone jack, these posts are locked doors.
bskyScribe is a small, friendly key.
It waits in the background, listens for a mention, and then automatically writes a short, human-readable summary—under 250 characters—so that everyone can join the conversation.
This guide walks you through the entire journey: cloning the code, getting your free API keys, testing on one post, and finally letting the bot run 24/7 in the cloud.
No prior AI experience is required; if you can open a terminal and run pip install, you are ready.
1. Why a “media-description bot” matters
Situation | Without description | With bskyScribe |
---|---|---|
Blind or low-vision user | “There’s an image here” | “Three corgis chase a frisbee across a sunny lawn.” |
Commuter on mute | Video autoplays silently | Comment reads: “Speaker demos a silent red keyboard switch for 10 s.” |
Multilingual community | Japanese chart is unreadable | Auto-summary: “Chart shows 7 % phone shipment growth in 2024.” |
bskyScribe does not try to dazzle; it equalizes access, one reply at a time.
Behind the scenes, Google Gemini supplies the brain, the AT Protocol supplies the ears and mouth, and a few dozen lines of Python keep everything polite and fast.
2. The 30-second overview
- Listen – subscribes to Bluesky notifications.
- Fetch – downloads any image, audio, or video into memory (no disk clutter).
- Analyze – asks Gemini: What’s going on here?
- Trim – compresses the answer to < 250 characters.
- Reply – posts the summary as a threaded reply and logs the event.
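The Trim step is the easiest one to picture in code. The sketch below is one possible approach, not the bot's actual implementation: the name trim_summary and the cut-at-a-word-boundary strategy are assumptions made for illustration.

```python
def trim_summary(text: str, limit: int = 250) -> str:
    """Return text shortened to fewer than `limit` characters.

    Hypothetical helper, not taken from the bskyScribe source.
    """
    text = " ".join(text.split())  # collapse runs of whitespace
    if len(text) < limit:
        return text
    # Leave room for the ellipsis, then back up to the last full word.
    cut = text[: limit - 2]
    if " " in cut:
        cut = cut[: cut.rfind(" ")]
    return cut.rstrip(" ,;:.") + "…"
```

Cutting at a space avoids ending a reply in the middle of a word, which matters for screen-reader users.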
3. What you need before you start
Item | How to get it |
---|---|
Python 3.9 or newer | 👉 python.org or Anaconda |
Google Gemini API key | 👉 Google AI Studio – free tier |
Bluesky account + app password | 👉 Bluesky Settings → App Passwords |
Any computer with Internet | Local laptop, Raspberry Pi, or cloud VM |
Tip: An app password is not your login password. It is a 16-character string (displayed in four hyphen-separated groups) that you can revoke at any time, which makes it well suited to bots.
4. Installation in three steps
4.1 Clone the repository
git clone https://github.com/your-org/bskyScribe.git
cd bskyScribe
4.2 Create an isolated environment
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
Why isolate?
Think of it as giving the project its own bookshelf—no mixed-up libraries when you upgrade something else.
4.3 Install dependencies
pip install -r requirements.txt
A final line beginning “Successfully installed …”, followed by a long list of packages, means you are ready to configure.
5. Two-minute setup: keys and passwords
5.1 Copy the template
cp .env.example .env
5.2 Edit the file
GEMINI_API_KEY=your_40_char_key_here
BLUESKY_USERNAME=your_handle.bsky.social
BLUESKY_PASSWORD=your_16_char_app_password
Keep secrets out of source code; .env is ignored by git for safety.
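To catch a missing key before the bot starts, a startup check along these lines can help. This is a stdlib-only sketch; how bskyScribe actually loads .env is not shown in this guide, and the function names parse_env_text, missing_keys, and load_config are hypothetical.

```python
import os

REQUIRED_KEYS = ("GEMINI_API_KEY", "BLUESKY_USERNAME", "BLUESKY_PASSWORD")


def parse_env_text(text: str) -> dict:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict) -> list:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


def load_config(path: str = ".env") -> dict:
    """Read the file if present; real environment variables take priority."""
    values = {}
    if os.path.exists(path):
        with open(path, encoding="utf-8") as handle:
            values = parse_env_text(handle.read())
    for key in REQUIRED_KEYS:
        if os.environ.get(key):
            values[key] = os.environ[key]
    missing = missing_keys(values)
    if missing:
        raise RuntimeError("Missing settings: " + ", ".join(missing))
    return values
```

Letting environment variables override the file means the same code works locally and on a cloud host where secrets are set in a dashboard.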
6. Two ways to run the bot
6.1 One-off transcription (great for testing)
from bots.transcriptionBot import MediaProcessingBot
bot = MediaProcessingBot() # reads .env automatically
post_url = "https://bsky.app/profile/naomi.dev/post/3jx..."
result = bot.transcribe_post(post_url) # takes 3–15 s locally
print(bot.format_transcription_reply(result))
Typical terminal output:
{"response": "A latte with leaf art sits next to an open notebook in soft morning light."}
Ready? Post it:
bot.post_transcription_reply(post_url)
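The methods above take an ordinary bsky.app web link, while the AT Protocol addresses a post record with an at:// URI. Whether the bot converts the link in exactly this way is an assumption; the sketch below only shows the general mapping, and post_url_to_at_uri is a hypothetical name. A real client may still need to resolve the handle to a DID.

```python
from urllib.parse import urlparse


def post_url_to_at_uri(post_url: str) -> str:
    """Convert https://bsky.app/profile/<actor>/post/<rkey> to an at:// URI."""
    parts = urlparse(post_url).path.strip("/").split("/")
    if len(parts) != 4 or parts[0] != "profile" or parts[2] != "post":
        raise ValueError(f"Not a Bluesky post URL: {post_url}")
    actor, rkey = parts[1], parts[3]
    # app.bsky.feed.post is the collection that holds ordinary posts.
    return f"at://{actor}/app.bsky.feed.post/{rkey}"
```

Rejecting malformed links early gives a clearer error than a failed network call later.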
6.2 Always-on daemon (public service)
from daemon import Scribe
scribe = Scribe()
scribe.monitor_mentions() # runs forever, Ctrl+C to stop
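What a loop like monitor_mentions has to do can be sketched in a few lines. This is not the Scribe class itself: the monitor function, its injected fetch_mentions and handle callables, and the max_cycles parameter (added so the loop can be tested) are all assumptions.

```python
import time
from typing import Callable, Iterable, Optional


def monitor(fetch_mentions: Callable[[], Iterable[str]],
            handle: Callable[[str], None],
            interval: float = 10.0,
            max_cycles: Optional[int] = None) -> int:
    """Poll for mention URIs, handle each new one once, return the count."""
    seen = set()
    handled = 0
    cycle = 0
    while max_cycles is None or cycle < max_cycles:
        cycle += 1
        try:
            for uri in fetch_mentions():
                if uri in seen:
                    continue
                # Marked before handling so a failing post is not retried forever.
                seen.add(uri)
                handle(uri)
                handled += 1
        except Exception as exc:  # keep the loop alive on transient errors
            print(f"cycle {cycle} failed: {exc}")
        if max_cycles is None or cycle < max_cycles:
            time.sleep(interval)
    return handled
```

Tracking seen URIs keeps the bot from replying twice when the same mention shows up in consecutive polls.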
7. Reading the JSON response
Field | Example | Purpose |
---|---|---|
thinking | “Audio is clear English, 12 s, topic keyboard” | Debug log, can be hidden |
request_type | SUMMARIZE | Helps front-end pick an icon |
media_type | VIDEO | Affects TTS voice choice |
response_character_count | 245 | Safety check |
response | “Speaker demos silent red switch…” | The text the user sees |
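Because response_character_count is reported by the model, it is worth measuring the text yourself before posting. The following sketch assumes the reply arrives as a JSON string with the fields in the table; parse_model_reply is a hypothetical helper, not part of the project.

```python
import json

MAX_CHARS = 250


def parse_model_reply(raw: str) -> dict:
    """Decode the JSON reply and check the fields described in the table."""
    data = json.loads(raw)
    text = data.get("response")
    if not isinstance(text, str) or not text.strip():
        raise ValueError("reply has no usable 'response' text")
    # Trust a measured length over the model's self-reported count.
    data["response_character_count"] = len(text)
    if len(text) >= MAX_CHARS:
        raise ValueError(f"response is {len(text)} characters, limit is {MAX_CHARS}")
    return data
```

A caller could catch the ValueError and either shorten the text or ask the model again.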
8. Project anatomy: open the lid
bskyScribe/
├── clients/
│ ├── bluesky.py # Talks to Bluesky (AT Protocol)
│ └── gemini.py # Talks to Google Gemini
├── bots/
│ └── transcriptionBot.py # Glues the two together
├── prompt/
│ └── prompt.txt # Plain-text instructions for Gemini
├── daemon.py # 24/7 background loop
├── render.yaml # Cloud deployment recipe
├── Procfile # Heroku/Render start command
├── requirements.txt # Python packages
└── .env # Your private keys
- bluesky.py uses subscribeRepos for notifications and getPostThread to fetch media.
- gemini.py converts images to base64, uploads audio to a short-lived URL, then POSTs to https://generativelanguage.googleapis.com/v1beta/models/gemini-pro-vision:generateContent.
- transcriptionBot.py decides whether to describe the media, read its text, or summarize its speech:
  - High-density text in an image → OCR
  - Otherwise → visual description
  - Audio or video → transcript → summary
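The routing rules and the base64 step can be sketched together. Everything here is illustrative: choose_mode, build_image_request, the text_density input, and the 0.5 threshold are assumptions, since the guide does not say how the bot measures "high-density text". The request body follows the inline_data shape that the Gemini generateContent REST endpoint accepts for images.

```python
import base64


def choose_mode(media_type: str, text_density: float = 0.0,
                ocr_threshold: float = 0.5) -> str:
    """Pick a processing mode; the threshold value is an assumed placeholder."""
    kind = media_type.upper()
    if kind in ("AUDIO", "VIDEO"):
        return "TRANSCRIBE_THEN_SUMMARIZE"
    if kind == "IMAGE":
        return "OCR" if text_density >= ocr_threshold else "DESCRIBE"
    raise ValueError(f"unsupported media type: {media_type}")


def build_image_request(prompt: str, image_bytes: bytes,
                        mime_type: str = "image/jpeg") -> dict:
    """Build a generateContent body with the image inlined as base64."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "contents": [{
            "parts": [
                {"text": prompt},
                {"inline_data": {"mime_type": mime_type, "data": encoded}},
            ]
        }]
    }
```

Building the body as a plain dict keeps it easy to inspect in tests before any network call is made.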
9. The prompt: teaching the AI to be brief
prompt.txt is only ten lines, yet it shapes the tone:
- Answer in English, max 250 characters.
- One-sentence overview, then 2–3 key details.
- If text dominates the image, transcribe it.
- Do not guess personal names.
- Replace graphic violence with “Content may be disturbing.”
Tip: Explicit character limits work better than “be concise.” The model self-truncates.
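If you expect to change the limit or the language often, the rules can be generated rather than edited by hand. This is only a sketch: the project reads its rules from prompt.txt, and build_prompt is a hypothetical alternative that reproduces the five rules listed above.

```python
def build_prompt(max_chars: int = 250, language: str = "English") -> str:
    """Assemble the rule list with an explicit character limit."""
    rules = [
        f"Answer in {language}, max {max_chars} characters.",
        "One-sentence overview, then 2–3 key details.",
        "If text dominates the image, transcribe it.",
        "Do not guess personal names.",
        "Replace graphic violence with “Content may be disturbing.”",
    ]
    return "\n".join(f"- {rule}" for rule in rules)
```

Calling build_prompt(200, "the user’s language") would yield a shorter, multilingual variant without touching the other rules.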
10. Deployment: from laptop to cloud in ten minutes
10.1 Render (free tier)
- Log in with GitHub at 👉 render.com.
- New → Background Worker → pick your repo.
- Paste your .env lines into the “Environment” tab.
- Deploy. First build takes 2–3 min; later pushes are automatic.
10.2 Watch the logs
In the Render dashboard:
INFO:root:👂 Mention detected from @naomi.dev
INFO:root:✅ Reply posted, length 238 chars
That is your heartbeat.
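Log lines in this shape are what Python's logging module produces with its default format (LEVEL:logger:message). The sketch below shows how such lines could be emitted; log_mention and log_reply are hypothetical names, not functions confirmed to exist in daemon.py.

```python
import logging

# With no format argument, basicConfig uses "LEVEL:logger:message".
logging.basicConfig(level=logging.INFO)


def log_mention(handle: str) -> str:
    """Log an incoming mention and return the message for reuse."""
    message = f"👂 Mention detected from @{handle}"
    logging.info(message)
    return message


def log_reply(text: str) -> str:
    """Log the length of the reply that was just posted."""
    message = f"✅ Reply posted, length {len(text)} chars"
    logging.info(message)
    return message
```

Logging the reply length on every post makes it easy to spot a summary that crept past the 250-character limit.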
11. Cost and performance snapshot
Resource | Free quota | Typical usage |
---|---|---|
Render Worker | 512 MB RAM | ~90 MB |
Google Gemini | 60 requests/min | 1 per post |
Outbound traffic | 100 GB/mo | <5 MB per media file |
Even at 500 posts per day you stay inside zero-cost limits. If traffic spikes, Render queues jobs instead of dropping them.
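The 500-posts-per-day claim can be checked with quick arithmetic against the table. The figures below come from the table; the 30-day month and the worst-case 5 MB per file are simplifying assumptions, and the request rate is an average that ignores bursts.

```python
POSTS_PER_DAY = 500
MAX_MB_PER_FILE = 5          # upper bound from the table
DAYS_PER_MONTH = 30          # simplifying assumption
TRAFFIC_QUOTA_GB = 100
GEMINI_QUOTA_PER_MIN = 60

# Worst-case monthly download volume, using 1 GB = 1000 MB.
monthly_gb = POSTS_PER_DAY * MAX_MB_PER_FILE * DAYS_PER_MONTH / 1000

# Average Gemini request rate at one request per post.
requests_per_minute = POSTS_PER_DAY / (24 * 60)

print(f"traffic: {monthly_gb:.0f} GB of {TRAFFIC_QUOTA_GB} GB")
print(f"requests: {requests_per_minute:.2f}/min of {GEMINI_QUOTA_PER_MIN}/min")
```

Under these assumptions traffic is the tighter limit: roughly three quarters of the quota, while the request rate stays far below the cap.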
12. Frequently asked questions
Q1: Will the bot read my private messages?
A: No. It only touches public posts and only when explicitly mentioned. The source code is open for review.
Q2: What if the audio contains sensitive personal details?
A: Files are held in memory, never stored. Add “ignore personal names” to prompt.txt if desired.
Q3: Can it reply in other languages?
A: Replace “Answer in English” with “Answer in the user’s language” in prompt.txt. Gemini auto-detects.
13. Customization: make the bot yours
- Signature line – append “— via bskyScribe ✨” in format_transcription_reply.
- NSFW filter – check returned text for disallowed words before posting.
- Backfill old posts – loop through your 2023 archive and add descriptions retroactively.
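The first two customizations might look like this. Both helpers are hypothetical sketches, not code from the repository: add_signature skips the signature when it would push the reply past the limit, and is_safe does a simple whole-word check against a list you supply.

```python
SIGNATURE = " — via bskyScribe ✨"


def add_signature(text: str, limit: int = 250) -> str:
    """Append the signature only when the result stays under the limit."""
    signed = text + SIGNATURE
    return signed if len(signed) < limit else text


def is_safe(text: str, blocked_words: list) -> bool:
    """Return False if any blocked word appears as a whole word in text."""
    words = {word.strip(".,!?;:\"'").lower() for word in text.split()}
    return not any(blocked.lower() in words for blocked in blocked_words)
```

A whole-word check is crude and will miss misspellings, so treat it as a first filter rather than a complete moderation step.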
14. Contributing: from user to co-maintainer
- Fork the repo.
- Create a branch feature/emoji-support.
- Run python -m pytest tests/ to confirm nothing breaks.
- Open a pull request titled Add emoji support for video summaries (#42).
- A maintainer will review within 48 h and credit you in the release notes.
15. Closing thought: information should never be silent
bskyScribe is not the flashiest AI project on GitHub, but it may be the kindest.
It takes a multimodal model that costs fractions of a cent and turns it into a 250-character note slipped under every locked door.
If you want your own posts to speak up—today—install the code, add your keys, and type:
@bskyscribe.bsky.social describe this
You will see, perhaps for the first time, how gentle technology can be.
Accessibility is not a feature; it is the path that lets everyone walk in the sun.