- Description updated: works out of the box, optional tools enhance quality, internet required for YouTube
- Added OpenClaw metadata: declares `summarize` as required binary
- Added security clarifications: no data uploaded, sub-agent has strict no-install instructions
- Expanded setup.py documentation in README
- Fixed display name on ClawHub: TubeScribe (was Tubescribe)

---
name: TubeScribe
description: "YouTube video summarizer with speaker detection, formatted documents, and audio output. Works out of the box with macOS built-in TTS. Optional recommended tools (pandoc, ffmpeg, mlx-audio) enhance quality. Requires internet for YouTube access. No paid APIs or subscriptions. Use when user sends a YouTube URL or asks to summarize/transcribe a YouTube video."
metadata:
  {
    "openclaw":
      {
        "emoji": "🎬",
        "requires": { "bins": ["summarize"] }
      }
  }
---
# TubeScribe 🎬
**Turn any YouTube video into a polished document + audio summary.**
Drop in a YouTube link and get a beautiful transcript with speaker labels, key quotes, timestamps that link back to the video, and an audio summary you can listen to on the go.
### 💸 Free & No Paid APIs
- **No subscriptions or API keys** - works out of the box
- **Local processing** - transcription, speaker detection, and TTS run on your machine
- **Network access** - fetching from YouTube (captions, metadata, comments) requires internet
- **No data uploaded** - nothing is sent to third-party services; the only network traffic is fetching from YouTube
- **Safe sub-agent** - the spawned sub-agent has strict instructions: no software installation, no network calls beyond YouTube
### ✨ Features
- **📄 Transcript with summary and key quotes** - Export as DOCX, HTML, or Markdown
- **🎯 Smart Speaker Detection** - Automatically identifies participants
- **🔊 Audio Summaries** - Listen to key points (MP3/WAV)
- **🔗 Clickable Timestamps** - Every quote links directly to that moment in the video
- **💬 YouTube Comments** - Viewer sentiment analysis and best comments
- **📋 Queue Support** - Send multiple links and they are processed in order
- **🚀 Non-Blocking Workflow** - Conversation continues while the video processes in the background
### 🎬 Works With Any Video
- Interviews & podcasts (multi-speaker detection)
- Lectures & tutorials (single speaker)
- Music videos (lyrics extraction)
- News & documentaries
- Any YouTube content with captions
## Quick Start
When user sends a YouTube URL:
1. Spawn sub-agent with the full pipeline task **immediately**
2. Reply: "🎬 TubeScribe is processing - I'll let you know when it's ready!"
3. Continue conversation (don't wait!)
4. Sub-agent notification will announce completion with title and details
**DO NOT BLOCK** - spawn and move on instantly.
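
The trigger above depends on recognizing a YouTube URL in the user's message. A minimal sketch of that check follows. `extract_video_id` and `YOUTUBE_HOSTS` are hypothetical names used for illustration and are not part of TubeScribe; the host list is an assumption and is not exhaustive.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Hostnames treated as YouTube here (illustrative assumption, not exhaustive).
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def extract_video_id(text: str) -> Optional[str]:
    """Return the video ID if `text` is a YouTube URL with a scheme, else None."""
    parsed = urlparse(text.strip())
    host = (parsed.hostname or "").lower()
    if host not in YOUTUBE_HOSTS:
        return None
    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        return parsed.path.lstrip("/").split("/")[0] or None
    if parsed.path == "/watch":
        # Standard watch URLs carry the ID in the `v` query parameter.
        return parse_qs(parsed.query).get("v", [None])[0]
    # /shorts/<id>, /live/<id>, and /embed/<id> carry the ID in the path.
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) >= 2 and parts[0] in {"shorts", "live", "embed"}:
        return parts[1]
    return None
```

If the function returns an ID, spawn the sub-agent; if it returns `None`, treat the message as ordinary conversation. URLs without a scheme (for example `youtube.com/watch?v=...`) are not recognized by this sketch.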
## First-Time Setup
Run setup to check dependencies and configure defaults:
```bash
python3 skills/tubescribe/scripts/setup.py
```
This checks for the `summarize` CLI, `pandoc`, `ffmpeg`, and Kokoro TTS.
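
For orientation, a dependency check of this kind can be sketched with `shutil.which`. This is not the actual `setup.py`, whose internals are not shown here; `check_bins`, `missing`, and the two lists are hypothetical names, and the Kokoro TTS check is left out because it is not a plain binary lookup.

```python
import shutil

# Binary names taken from the text above; the required/optional split
# follows the skill metadata (only `summarize` is declared as required).
REQUIRED_BINS = ["summarize"]
OPTIONAL_BINS = ["pandoc", "ffmpeg"]


def check_bins(names, which=shutil.which):
    """Map each binary name to its resolved path, or None if not on PATH."""
    return {name: which(name) for name in names}


def missing(names, which=shutil.which):
    """Return the names that could not be found on PATH."""
    return [name for name, path in check_bins(names, which).items() if path is None]


if __name__ == "__main__":
    for name in missing(REQUIRED_BINS):
        print(f"Missing required tool: {name}")
    for name in missing(OPTIONAL_BINS):
        print(f"Missing optional tool: {name}")
```

The `which` parameter exists only so the lookup can be replaced in tests. The sketch reports missing tools and never installs anything, in line with the no-install rule for the sub-agent.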
## Full Workflow (Single Sub-Agent)
Spawn ONE sub-agent that does the entire pipeline:
````python
sessions_spawn(
    task=f"""
## TubeScribe: Process {youtube_url}
⚠️ CRITICAL: Do NOT install any software.
No pip, brew, curl, venv, or binary downloads.
If a tool is missing, STOP and report what's needed.
Run the COMPLETE pipeline - do not stop until all steps are done.
### Step 1: Extract
```bash
python3 skills/tubescribe/scripts/tubescribe.py "{youtube_url}"
```
Note the **Source** and **Output** paths printed by the script. Use those exact paths in subsequent steps.
### Step 2: Read source JSON
Read the Source path from Step 1 output and note:
- metadata.title (for filename)
- metadata.video_id
- metadata.channel, upload_date, duration_string
### Step 3: Create formatted markdown
Write to the Output path from Step 1:
1. `# **<title>**`
---
2. Video info block - Channel, Date, Duration, URL (clickable). Empty line between each field.
---
3. `## **Participants**` - table with bold headers:
```
| **Name** | **Role** | **Description** |
|----------|----------|-----------------|
```
---
4. `## **Summary**` - 3-5 paragraphs of prose
---
5. `## **Key Quotes**` - 5 best with clickable YouTube timestamps. Format each as:
```
"Quote text here." - [12:34](https://www.youtube.com/watch?v=ID&t=754s)
"Another quote." - [25:10](https://www.youtube.com/watch?v=ID&t=1510s)
```
Use regular dash `-`, NOT em dash `—`. Do NOT use blockquotes `>`. Plain paragraphs only.
---
6. `## **Viewer Sentiment**` (if comments exist)
---
7. `## **Best Comments**` (if comments exist) - Top 5, NO lines between them:
```
Comment text here.
*- ▲ 123 @AuthorName*

Next comment text here.
*- ▲ 45 @AnotherAuthor*
```
Attribution line: dash + italic. Just a blank line between comments, NO `---` separators.
---
8. `## **Full Transcript**` - merge segments, speaker labels, clickable timestamps
### Step 4: Create DOCX
Clean the title for filename (remove special chars), then:
```bash
pandoc <output_path> -o ~/Documents/TubeScribe/<safe_title>.docx
```
### Step 5: Generate audio
Write the summary text to a temp file, then use TubeScribe's built-in audio generation:
```bash
# Write summary to temp file (use python3 to write, avoids shell escaping issues)
python3 -c "
text = '''YOUR SUMMARY TEXT HERE'''
with open('<temp_dir>/tubescribe_<video_id>_summary.txt', 'w') as f:
    f.write(text)
"
# Generate audio (auto-detects engine, voice, format from config)
python3 skills/tubescribe/scripts/tubescribe.py \
--generate-audio <temp_dir>/tubescribe_<video_id>_summary.txt \
--audio-output ~/Documents/TubeScribe/<safe_title>_summary
```
This reads `~/.tubescribe/config.json` and uses the configured TTS engine (mlx/kokoro/builtin), voice blend, and speed automatically. Output format (mp3/wav) comes from config.
### Step 6: Cleanup
```bash
python3 skills/tubescribe/scripts/tubescribe.py --cleanup <video_id>
```
### Step 7: Open folder
```bash
open ~/Documents/TubeScribe/
```
### Report
Report what was created: DOCX name, MP3 name + duration, video stats.
""",
    label="tubescribe",
    runTimeoutSeconds=900,
    cleanup="delete"
)
````
**After spawning, reply immediately:**
> 🎬 TubeScribe is processing - I'll let you know when it's ready!
Then continue the conversation. The sub-agent notification announces completion.
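
Two formatting rules in the pipeline are easy to get wrong by hand: the clickable timestamp in Key Quotes and the cleaned title used for filenames. The sketch below shows one way to produce both. All function names are hypothetical, and the exact set of characters stripped by `safe_title` is an assumption, since the workflow only says "remove special chars".

```python
import re


def format_timestamp(seconds: int) -> str:
    """Render seconds as M:SS, or H:MM:SS once the video passes one hour."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def timestamp_link(video_id: str, seconds: int) -> str:
    """Build a markdown link that jumps to `seconds` in the video."""
    label = format_timestamp(seconds)
    return f"[{label}](https://www.youtube.com/watch?v={video_id}&t={int(seconds)}s)"


def quote_line(text: str, video_id: str, seconds: int) -> str:
    """Format one Key Quotes entry: plain paragraph, regular dash, no blockquote."""
    return f'"{text}" - {timestamp_link(video_id, seconds)}'


def safe_title(title: str) -> str:
    """Drop everything except word characters, spaces, and hyphens."""
    cleaned = re.sub(r"[^\w\s-]", "", title)
    return re.sub(r"\s+", " ", cleaned).strip()
```

For example, `quote_line("Quote text here.", "ID", 754)` yields the same line as the first Key Quotes example in Step 3, because 754 seconds is 12 minutes and 34 seconds.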
## Configuration
Config file: `~/.tubescribe/config.json`
```json
{
"output": {
"folder": "~/Documents/TubeScribe",
"open_folder_after": true,
"open_document_after": false,
"open_audio_after": false
},
"document": {
"format": "docx",
"engine": "pandoc"
},
"audio": {
"enabled": true,
"format": "mp3",
"tts_engine": "mlx"
},
"mlx_audio": {
"path": "~/.openclaw/tools/mlx-audio",
"model": "mlx-community/Kokoro-82M-bf16",
"voice": "af_heart",
"lang_code": "a",
"speed": 1.05
},
"kokoro": {
"path": "~/.openclaw/tools/kokoro",
"voice_blend": { "af_heart": 0.6, "af_sky": 0.4 },
"speed": 1.05
},
"processing": {
"subagent_timeout": 600,
"cleanup_temp_files": true
}
}
```
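
A script that reads this file typically has to cope with a missing file or a partial one. The following sketch merges user settings over defaults and expands `~` in the output folder. It is an illustration under assumptions, not TubeScribe's own loader: `DEFAULTS` holds only a subset of the keys above, and `deep_merge`, `load_config`, and `output_folder` are hypothetical names.

```python
import copy
import json
from pathlib import Path

# Subset of the defaults shown above, kept short for the example.
DEFAULTS = {
    "output": {"folder": "~/Documents/TubeScribe", "open_folder_after": True},
    "document": {"format": "docx", "engine": "pandoc"},
    "audio": {"enabled": True, "format": "mp3", "tts_engine": "mlx"},
}


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict with `override` applied over `base`, section by section."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_config(path: str = "~/.tubescribe/config.json") -> dict:
    """Load the user config if present and fill in anything missing from DEFAULTS."""
    config_path = Path(path).expanduser()
    user = {}
    if config_path.is_file():
        user = json.loads(config_path.read_text(encoding="utf-8"))
    return deep_merge(DEFAULTS, user)


def output_folder(config: dict) -> Path:
    """Resolve the configured output folder to an absolute-style path."""
    return Path(config["output"]["folder"]).expanduser()
```

Merging per section means a user file containing only `{"audio": {"format": "wav"}}` changes the audio format and leaves every other default in place.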
### Output Options
| Option | Default | Description |
|--------|---------|-------------|
| `output.folder` | `~/Documents/TubeScribe` | Where to save files |
| `output.open_folder_after` | `true` | Open output folder when done |
| `output.open_document_after` | `false` | Auto-open generated document |
| `output.open_audio_after` | `false` | Auto-open generated audio summary |
### Document Options
| Option | Default | Values | Description |
|--------|---------|--------|-------------|
| `document.format` | `docx` | `docx`, `html`, `md` | Output format |
| `document.engine` | `pandoc` | `pandoc` | Converter for DOCX (falls back to HTML) |
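
The DOCX fallback noted in the table can be pictured as follows. This is a sketch under assumptions: `resolve_document_format` and `pandoc_command` are hypothetical helpers, and only the basic `pandoc <input> -o <output>` invocation from Step 4 of the workflow is used.

```python
import shutil


def resolve_document_format(requested: str, pandoc_available: bool) -> str:
    """DOCX needs pandoc; without it, fall back to HTML as the table describes."""
    if requested == "docx" and not pandoc_available:
        return "html"
    return requested


def pandoc_command(markdown_path: str, output_path: str) -> list:
    """Build the argument list for converting markdown with pandoc."""
    return ["pandoc", markdown_path, "-o", output_path]


if __name__ == "__main__":
    has_pandoc = shutil.which("pandoc") is not None
    print(resolve_document_format("docx", has_pandoc))
```

Returning the command as a list keeps titles with spaces intact when it is passed to `subprocess.run`.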
### Audio Options
| Option | Default | Values | Description |
|--------|---------|--------|-------------|
| `audio.enabled` | `true` | `true`, `false` | Generate audio summary |
| `audio.format` | `mp3` | `mp3`, `wav` | Audio format (mp3 needs ffmpeg) |
| `audio.tts_engine` | `mlx` | `mlx`, `kokoro`, `builtin` | TTS engine (mlx = fastest on Apple Silicon) |
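
The audio options interact with what is installed: MP3 needs ffmpeg, and the built-in macOS TTS is always there as a last resort. The sketch below shows one possible resolution strategy. The fallback order (mlx, then kokoro, then builtin) and the WAV fallback are assumptions made for illustration, and both function names are hypothetical.

```python
# Assumed preference order; TubeScribe's real selection logic may differ.
ENGINE_ORDER = ["mlx", "kokoro", "builtin"]


def resolve_audio_format(requested: str, ffmpeg_available: bool) -> str:
    """MP3 output needs ffmpeg; without it, fall back to WAV."""
    if requested == "mp3" and not ffmpeg_available:
        return "wav"
    return requested


def resolve_tts_engine(requested: str, available: set) -> str:
    """Pick the requested engine if available, else the next one in ENGINE_ORDER."""
    start = ENGINE_ORDER.index(requested) if requested in ENGINE_ORDER else 0
    for engine in ENGINE_ORDER[start:]:
        if engine in available:
            return engine
    # The built-in TTS is treated as always present.
    return "builtin"
```

With `tts_engine` set to `mlx` on a machine that only has Kokoro installed, this sketch would select `kokoro` rather than fail.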
### MLX-Audio Options (preferred on Apple Silicon)
| Option | Default | Description |
|--------|---------|-------------|
| `mlx_audio.path` | `~/.openclaw/tools/mlx-audio` | mlx-audio venv location |
| `mlx_audio.model` | `mlx-community/Kokoro-82M-bf16` | |