Free Build Guide

Build Your Own Content Second Brain

Connect every script, blog, YouTube video, and Instagram reel you've ever made into one searchable Obsidian vault — and then wire it into your AI scripting agent so every new hook checks what you've already said.

A note before you start

If you have made more than fifty pieces of content, you have already forgotten what you said. Your old scripts live in Google Docs. Your blog posts live in a folder you barely open. Your YouTube scripts are in a different folder again. And the words you actually said on camera in your last hundred reels are nowhere on your computer at all.

That is fine when you have ten pieces. It is a problem when you have five hundred. You start writing the same hook for the third time. You forget that you already wrote a great blog on the exact topic you are about to film. Your past work, the thing that should be your biggest advantage as a creator, is invisible to you.

This guide fixes that. You build a content second brain: one searchable Obsidian vault that holds every published thing you own, with rich metadata so you can slice it by tag, topic, brand, or pillar. The vault is derived from the sources it pulls from — Google Docs stay the source of truth, the vault is the index. And once everything is in one place, you can wire it into your scripting agent so every new hook gets generated knowing what you have already said.

The build is one sitting. The desktop tools (Obsidian, Google Drive, Node) are free; the part that does the work is an AI coding agent like Claude Code or Codex, which you should plan to run on the $100/mo tier of either Claude Max or ChatGPT Pro — the cheaper plans throttle in the middle of a long agent session. Total bill: $100/mo flat, plus a few one-time dollars in scraping credits if you transcribe Instagram. That is the kind of build anyone with a terminal can do.

1. What you're building

What it is. A single local folder, mirrored to Google Drive, that you open in Obsidian. Inside it: one markdown file per piece of content you have ever made. Each file has a YAML frontmatter block with rich metadata (CID, title, tags, brand, source URL, publish date) and the body of the content underneath.

The end state, concretely.

  1. You hit Cmd+O in Obsidian and type "executive presence" to jump to any note with those words in its title, or Cmd+Shift+F to search the full text of every note. Every script, blog, and YouTube video you have ever made on the topic appears in the results.
  2. You click a tag in any note's properties panel — say #public-speaking — and you see every note that shares it. Sixty results across short-form, long-form, and written.
  3. You drop a fresh voice memo idea into the vault. Obsidian's graph view (or a plugin like Smart Connections) surfaces the five most related notes you have already made on it, so you can either skip the topic, expand it, or remix two old ideas into a new one.
  4. Your scripting agent — whether it is a Hermes agent like the one I wrote about in the Telegram Scripting Agent guide, or a Claude Code skill — reads the vault before generating hooks for a new idea. It tells the model "this creator has already shipped these twelve things on this topic. Find a fresh angle."

What it is not. A CMS. A second source of truth. A place you edit content. Anything you change in the vault is overwritten on the next sync. The vault is a read surface. The source of truth stays wherever the content was born — Google Docs, your blog repo, your YouTube script folder, your Apify-scraped IG transcripts.

2. The architecture in plain English

There are exactly three pieces.

  1. Your sources. Wherever your content already lives. For me that is a Google Sheet for short-form scripts (one row per script, with a Google Doc URL), a content directory in my Astro site for blog posts, and a "Done" folder for YouTube scripts. For Instagram and YouTube transcripts, the sources are external (the platform itself), and I pull them with a scraper.
  2. The vault folder. A normal folder on your Mac, inside the Google Drive mirror, that Obsidian opens. Markdown files live here. One per piece of content. The folder is layered:
Content Second Brain/
  scripts/        ← one .md per script (CID-titled)
  blogs/          ← one .md per blog post (slug-titled)
  youtube/        ← one .md per finished YouTube script
  transcripts/
    yt/           ← .vtt or .txt per video (auto-caption or Whisper)
    ig/           ← .md per reel (Scrape Creators or Apify + Whisper)
  _attachments/   ← any images embedded in notes
  3. The backfill scripts. Three tiny Node scripts (covered in section 6) that walk each source, pull the body, and write a markdown file with frontmatter into the right vault subfolder. You run each one once for the initial backfill. After that, new content lands in the vault automatically if you set up a watcher (covered in section 9), or when you re-run the same script.
Your existing content                 →  The vault                             →  How you use it
─────────────────────                    ─────────────                            ─────────────
Google Docs (scripts)            ──┐
Astro / Hugo / WordPress         ──┤    Obsidian opens a local folder            Search (Cmd+O)
content folder (blogs)           ──┼──→ that's mirrored to Google Drive,    ──→  Tag drill-down
YouTube script folder            ──┤    so every note has rich metadata          Graph view
IG / YT video transcripts        ──┘    + a backlink to its source              Cross-pollination
                                                                                  Wire into agent
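
If you would rather not create the subfolders by hand, the layout above can be sketched in a few lines of Node. This is a minimal sketch, not part of the backfill scripts: the function name ensureVaultLayout and the VAULT_SUBFOLDERS constant are my own illustrative names, and the folder list is copied from the tree above.

```javascript
import { mkdirSync } from "node:fs";
import { join } from "node:path";

// Subfolders from the layout above, relative to the vault root.
const VAULT_SUBFOLDERS = [
  "scripts",
  "blogs",
  "youtube",
  join("transcripts", "yt"),
  join("transcripts", "ig"),
  "_attachments",
];

// Create any missing subfolder; folders that already exist are left untouched.
// Returns the full paths so the caller can log them.
function ensureVaultLayout(vaultRoot) {
  return VAULT_SUBFOLDERS.map((sub) => {
    const dir = join(vaultRoot, sub);
    mkdirSync(dir, { recursive: true }); // recursive: no error if it exists
    return dir;
  });
}
```

Calling ensureVaultLayout("/path/to/Content Second Brain") is safe to repeat, which fits the rule that anything touching the vault should be re-runnable.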

Two things make this design work, and they are worth holding onto:

  • The vault is derived, not authoritative. You never edit a vault note as if it were the source. If you want to change a script, change the Google Doc and re-sync. This rule is what lets you re-run the backfill any time without fear, and it is what lets the vault stay a clean reflection instead of a third place ideas drift to.
  • Frontmatter is load-bearing. The body of a note is just the content. The frontmatter is what makes the vault queryable. Every search, every Dataview query, every "find related" plugin reads frontmatter first. Skimp on it now and you cannot retrofit later without a re-sync. Section 5 is where you commit to a schema.

3. Prerequisites and tools

A Mac (or Windows / Linux box)

I built this on macOS. The walkthrough below uses Mac paths. Everything works on Windows and Linux too; substitute the Google Drive mount path and the Homebrew commands for your platform.

Obsidian

The reader. Free for personal use. Reads any folder of markdown files; renders frontmatter as a properties panel; gives you a graph view, backlinks, tags, and full-text search out of the box. No account required for the local version. You only pay if you want Obsidian Sync for mobile ($10/mo); for desktop-only, free.

Google Drive for desktop

Mirrors a Google Drive folder to your Mac. Free. The vault lives inside the mirror, so it is automatically backed up to your Drive and synced across any Mac you sign in on. You will install this once and never touch it again.

Node.js (free)

The backfill scripts are written in Node. If you do not have it, install via Homebrew: brew install node. The scripts use only Node's standard library plus one shell helper for Google Sheets reads.

An AI coding agent to orchestrate the build — Claude Code or Codex

This is the line item to read carefully. You are not coding the backfill scripts by hand — you are pairing with an AI coding agent that reads your repo, writes the scripts, runs them, and iterates. The agent does the work; you make the decisions. There are two real options today, and both authenticate against a flat-fee chat subscription so your bill does not balloon mid-build:

  • Claude Code with Claude Pro or Max — this is what I used to build the whole thing. Claude Pro starts at $20/mo, and the Max plan at $100/mo is what I run on for unlimited Opus runs (necessary for this kind of multi-hour build session).
  • Codex with ChatGPT Pro — Codex authenticates against your ChatGPT subscription and runs on GPT-5.5. ChatGPT Pro is $100/mo flat. Same idea, different vendor.

Plan on at least the $100/mo tier of whichever you pick. The cheaper plans throttle quickly inside a multi-hour agent session — you will hit limits halfway through the backfill and have to wait it out. The $100 plan is the bare minimum I would attempt this build on. Metered API access (raw Anthropic or OpenAI API keys) is possible but a footgun: an agent session that loops on a long file can burn $40 of tokens overnight. Flat subscriptions remove that risk.

The gws CLI (free)

A thin Google Workspace CLI for Sheets / Drive / Docs. If your scripts and blogs already live in Google Docs and Google Sheets (mine do), this is what reads them. Install per the Google Workspace CLI repo. If you keep your scripts somewhere else (Notion, Airtable, a CSV), swap the read step for that tool's CLI or API.

(Section 7 only) Transcription tools

Only needed if you want to backfill your old Instagram reels and YouTube videos as actual spoken transcripts rather than just your draft scripts.

Total cost. $100/mo for the AI coding agent subscription (Claude Max or ChatGPT Pro) — non-negotiable, this is what does the build. Zero for the rest of the desktop setup. Add $10/mo for Obsidian Sync if you want mobile. Add $5-50 one-time for Apify or Scrape Creators if you backfill Instagram. So $100/mo flat, plus a small one-time scraping bill if you transcribe IG.

4. Install the vault on your Mac

Five short steps. Total time: ten minutes if it is your first time.

4.1 Install Google Drive for desktop

  1. Go to google.com/drive/download in Chrome.
  2. Click the blue Download Drive for desktop button.
  3. Open the downloaded GoogleDrive.dmg (it lands in ~/Downloads/).
  4. Drag the Google Drive icon into the Applications folder. Wait for the copy to finish.
  5. Eject the DMG (right-click the white Google Drive icon on your desktop → Eject).

4.2 Launch and sign in

  1. Open /Applications/Google Drive.app.
  2. macOS will ask "Are you sure you want to open this?" — click Open.
  3. It will ask for System Settings permissions (Files & Folders, Accessibility). Approve each.
  4. A browser tab opens. Sign in with the Google account you want the vault to live in.
  5. Choose Stream files (NOT Mirror). Stream = files live in the cloud, downloaded on demand. The vault folder will get fully cached on first open and stay cached.

4.3 Create the vault folder

After sign-in, Drive mounts at:

~/Library/CloudStorage/GoogleDrive-<youremail>/My Drive/
  1. Open Finder, press Cmd+Shift+G, paste that path, hit Enter.
  2. Inside My Drive, create a new folder named exactly Content Second Brain (or whatever you want — be consistent with the path in your backfill scripts later).
  3. Right-click the folder → Google Drive → Available offline. This forces full local caching so Obsidian does not lag on graph rebuilds.
  4. Optional: drag the folder into Finder's sidebar for one-click access.
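
The mount path contains your email address, which is easy to mistype when you pass it to the backfill scripts later. Here is a minimal sketch that looks the path up instead. It assumes the macOS CloudStorage location shown above; findVaultCandidates is a hypothetical helper name, not something Google Drive or Obsidian ships.

```javascript
import { readdirSync, existsSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

// Look inside ~/Library/CloudStorage for GoogleDrive-<email> mounts and return
// every "My Drive/<vaultName>" folder that actually exists.
function findVaultCandidates(
  vaultName = "Content Second Brain",
  cloudRoot = join(homedir(), "Library", "CloudStorage"),
) {
  if (!existsSync(cloudRoot)) return []; // Drive for desktop not installed
  return readdirSync(cloudRoot)
    .filter((name) => name.startsWith("GoogleDrive-"))
    .map((name) => join(cloudRoot, name, "My Drive", vaultName))
    .filter((candidate) => existsSync(candidate));
}
```

If you are signed in to more than one Google account you will get more than one result, so print the list and pick the right one rather than taking the first entry blindly.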

4.4 Install Obsidian and point it at the vault

  1. Download Obsidian, open the DMG, drag to Applications, eject.
  2. Open Obsidian. On the welcome screen, click Open folder as vault (or, if you have opened it before, use the vault switcher in the bottom-left → Open another vault).
  3. In the file picker, press Cmd+Shift+G and paste the path: ~/Library/CloudStorage/GoogleDrive-<youremail>/My Drive/Content Second Brain.
  4. Click Open, then Trust author and enable plugins.

You will land in an empty vault. That is exactly right — we have not backfilled anything yet.

4.5 Smoke-test the round-trip

In Finder (the same folder), create a file called hello.md, then switch back to Obsidian. Within about two seconds it should appear in the left sidebar. Delete it in Finder and it disappears from Obsidian. That round-trip is what every backfill script depends on.

5. The frontmatter schema

The body of a note is the content. The frontmatter is the metadata. Every Obsidian search, every Dataview query, every "find related" plugin reads the frontmatter first. Skimp on it now and you cannot retrofit later without a full re-sync.

Here is the schema I use. Every note carries this block at the top, with the type-specific fields filled in:

---
type: script
cid: C1584
title: "Getting On Camera Changes Who You Think You Are"
slug: getting-on-camera-changes-who-you-think-you-are
status: "Ready to Record"
brand: "PrestonChinSpeaks"
content_type: "Short-form Video"
topic: "self-referential encoding and identity change through daily camera reps"
tags: ["camera-confidence", "self-referential-encoding", "creator-identity", "ai", "communication"]
platforms: ["Instagram Reels", "TikTok", "YouTube Shorts"]
publish_date: null
created: 2026-05-24
last_synced: 2026-05-25
source:
  master_cid: C1584
  script_doc_url: https://docs.google.com/document/d/.../edit
  copy_doc_url: https://docs.google.com/document/d/.../edit
  gdrive_folder: https://drive.google.com/drive/folders/...
  published_urls: null
description: "A neuroscience-backed short about self-referential encoding..."
---

[Body of the note — the actual script text, blog body, or YouTube transcript.]

Three rules that keep the schema honest:

  • cid is null for non-script types. CIDs are sequential identifiers from your content sheet (mine go C0001, C0002, …). They only make sense for scripts. For a blog or YouTube note, set cid: null and do not fake one.
  • Tags are kebab-case, no spaces. Obsidian splits tags on whitespace, so #executive presence becomes two tags. Lowercase the tags, replace spaces with hyphens. The backfill script does this automatically.
  • The source block is required. A vault note with no backlink to its source is a dead reference. Every backfill writes a source block with at least one of: script_doc_url, repo_path, published_url. If a note has none of those, do not write it.
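
The three rules above are mechanical enough to check in code before a note is written. This is a minimal sketch under one assumption: the frontmatter has already been parsed into a plain object shaped like the schema. The names toKebabTag and validateNoteMeta are illustrative, not part of the backfill scripts.

```javascript
// The three backlink fields named in the rule above.
const SOURCE_KEYS = ["script_doc_url", "repo_path", "published_url"];
// Lowercase letters and digits, separated by single hyphens.
const KEBAB = /^[a-z0-9]+(-[a-z0-9]+)*$/;

// Normalize a free-form tag: trim, lowercase, spaces to hyphens.
function toKebabTag(tag) {
  return String(tag).trim().toLowerCase().replace(/\s+/g, "-");
}

// Return a list of rule violations; an empty list means the note may be written.
function validateNoteMeta(meta) {
  const problems = [];
  if (meta.type !== "script" && meta.cid != null) {
    problems.push("cid must be null for non-script types");
  }
  for (const tag of meta.tags || []) {
    if (!KEBAB.test(tag)) problems.push(`tag is not kebab-case: ${tag}`);
  }
  const source = meta.source || {};
  if (!SOURCE_KEYS.some((key) => source[key])) {
    problems.push("source needs script_doc_url, repo_path, or published_url");
  }
  return problems;
}
```

Returning a list instead of throwing lets a backfill log every problem with a note in one pass and then skip the write.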

6. Backfill your existing content

Three short Node scripts. Each one walks a different source, builds a vault note, and writes it to the right subfolder. Run them in order: scripts first (highest volume, most structured), blogs second, YouTube third. After each, open Obsidian and spot-check three random notes before moving on.
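
The spot-check can be partly automated. The sketch below picks up to three random notes from one vault subfolder and reports whether each has a frontmatter block, a source key, and a non-empty body. It is a rough aid, not a replacement for opening the notes in Obsidian. spotCheck and its report fields are hypothetical names of mine.

```javascript
import { readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

// Sample up to `count` .md files from `dir` and run three cheap checks on each.
// `random` is injectable so the sampling can be made repeatable.
function spotCheck(dir, count = 3, random = Math.random) {
  const pool = readdirSync(dir).filter((f) => f.endsWith(".md"));
  const picked = [];
  while (picked.length < count && pool.length > 0) {
    const i = Math.floor(random() * pool.length);
    picked.push(pool.splice(i, 1)[0]); // remove so no file is picked twice
  }
  return picked.map((file) => {
    const text = readFileSync(join(dir, file), "utf8");
    const m = text.match(/^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/);
    return {
      file,
      hasFrontmatter: Boolean(m),
      hasSource: Boolean(m && /^source:/m.test(m[1])),
      bodyChars: m ? m[2].trim().length : text.trim().length,
    };
  });
}
```

A note with bodyChars of 0 usually means the fetch returned an empty body, which is worth catching before you run the full backfill.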

6.1 Scripts from a sheet

The most valuable backfill. If you track short-form videos in a Google Sheet with a Google Doc URL per row (or in Airtable, Notion, etc.), this is the one to do first. My MASTER tab has 486 rows; 331 had a usable Script Doc URL after filtering out parked items. Each one became a vault note with frontmatter pulled from the row and body pulled from the Google Doc.

The script reads the sheet via the gws CLI, parses each row into a JS object, extracts the Google Doc ID from the URL, exports the Doc as markdown via Drive's built-in export, builds the frontmatter, and writes the file. ~200 lines including helpers.

Full script: vault-backfill-scripts.mjs
#!/usr/bin/env node
// Backfill the Obsidian vault with one markdown note per row in your content sheet.
// Usage:
//   node vault-backfill-scripts.mjs --vault "/path/to/vault" --limit 3 --dry-run
//   node vault-backfill-scripts.mjs --vault "/path/to/vault"
//   node vault-backfill-scripts.mjs --vault "/path/to/vault" --force

import { execFileSync } from "node:child_process";
import { readFileSync, writeFileSync, existsSync, mkdirSync, unlinkSync } from "node:fs";
import { join } from "node:path";

const SPREADSHEET_ID = "<YOUR_SHEET_ID>";
const TAB = "MASTER";

const args = parseArgs(process.argv.slice(2));
if (!args.vault) { console.error("ERROR: --vault is required"); process.exit(1); }
const VAULT = args.vault;
const SCRIPTS_DIR = join(VAULT, "scripts");
const LIMIT = args.limit ? Number(args.limit) : Infinity;
const SLEEP_MS = args.sleep ? Number(args.sleep) : 200;
const DRY = !!args["dry-run"];
const FORCE = !!args.force;

if (!existsSync(VAULT)) { console.error(`vault folder does not exist: ${VAULT}`); process.exit(1); }
if (!DRY) mkdirSync(SCRIPTS_DIR, { recursive: true });

const rows = readMaster();
const eligible = rows.filter(r =>
  r.cid?.startsWith("C") &&
  (r.status || "").toLowerCase() !== "parked" &&
  r.scriptDocUrl &&
  extractDocId(r.scriptDocUrl)
);
const work = eligible.slice(0, LIMIT);

let written = 0, skipped = 0, failed = 0;
const failures = [];

for (const row of work) {
  const filename = `${row.cid}-${slug(row.title)}.md`;
  const outPath = join(SCRIPTS_DIR, filename);
  if (!FORCE && existsSync(outPath)) { skipped++; continue; }

  let body;
  try { body = fetchDocAsMarkdown(extractDocId(row.scriptDocUrl)); }
  catch (e) { failures.push({ cid: row.cid, error: e.message }); failed++; continue; }

  const md = buildFrontmatter(row) + "\n" + body;
  if (!DRY) writeFileSync(outPath, md, "utf8");
  written++;
  if (SLEEP_MS) sleep(SLEEP_MS);
}

console.log(`${written} written, ${skipped} skipped, ${failed} failed.`);

function parseArgs(argv) {
  const out = {};
  for (let i = 0; i < argv.length; i++) {
    const a = argv[i];
    if (a.startsWith("--")) {
      const k = a.slice(2);
      const next = argv[i + 1];
      if (next === undefined || next.startsWith("--")) out[k] = true;
      else { out[k] = next; i++; }
    }
  }
  return out;
}

function readMaster() {
  const raw = execFileSync("gws",
    ["sheets", "+read", "--spreadsheet", SPREADSHEET_ID, "--range", `'${TAB}'!A:Z`],
    { encoding: "utf8", maxBuffer: 50 * 1024 * 1024 });
  const json = JSON.parse(raw);
  const [header, ...data] = json.values;
  const col = name => header.indexOf(name);
  // ... map header → row object (see appendix for full mapping)
  return data.map(r => ({
    cid: r[col("CID")] || "",
    title: r[col("Title")] || "",
    brand: r[col("Brand")] || "",
    status: r[col("Status")] || "",
    topic: r[col("Topic")] || "",
    tags: r[col("Tags")] || "",
    platforms: r[col("Platforms")] || "",
    scriptDocUrl: r[col("Script Doc")] || "",
    gdriveFolder: r[col("GDrive Folder")] || "",
    publishedUrls: r[col("Published URLs")] || "",
    created: r[col("Created")] || "",
    description: r[col("Description")] || "",
  }));
}

function extractDocId(url) {
  const m = url.match(/\/document\/d\/([a-zA-Z0-9_-]+)/);
  return m ? m[1] : null;
}

function fetchDocAsMarkdown(docId) {
  if (existsSync("download.bin")) unlinkSync("download.bin");
  const params = JSON.stringify({ fileId: docId, mimeType: "text/markdown" });
  const raw = execFileSync("gws", ["drive", "files", "export", "--params", params], { encoding: "utf8" });
  const resp = JSON.parse(raw);
  if (resp.status !== "success") throw new Error("export " + resp.status);
  const body = readFileSync(resp.saved_file, "utf8");
  unlinkSync(resp.saved_file);
  return body;
}

function slug(s) {
  return s.toLowerCase().normalize("NFKD")
    .replace(/['"\u2018\u2019\u201C\u201D]/g, "") // drop straight and curly quotes so "don't" becomes "dont"
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "")
    .slice(0, 80);
}

function buildFrontmatter(r) {
  const today = new Date().toISOString().slice(0, 10);
  const tags = (r.tags || "").split(",").map(t => t.trim().toLowerCase().replace(/\s+/g, "-")).filter(Boolean);
  const platforms = (r.platforms || "").split(",").map(t => t.trim()).filter(Boolean);
  return [
    "---",
    "type: script",
    `cid: ${r.cid}`,
    `title: ${yamlStr(r.title)}`,
    `slug: ${slug(r.title)}`,
    `status: ${yamlStr(r.status)}`,
    `brand: ${yamlStr(r.brand)}`,
    `topic: ${yamlStr(r.topic)}`,
    `tags: ${yamlList(tags)}`,
    `platforms: ${yamlList(platforms)}`,
    `last_synced: ${today}`,
    "source:",
    `  script_doc_url: ${r.scriptDocUrl}`,
    `  gdrive_folder: ${r.gdriveFolder}`,
    "---", "",
  ].join("\n");
}

function yamlStr(s) {
  if (!s) return '""';
  const safe = String(s).replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, " ");
  return `"${safe}"`;
}
function yamlList(arr) { return arr?.length ? `[${arr.map(yamlStr).join(", ")}]` : "[]"; }
// Synchronous pause that blocks the thread without spinning the CPU.
function sleep(ms) { Atomics.wait(new Int32Array(new SharedArrayBuffer(4)), 0, 0, ms); }

Run it twice. First with --limit 3 to verify three notes land correctly, then without the limit for the full backfill:

node vault-backfill-scripts.mjs \
  --vault "/path/to/Content Second Brain" \
  --limit 3

# verify in Obsidian, then run for real:
node vault-backfill-scripts.mjs \
  --vault "/path/to/Content Second Brain"

At ~200ms per Doc fetch (Drive's export latency plus a courtesy sleep), 300 scripts take about a minute.

6.2 Blogs from a content directory

Same shape, simpler source. If your blog posts are markdown files with YAML frontmatter (Astro, Hugo, Next.js, Eleventy, Jekyll — they all are), the script walks the directory, parses the existing frontmatter, maps fields into the vault schema, and writes the file. No API calls; runs in seconds.

Full script: vault-backfill-blogs.mjs
#!/usr/bin/env node
// Backfill blogs from a content directory (Astro / Hugo / Next-style markdown).

import { readdirSync, readFileSync, writeFileSync, existsSync, mkdirSync, statSync } from "node:fs";
import { join } from "node:path";

const BLOG_DIR = "<ABSOLUTE_PATH_TO_YOUR_BLOG_FOLDER>"; // e.g. /Users/you/site/content/blog

const args = parseArgs(process.argv.slice(2));
if (!args.vault) { console.error("ERROR: --vault is required"); process.exit(1); }
const VAULT = args.vault;
const OUT_DIR = join(VAULT, "blogs");
const FORCE = !!args.force;
mkdirSync(OUT_DIR, { recursive: true });

const files = readdirSync(BLOG_DIR).filter(f => f.endsWith(".md")).sort();
let written = 0;

for (const file of files) {
  const raw = readFileSync(join(BLOG_DIR, file), "utf8");
  const { frontmatter: fm, body } = parseFrontmatter(raw);
  fm.slug = fm.slug || file.replace(/\.md$/, "");
  const outPath = join(OUT_DIR, `${fm.slug}.md`);
  if (!FORCE && existsSync(outPath)) continue;

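  const stats = statSync(join(BLOG_DIR, file));
  // Prefer a date from the post's own frontmatter. File birthtime is only a fallback:
  // on a freshly cloned repo it is the checkout date, not the publish date.
  // `date` and `pubDate` are common field names, not guaranteed; use your site's field.
  const fmDate = String(fm.date || fm.pubDate || "").match(/^\d{4}-\d{2}-\d{2}/)?.[0];
  const created = fmDate || stats.birthtime.toISOString().slice(0, 10);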
  // A scalar `tags: foo` parses to a string, so coerce to an array before mapping.
  const tags = [].concat(fm.tags || []).map(t => String(t).toLowerCase().replace(/\s+/g, "-"));

  const newFm = [
    "---",
    "type: blog",
    "cid: null",
    `title: ${JSON.stringify(fm.title || file)}`,
    `slug: ${fm.slug}`,
    'status: "Published"',
    `tags: [${tags.map(t => JSON.stringify(t)).join(", ")}]`,
    `publish_date: ${created}`,
    "source:",
    `  repo_path: ${file}`,
    `  published_url: ${fm.canonical || ""}`,
    "---", "",
  ].join("\n");

  writeFileSync(outPath, newFm + body, "utf8");
  written++;
}

console.log(`${written} blogs written.`);

function parseArgs(argv) {
  const out = {}; for (let i = 0; i < argv.length; i++) {
    const a = argv[i]; if (a.startsWith("--")) {
      const k = a.slice(2), n = argv[i + 1];
      if (n === undefined || n.startsWith("--")) out[k] = true; else { out[k] = n; i++; }
    }
  } return out;
}

function parseFrontmatter(raw) {
  // \r?\n so files saved with Windows line endings still match.
  const m = raw.match(/^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/);
  if (!m) return { frontmatter: {}, body: raw };
  const fm = {}; for (const line of m[1].split(/\r?\n/)) {
    const km = line.match(/^([a-zA-Z_][a-zA-Z0-9_]*):\s*(.*)$/); if (!km) continue;
    const k = km[1]; let v = km[2].trim();
    if (v.startsWith("[") && v.endsWith("]"))
      fm[k] = v.slice(1, -1).split(",").map(s => s.trim().replace(/^["']|["']$/g, "")).filter(Boolean);
    else if ((v.startsWith('"') && v.endsWith('"')) || (v.startsWith("'") && v.endsWith("'")))
      fm[k] = v.slice(1, -1);
    else fm[k] = v;
  }
  return { frontmatter: fm, body: m[2] };
}

Set BLOG_DIR at the top to the absolute path of your content folder, then:

node vault-backfill-blogs.mjs --vault "/path/to/vault"
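
One gap worth knowing about: the parseFrontmatter helper above only reads single-line key: value pairs and inline [a, b] arrays. If your posts write tags as a block list (a tags: line followed by indented "- item" lines), those tags are silently dropped. Here is a minimal sketch of a companion reader for that one case. readBlockLists is a hypothetical name, and it deliberately handles only flat lists, not nested YAML.

```javascript
// Collect block-style YAML lists from a frontmatter string, e.g.
//   tags:
//     - Public Speaking
//     - "ai"
// Returns { tags: ["Public Speaking", "ai"] }. Nested maps are ignored.
function readBlockLists(frontmatterText) {
  const lists = {};
  let currentKey = null;
  for (const line of frontmatterText.split(/\r?\n/)) {
    const key = line.match(/^([a-zA-Z_][a-zA-Z0-9_]*):\s*$/);
    if (key) { currentKey = key[1]; continue; } // "key:" with no inline value
    const item = line.match(/^\s*-\s+(.*)$/);
    if (item && currentKey) {
      const value = item[1].trim().replace(/^["']|["']$/g, "");
      (lists[currentKey] ||= []).push(value);
      continue;
    }
    if (line.trim() !== "") currentKey = null; // any other content ends the list
  }
  return lists;
}
```

One way to use it would be Object.assign(fm, readBlockLists(m[1])) just before parseFrontmatter returns, so block lists override the empty string the line parser leaves behind.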

6.3 YouTube scripts from a folder tree

If you keep one folder per finished YouTube video (mine looks like YouTube Videos/Done/<title>/<script>.md), this script walks them, reads the primary script MD, builds frontmatter from the folder name, and writes one vault note per video. Skips folders with no MD or ambiguous content (multiple MDs without a canonical name).

Full script: vault-backfill-youtube.mjs
#!/usr/bin/env node
// Backfill YouTube scripts from a folder of "Done" video directories.
// Expects one folder per video, with one .md inside (the script).

import { readdirSync, readFileSync, writeFileSync, existsSync, mkdirSync, statSync } from "node:fs";
import { join } from "node:path";

const YT_DIR = "<ABSOLUTE_PATH_TO_YOUR_YT_FOLDER>"; // e.g. /Users/you/YouTube Videos/Done

const args = parseArgs(process.argv.slice(2));
if (!args.vault) { console.error("ERROR: --vault is required"); process.exit(1); }
const VAULT = args.vault;
const OUT_DIR = join(VAULT, "youtube");
mkdirSync(OUT_DIR, { recursive: true });

const folders = readdirSync(YT_DIR, { withFileTypes: true })
  .filter(d => d.isDirectory()).map(d => d.name).sort();

let written = 0;
for (const folder of folders) {
  const mds = readdirSync(join(YT_DIR, folder)).filter(f => f.endsWith(".md"));
  if (mds.length === 0) continue;
  const primary = mds.length === 1 ? mds[0] : mds.find(m => m.startsWith("YT ")) || null;
  if (!primary) { console.log(`SKIP ${folder}/ — ambiguous`); continue; }

  const body = readFileSync(join(YT_DIR, folder, primary), "utf8");
  const sl = slug(folder);
  const outPath = join(OUT_DIR, `${sl}.md`);
  if (existsSync(outPath)) continue;

  const stats = statSync(join(YT_DIR, folder, primary));
  const created = stats.birthtime.toISOString().slice(0, 10);
  const fm = [
    "---",
    "type: youtube",
    "cid: null",
    `title: ${JSON.stringify(folder)}`,
    `slug: ${sl}`,
    'status: "Published"',
    'content_type: "YouTube Long-form"',
    `publish_date: ${created}`,
    "source:",
    `  repo_path: ${folder}/${primary}`,
    "---", "",
  ].join("\n");

  writeFileSync(outPath, fm + body, "utf8");
  written++;
}

console.log(`${written} YT scripts written.`);

function slug(s) {
  return s.toLowerCase().normalize("NFKD").replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "").slice(0, 80);
}
function parseArgs(argv) {
  const out = {}; for (let i = 0; i < argv.length; i++) {
    const a = argv[i]; if (a.startsWith("--")) {
      const k = a.slice(2), n = argv[i + 1];
      if (n === undefined || n.startsWith("--")) out[k] = true; else { out[k] = n; i++; }
    }
  } return out;
}

Set YT_DIR, then:

node vault-backfill-youtube.mjs --vault "/path/to/vault"

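After all three: open Obsidian and hit Cmd+G for the graph view. Switch on Tags in the graph's filter panel and you should see a real cluster of dots grouped by tag, with cross-type connections (a script, a blog, and a YouTube video hanging off the same tag) wherever tags overlap. One caveat: the YouTube backfill in 6.3 writes no tags field, so those notes only join a cluster once their frontmatter carries tags. That cross-type density is what makes the vault more than a backup.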
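
If you want a number rather than a visual impression, you can list the tags that bridge content types. This is a minimal sketch that assumes each note has already been reduced to an object with a type and a tags array (for example by a frontmatter parser). crossTypeTags is an illustrative name.

```javascript
// Given notes like { type: "script", tags: ["ai", "hooks"] }, return the tags
// that appear under more than one content type, widest bridges first.
function crossTypeTags(notes) {
  const typesByTag = new Map();
  for (const note of notes) {
    for (const tag of note.tags || []) {
      if (!typesByTag.has(tag)) typesByTag.set(tag, new Set());
      typesByTag.get(tag).add(note.type);
    }
  }
  return [...typesByTag]
    .filter(([, types]) => types.size > 1) // keep tags spanning 2+ types
    .map(([tag, types]) => ({ tag, types: [...types].sort() }))
    .sort((a, b) => b.types.length - a.types.length || a.tag.localeCompare(b.tag));
}
```

An empty result after a full backfill is a signal that your tags are not shared across sources, usually because one source uses a different vocabulary or writes no tags at all.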

7. Transcribe every old video you ever posted

This is the move most creators skip, and it is the one that takes the vault from "a copy of my drafts" to "a copy of everything I have ever actually said." Your draft script and your filmed video are not the same artifact. The take you nailed had an off-script story, a sharper turn of phrase, a callback you improvised. None of that is in your script doc. It is in the audio of the posted video, and until you transcribe it, it is invisible.

7.1 YouTube — free, ten minutes

YouTube generates auto-captions for every video you upload. They are decent for clear talking-head content (which most creator content is). You can pull them in bulk for your entire channel with one yt-dlp command:

# 1. Install yt-dlp (free, MIT-licensed CLI).
brew install yt-dlp

# 2. Pull auto-captions for every video on your channel.
#    Replace @yourchannel with your YouTube handle (or use the channel URL).
mkdir -p ~/Library/CloudStorage/GoogleDrive-you@example.com/My\ Drive/Content\ Second\ Brain/transcripts/yt
cd ~/Library/CloudStorage/GoogleDrive-you@example.com/My\ Drive/Content\ Second\ Brain/transcripts/yt

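# --force-write-archive asks yt-dlp to record IDs in seen.txt even though
# --skip-download means no media file is actually downloaded.
yt-dlp \
  --write-auto-subs \
  --sub-langs en \
  --sub-format vtt \
  --skip-download \
  --output "%(upload_date>%Y-%m-%d)s-%(title)s.%(ext)s" \
  --download-archive seen.txt \
  --force-write-archive \
  "https://www.youtube.com/@yourchannel/videos"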

# 3. (Optional) For higher fidelity than YouTube's auto-captions, download the
#    audio and run Whisper locally:
yt-dlp -x --audio-format mp3 -o "%(title)s.%(ext)s" \
  --download-archive audio-seen.txt \
  "https://www.youtube.com/@yourchannel/videos"

# Then transcribe each mp3 with whisper-cpp (free) or faster-whisper (free):
for f in *.mp3; do
  whisper-cpp -m ~/whisper-cpp/models/ggml-base.en.bin -f "$f" -otxt
done

The --download-archive seen.txt flag is the key piece for re-runs: yt-dlp writes every video ID it has processed into seen.txt, so the next time you run the command, it only fetches new videos. You can put this on a weekly cron and your vault will catch up automatically.

If the auto-caption quality is rough on your channel — happens for music-heavy edits, fast cuts, or heavy accents — switch to the Whisper path at the bottom of the snippet. yt-dlp downloads the audio as MP3, then whisper.cpp transcribes locally. About 2-5× realtime; a 60-video channel takes 30 minutes on an M1.
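
Raw auto-caption .vtt files are noisy to search: each cue carries timing lines and inline word-timing tags, and YouTube's rolling captions repeat the previous line in the next cue. Here is a minimal sketch of flattening one into plain text. It assumes the usual YouTube auto-caption layout (a header block, then cues with no numeric cue identifiers), and vttToText is a hypothetical helper name.

```javascript
// Flatten a WebVTT caption file into one line of plain text.
function vttToText(vtt) {
  const lines = [];
  let inCues = false;
  for (const raw of vtt.split(/\r?\n/)) {
    if (raw.includes("-->")) { inCues = true; continue; } // cue timing line
    if (!inCues) continue; // header block: WEBVTT, Kind:, Language:
    const text = raw
      .replace(/<[^>]+>/g, "") // inline tags such as <00:00:00.480> and <c>
      .replace(/&nbsp;/g, " ")
      .trim();
    if (!text) continue;
    if (text === lines[lines.length - 1]) continue; // rolling-caption repeat
    lines.push(text);
  }
  return lines.join(" ");
}
```

The duplicate check only compares against the previous kept line, so a phrase you genuinely say twice in a row will be collapsed. For search purposes that is usually an acceptable trade.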

7.2 Instagram — Scrape Creators (recommended) or Apify

Instagram does not have an "auto-captions" feature you can pull. You need a scraper that fetches your profile, then either grabs each post's audio transcript (if the API exposes one) or downloads the MP4 so you can transcribe it locally.

Option A: Scrape Creators

Scrape Creators is a paid API ($49-499/mo depending on volume) that hits one endpoint and returns a profile's recent posts with metadata + (often) transcripts. If you are going to keep using transcripts as an input — to your AI agent, to a search index, to weekly recaps — this is the cleanest tool. One call, one file, done.

# Scrape Creators ($49-499/mo plan — best fit if you'll re-use the API)
# Single call returns a profile's recent posts + transcripts where available.

curl -s "https://api.scrapecreators.com/v1/instagram/profile?handle=yourhandle" \
  -H "x-api-key: $SC_API_KEY" \
  > profile.json

# Each post object has: shortcode, caption, audio_transcript (when available).
# Loop the posts, write one .md per post into the vault:

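# --input-type=module lets Node treat the heredoc as an ES module, so import works.
node --input-type=module <<'EOF'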
import { readFileSync, writeFileSync, mkdirSync } from "node:fs";
import { join } from "node:path";

const VAULT = "<PATH_TO_VAULT>/transcripts/ig";
mkdirSync(VAULT, { recursive: true });

const profile = JSON.parse(readFileSync("profile.json", "utf8"));
for (const post of profile.posts || []) {
  if (!post.audio_transcript) continue;
  const slug = post.shortcode.toLowerCase();
  const fm = `---
type: transcript
platform: instagram
shortcode: ${post.shortcode}
posted_at: ${post.taken_at_timestamp}
source_url: https://www.instagram.com/p/${post.shortcode}/
caption: ${JSON.stringify(post.caption?.slice(0, 200) || "")}
---

${post.audio_transcript}`;
  writeFileSync(join(VAULT, `${slug}.md`), fm, "utf8");
}
EOF

Option B: Apify (pay-as-you-go)

If you are doing this once and do not want a monthly bill, Apify's Instagram scraper actor costs about $0.05-0.10 per post. A hundred posts is $5-10. It returns the post metadata plus the video URL; you download the MP4 and transcribe locally with Whisper. About 30 minutes of wall-clock time including transcription.

# Apify Instagram Reel Scraper — pay-as-you-go (~$0.05-0.10 per reel).
# Free $5 credit/month covers small batches.

# 1. Get your Apify token from apify.com → Settings → Integrations.
export APIFY_TOKEN="<YOUR_APIFY_TOKEN>"

# 2. Trigger a profile dump (returns recent reels with video URLs).
curl -s "https://api.apify.com/v2/acts/apify~instagram-scraper/run-sync-get-dataset-items?token=$APIFY_TOKEN" \
  -X POST -H 'Content-Type: application/json' \
  -d '{"directUrls":["https://www.instagram.com/yourhandle/"],"resultsLimit":100,"resultsType":"posts"}' \
  > posts.json

# 3. For each reel that has a videoUrl, download the MP4 and transcribe locally.
# --input-type=module lets Node treat the heredoc as an ES module, so import works.
node --input-type=module <<'EOF'
import { readFileSync, mkdirSync } from "node:fs";
import { execSync } from "node:child_process";
const posts = JSON.parse(readFileSync("posts.json", "utf8"));
mkdirSync("audio", { recursive: true });
mkdirSync("transcripts", { recursive: true });

for (const p of posts) {
  if (!p.videoUrl) continue;
  const mp4 = `audio/${p.shortCode}.mp4`;
  const txt = `transcripts/${p.shortCode}.txt`;
  execSync(`curl -s -L "${p.videoUrl}" -o "${mp4}"`);
  execSync(`ffmpeg -y -i "${mp4}" -ar 16000 -ac 1 "audio/${p.shortCode}.wav"`);
  execSync(`whisper-cpp -m ~/whisper-cpp/models/ggml-base.en.bin -f "audio/${p.shortCode}.wav" -otxt -of "transcripts/${p.shortCode}"`);
}
EOF
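
Step 3 leaves plain .txt transcripts in transcripts/, while the vault expects one .md note per reel with frontmatter. Here is a minimal sketch of that last conversion. It assumes each Apify post object carries shortCode, caption, timestamp, and url (check your own posts.json, since field names can differ between actor versions), and buildReelNote is a hypothetical helper name, not part of any library.

```javascript
// Hypothetical helper: combine one Apify post and its Whisper transcript
// into a vault note with the same frontmatter keys as the Option A notes.
// Assumed post fields: shortCode, caption, timestamp, url.
function buildReelNote(post, transcript) {
  const caption = (post.caption || "").slice(0, 200);
  const fallbackUrl = `https://www.instagram.com/p/${post.shortCode}/`;
  const frontmatter = [
    "---",
    "type: transcript",
    "platform: instagram",
    `shortcode: ${post.shortCode}`,
    `posted_at: ${post.timestamp || ""}`,
    `source_url: ${post.url || fallbackUrl}`,
    // JSON.stringify quotes and escapes the caption so the YAML stays valid.
    `caption: ${JSON.stringify(caption)}`,
    "---",
  ].join("\n");
  return `${frontmatter}\n\n${transcript.trim()}\n`;
}

const exampleNote = buildReelNote(
  {
    shortCode: "ABC123",
    caption: "Three ways to open a talk",
    timestamp: "2024-01-15T10:00:00.000Z",
    url: "https://www.instagram.com/p/ABC123/",
  },
  "  So here is the first way...  "
);
console.log(exampleNote);
```

To finish the job, read each transcripts/<shortCode>.txt, pass it through this function, and write the result to <PATH_TO_VAULT>/transcripts/ig/<shortCode>.md.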

7.3 What to expect

YouTube auto-captions are publicly available, so the YouTube pass is fast. Instagram is rate-limited: Apify's free tier handles five concurrent requests, and Scrape Creators throttles by plan tier. For a hundred posts, expect 5-20 minutes for the API fetch and another 20-30 minutes for local Whisper to chew through the audio. Run it overnight if you are scraping a year of content.

After both scrapers run, your vault picks up two new folders:

Content Second Brain/transcripts/
  yt/   ← one .vtt or .txt per video, filename prefixed with the upload date
  ig/   ← .md per reel, with frontmatter linking back to the IG URL

Now when you search the vault for "executive presence," you get your scripts AND the actual spoken words from every reel and video you have ever made on the topic. That is the moment the vault becomes load-bearing.
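
One wrinkle: raw .vtt files carry timestamps and inline timing tags, and YouTube's auto-captions tend to repeat each line across consecutive cues, so a phrase search can miss or double-count. A minimal sketch of flattening a caption file to plain spoken text follows. It assumes YouTube-style auto-captions, and vttToText is a hypothetical helper name.

```javascript
// Minimal sketch: reduce a WebVTT caption file to plain spoken text.
// Assumes YouTube-style auto-captions, where a cue often repeats the line before it.
function vttToText(vtt) {
  const lines = [];
  for (const raw of vtt.split(/\r?\n/)) {
    // Drop inline tags such as <c> and <00:00:01.000>.
    const line = raw.replace(/<[^>]+>/g, "").trim();
    if (!line) continue;
    // Skip the file header lines.
    if (line.startsWith("WEBVTT") || /^(Kind|Language):/.test(line)) continue;
    // Skip cue timing lines like "00:00:00.000 --> 00:00:02.000".
    if (line.includes("-->")) continue;
    // Skip a line that only repeats the previous one (rolling captions).
    if (lines[lines.length - 1] === line) continue;
    lines.push(line);
  }
  return lines.join(" ");
}
```

Saving the result as a .txt next to the .vtt keeps the original timing data around in case you ever want to link back to a timestamp.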

8. How to use it day to day

Once the vault is populated, the daily loop is almost nothing. Three habits, in order of value:

  1. Search before you record. Hit Cmd+O to jump to notes by title, or Cmd+Shift+F to search the full text, and type the topic of your next video. If three relevant notes pop up, read them first. You will either find a fresh angle (great), realize you have already shipped this idea (skip), or remix two old ideas into a new piece (the unlock).
  2. Drill down by tag. Click any tag in a note's properties panel. You get every other note that shares it, across all four content types. This is the cheapest "what have I said about X" query in the world.
  3. Install Smart Connections (free Obsidian plugin). It embeds every note and shows you the five most semantically related notes in a sidebar as you write. When you start a new note for a fresh idea, Smart Connections quietly tells you "you wrote something close to this in March." That is the vault talking back.

The graph view is mostly decorative. The real value is search, tags, and Smart Connections. If you find yourself in the graph view more than once a week, you are using the vault wrong.

9. The next move — wire it into your scripting agent

This is where the vault stops being a personal search tool and starts being an idea engine. Three skills you can build on top, in increasing order of leverage:

9.1 /vault-related <topic>

A Claude Code skill (or any CLI script) that takes a topic, embeds it, scores the vault notes by cosine similarity, and returns the top eight, each with its angle. You run it manually when starting a new piece. ~50 lines of Python plus an OpenAI embeddings call. Cost: pennies per query.
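
The scoring step is small enough to sketch. The version below is in JavaScript rather than Python, only to match the Node scripts earlier in this guide. It assumes you have already embedded every note and cached the result as a plain array under a hypothetical vector field; the embeddings call itself is left out.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction, so treat it as unrelated.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// notes: [{ path, vector }] where vector is the cached embedding (assumed shape).
// Returns the k highest-scoring notes, best first.
function topRelated(queryVector, notes, k = 8) {
  return notes
    .map((note) => ({ path: note.path, score: cosineSimilarity(queryVector, note.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

For a few hundred notes a linear scan like this is instant, so there is no need for a vector database.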

9.2 Anti-repetition + cross-pollination at hook generation

If you have built a scripting agent like the Telegram Scripting Agent, add one step before HOOKS: query the vault for the new brief's topic, get the twelve closest existing notes, and inject them into the hook-generation prompt as do-not-repeat constraints. Hooks come back fresher, and any time the model sees two old notes that share tags but were never combined into a video, it can suggest the combination as a new angle.

This is the highest-leverage thing the vault enables. Every Melda run that used to forget you exist now writes in conversation with your back catalog.
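
The injection step can be sketched as a function that turns the closest notes into a block of prompt text. The note shape ({ title, hook, tags }) and the name buildHookConstraints are assumptions for illustration. The sketch only lists pairs that share a tag; whether a pair was ever combined into a video is left for the model (or you) to judge.

```javascript
// Hypothetical note shape: { title, hook, tags: [...] }.
// Returns prompt text to place ahead of the hook-generation instructions.
function buildHookConstraints(relatedNotes) {
  const doNotRepeat = relatedNotes.map((n) => `- ${n.title}: ${n.hook}`);

  // Collect every pair of notes that shares at least one tag.
  const pairs = [];
  for (let i = 0; i < relatedNotes.length; i++) {
    for (let j = i + 1; j < relatedNotes.length; j++) {
      const a = relatedNotes[i];
      const b = relatedNotes[j];
      const shared = a.tags.filter((t) => b.tags.includes(t));
      if (shared.length > 0) {
        pairs.push(`- "${a.title}" + "${b.title}" (shared: ${shared.join(", ")})`);
      }
    }
  }

  return [
    "Already published. Do not repeat these hooks or angles:",
    ...doNotRepeat,
    "",
    "Existing notes that share tags and could be combined into a new angle:",
    ...(pairs.length > 0 ? pairs : ["- none found"]),
  ].join("\n");
}
```

With twelve notes that is at most 66 pairs, so the block stays short enough to paste into a prompt as-is.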

9.3 A local watcher daemon (optional)

If you ship new content frequently, your vault drifts out of sync. Two ways to handle it:

  • Manual re-sync. Re-run the backfill scripts weekly. The --force flag overwrites everything; without it, only new content lands. Five seconds per run.
  • Local watcher daemon. A launchd job that runs every ten minutes, polls your content sheet for new rows, and writes the new notes to the vault automatically. ~30 lines of bash plus a plist. The vault catches up to your latest script before you notice it is missing.

Recommendation: manual re-sync until you feel "ugh the vault is stale" more than once a week. Then build the daemon.

10. How to build it well

If you take nothing else from this guide, take this section. The build is replaceable; the design judgment is not.

Design principles

  • The vault is derived. Source content stays where it lives. You never edit a vault note as if it were the truth. This single rule is what makes everything else safe.
  • Frontmatter is the API. The body is content; the frontmatter is what every plugin, query, and agent reads first. Spend disproportionate effort on the schema before you write a single backfill script.
  • One vault for all content. Splitting per brand or per content type kills the cross-pollination graph, which is the whole point. Keep it one vault.
  • Re-runnable backfills. Every backfill skips existing files unless --force. That lets you re-run on a whim without thinking about state.
  • Test on three before you run on three hundred. Every backfill takes a --limit 3 flag. Use it. Spot-check three random notes in Obsidian. Then scale up.
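
The last two principles come down to a few lines at the top of every backfill script. Here is a minimal sketch, assuming each source item already knows its target filename. parseFlags and planBackfill are hypothetical helper names, and applying --limit after skipping existing files is a design choice, not something the flags require.

```javascript
// Read --force and --limit N from a list of command-line arguments.
function parseFlags(argv) {
  const force = argv.includes("--force");
  const i = argv.indexOf("--limit");
  const limit = i === -1 ? Infinity : Number.parseInt(argv[i + 1], 10);
  if (Number.isNaN(limit) || limit < 0) {
    throw new Error("--limit needs a non-negative integer");
  }
  return { force, limit };
}

// items: [{ id, filename }]; existing: Set of filenames already in the vault.
// Without force, items whose file already exists are skipped.
// The limit is applied after skipping, so "--limit 3" always yields three new notes if available.
function planBackfill(items, existing, { force = false, limit = Infinity } = {}) {
  const pending = force ? items : items.filter((item) => !existing.has(item.filename));
  return pending.slice(0, limit);
}
```

In a script you would call parseFlags(process.argv.slice(2)), build the existing set from a directory listing of the target folder, and loop over whatever planBackfill returns.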

Lessons I learned the hard way

  • Your draft script and your filmed video are different artifacts. Backfilling only your scripts means you miss every off-script story, callback, and improvised line. Always also transcribe the actual spoken content.
  • Tag hygiene matters more than you think. "Public speaking" and "public-speaking" and "publicspeaking" are three different tags to Obsidian. Pick one form in your backfill script and stick to it forever.
  • Do not invent metadata you do not have. If a blog post's frontmatter is missing the publish date, use the file's birthtime — do not guess. Garbage-in always finds a way to surface in a search result later.
  • The graph view is pretty, not useful. Beginners spend an hour rotating the graph and conclude the vault is fancy but not actionable. The real value is search (Cmd+O by title, Cmd+Shift+F for full text), tag drill-down, and embeddings-backed plugins. Skip the graph and get to those.
  • Don't pay for Obsidian Sync until you need it. Desktop-only is enough for 90% of the value. Add Sync only if you actually open the vault on your phone.
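
The tag-hygiene lesson is easy to enforce in code. The sketch below picks one form (lowercase, hyphen-separated) and assumes a small hand-maintained alias map for spellings that cannot be fixed mechanically, such as "publicspeaking". Both normalizeTag and TAG_ALIASES are hypothetical names.

```javascript
// Hand-maintained aliases for forms a regex cannot repair (assumed example entry).
const TAG_ALIASES = new Map([["publicspeaking", "public-speaking"]]);

// Normalize a tag to lowercase, hyphen-separated form.
function normalizeTag(raw) {
  const slug = raw
    .trim()
    .toLowerCase()
    .replace(/^#/, "") // drop a leading hash
    .replace(/[\s_]+/g, "-") // spaces and underscores become hyphens
    .replace(/[^\p{L}\p{N}\/-]/gu, "") // keep letters, digits, "/" for nested tags, and "-"
    .replace(/-+/g, "-") // collapse repeated hyphens
    .replace(/^-|-$/g, ""); // trim hyphens at the ends
  return TAG_ALIASES.get(slug) ?? slug;
}

console.log(["Public speaking", "public-speaking", "publicspeaking"].map(normalizeTag));
```

Run every tag through this before writing frontmatter, and add an alias whenever you spot a new variant in the tag pane.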

11. Appendix

Folder layout your scripts assume

Content Second Brain/
  scripts/
    C0001-your-first-script.md
    C0002-...
  blogs/
    your-first-blog-post.md
    ...
  youtube/
    your-first-youtube-video.md
    ...
  transcripts/
    yt/
      2024-01-15-your-first-video.en.vtt
      ...
    ig/
      ABC123.md
      ...
  _attachments/
    (any images embedded in notes)
  .obsidian/
    (Obsidian's settings — synced too, so settings carry across machines)

Troubleshooting

  • Obsidian doesn't see new files. Click the file explorer icon at the top-left twice to force a refresh. Or quit and relaunch.
  • Google Drive desktop says "syncing forever". Right-click the Drive icon in your menu bar → Pause → Resume. If that fails, sign out and back in.
  • gws drive files export returns a 403. The OAuth scopes you authorized do not include Drive export. Re-auth: gws auth login --scopes drive,docs,sheets.
  • yt-dlp says "video is unavailable." Update yt-dlp: brew upgrade yt-dlp. YouTube changes its API often, and yt-dlp usually ships fixes quickly.
  • Whisper takes forever. You probably downloaded the large model. Switch to base.en for English-only content; 6× faster, almost as accurate for clear speech.
  • The vault folder is huge. 500 markdown notes plus a year of YouTube transcripts is still under 100 MB. If yours is bigger, you probably accidentally moved videos into the vault. Move them out — markdown only.

Built in one Claude Code session. The hard part was never the code; it was deciding the schema and refusing to put a CMS layer on top of it.