One Person, 12 Agents, a Holding Company

Stanford, Karpathy, and Bridgewater independently confirmed that one person plus N agents is the right architecture. I have been running it for a holding company. Here is what it looks like.

Three research teams published results in the same week. Stanford political science. Karpathy's AutoResearch lab. Bridgewater's AIA Labs. None of them talked to each other. All three arrived at the same conclusion.

One person plus AI agents outperforms either one alone.

I've been running that architecture for a holding company since early 2026. Not a startup. Not a SaaS product. A holding company with real estate positions, equity crowdfunding bets, a live trading bot, open source projects, and a blog. All operated by one person and 12 scheduled AI agents.

Here's what that actually looks like.

The operating model

Every night at 10:45 PM, the first agent wakes up. It lints the vault (an Obsidian knowledge base that serves as the company's operating system). It checks for contradictions, orphaned files, stale data. It fixes what it can and flags what it can't.
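In practice the real linter is a Claude prompt, not a script, but the orphan check is simple enough to sketch in plain Python. The vault path and the wikilink convention here are assumptions for illustration:

```python
from pathlib import Path
import re

# Hypothetical vault location; the real one is an Obsidian knowledge base.
VAULT = Path("vault")

# Obsidian-style [[wikilinks]]; capture the target before any | alias or # heading.
WIKILINK = re.compile(r"\[\[([^\]|#]+)")

def lint_vault(vault: Path) -> list[str]:
    """Flag orphaned notes: .md files that no other note links to."""
    notes = {p.stem: p for p in vault.rglob("*.md")}
    linked: set[str] = set()
    for p in notes.values():
        for target in WIKILINK.findall(p.read_text(encoding="utf-8")):
            linked.add(target.strip())
    return sorted(stem for stem in notes if stem not in linked)

if __name__ == "__main__":
    for orphan in lint_vault(VAULT):
        print(f"orphan: {orphan}")
```

Contradiction and staleness checks need an LLM; orphan detection doesn't, which is why "fix what it can, flag what it can't" is the right split.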

At 11:30 PM, the queue sweep starts. This agent iterates through every pending task across four code repositories. For each task, it reads the repo, writes a work plan, builds a proof of concept, runs a full implementation, spawns a separate QA agent to review the diff, creates a pull request, waits for CI, and merges if everything is green. If something is ambiguous or blocked, it writes a request file and moves to the next item.

At 4:00 AM, the nightshift supervisor audits everything that happened overnight. It checks every PR, verifies every deploy, reviews every escalation.

At 5:15 AM, the email scanner pulls article links from Gmail. At 5:30 AM, vault health runs integrity checks. At 5:45 AM, the digest agent processes new articles into a knowledge wiki, scores them against active projects, and auto-routes high-scoring items into task queues.
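The scoring-and-routing step can be approximated with something as crude as keyword overlap. The real digest agent scores with an LLM; this stand-in just shows the shape of score-then-threshold routing, with made-up project names:

```python
def score_article(text: str, project_keywords: dict[str, set[str]]) -> dict[str, float]:
    """Crude relevance score per project: the fraction of that project's
    keywords that appear in the article. A stand-in for LLM scoring."""
    words = set(text.lower().split())
    return {proj: len(kw & words) / len(kw) for proj, kw in project_keywords.items()}

def route(text: str, project_keywords: dict[str, set[str]], threshold: float = 0.5) -> list[str]:
    """Projects whose task queues should receive this article."""
    scores = score_article(text, project_keywords)
    return [proj for proj, s in scores.items() if s >= threshold]
```

Whatever does the scoring, the contract is the same: high-scoring items land in a task queue, everything else just enriches the wiki.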

At 7:30 AM, the morning brief compiles everything into a single report. What shipped. What's blocked. What needs a human decision.

I wake up. I read one file. I make decisions. I go to my day job.

That's it. That's the whole model.

The external validation

I built this before any of the research came out. Not because I'm prescient. Because it was the obvious architecture for someone with a full-time job, a side business, and no employees.

But the research confirms the pattern works at scale.

Karpathy's AutoResearch ran 12 experiments per hour. 37 experiments and 93 commits in a single overnight session. One agent loop, running unsupervised, producing what a team would ship in a full sprint. The key detail: every experiment produced a commit artifact. The loop only advanced when the artifact existed. That's the same artifact-enforced discipline my queue worker uses. No WORK_PLAN.md, no implementation. No QA_REPORT.md, no merge.
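The gate itself is one function. A minimal sketch of how my queue worker enforces it (the error handling is illustrative; the real enforcement lives in the agent's prompt and workflow):

```python
from pathlib import Path

def require_artifact(task_dir: Path, artifact: str) -> None:
    """Refuse to start a phase until its prerequisite artifact exists on disk.
    No WORK_PLAN.md, no implementation. No QA_REPORT.md, no merge."""
    if not (task_dir / artifact).exists():
        raise RuntimeError(f"{artifact} missing in {task_dir}; phase cannot start")
```

Called as `require_artifact(task, "WORK_PLAN.md")` before implementation and `require_artifact(task, "QA_REPORT.md")` before merge, it makes "the loop only advances when the artifact exists" a hard property rather than a convention.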

Bridgewater's multi-agent forecasting uses multiple agents that research independently, then a supervisor reconciles disagreements. The system matches human superforecasters on prediction accuracy. This maps directly to my overnight loop: coding agents produce artifacts independently, the nightshift supervisor reconciles them, the morning brief synthesizes for the human.

Stanford's finding is the most important one. Human expertise plus AI tooling outperformed fully autonomous agents in election prediction markets. The human checkpoint adds value. It doesn't slow things down. It's not a compromise. It's the architecture that wins.

My morning review (reading one report, making 5-10 decisions, spending 20 minutes) is the human checkpoint. The agents can't replace it. They shouldn't.

What agents actually can't do

This is the part most AI content skips.

My agents can't decide whether a trading strategy change is worth the risk. They can't judge whether a blog post matches my voice. They can't tell if a pull request introduces a subtle architectural regression that only matters in the context of where the product is heading.

They can't sense when I'm overextending. They can't tell me to stop adding holdings and start compounding the ones I have. (A human cofounder would. So I built that feedback into the system prompts instead.)

They can write code, run tests, publish posts, scan emails, digest articles, score opportunities, and produce reports. They do that every night while I sleep. But the judgment layer is mine.

Removing the human from the loop doesn't make the system faster. It makes it dumber. Stanford proved it. I live it.

The numbers

I'm not going to pretend the holding company is printing money. It's not. Here's reality.

The agents have published 26+ blog posts. They've processed hundreds of articles into a knowledge wiki with 93+ files. They run a queue sweep every night that processes code tasks across four repositories. They maintain a live trading bot on Alpaca. They monitor 9 equity crowdfunding positions across 5 sectors.

MRR is $0. The tripwire target is $200/month to cover Claude Premium (the AI that powers the agents). The open source project (AgentGuard, a runtime budget and rate limiter for AI agents) has 1,700+ monthly downloads on PyPI.

This is day 4 of a 90-day commitment. The bet this quarter is not revenue. It's proving the nervous system can handle a live P&L curve without retreating. The capital deployment comes after the psychological unlock.

Real numbers. Not vibes.

The stack

For anyone building something similar:

  • Orchestration: Obsidian vault with YAML frontmatter as the task/state layer. No database. Markdown files are the database.
  • Agents: Claude Code (Anthropic) with structured prompts. Each agent is a single markdown file with a mission brief, context loading order, and required output artifacts.
  • Scheduling: Windows Task Scheduler firing Claude Code CLI sessions on cron. Nothing fancy.
  • Repos: Next.js on Vercel, Python on PyPI, game on Vercel. Standard stuff.
  • Guard rails: AgentGuard for runtime budget and rate limiting. Required artifacts at every workflow phase. QA subagents that review before merge. Post-merge smoke tests with auto-revert on failure.
  • Knowledge: Karpathy-style wiki with three layers (sources, entities, concepts) plus a synthesis layer where opinions live. Agents write sources. The weekly review updates syntheses.
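"Markdown files are the database" means every query is a frontmatter scan. A minimal flat key-value reader is enough (this sketch skips nested YAML, and the `status: pending` field name is an assumption):

```python
from pathlib import Path

def read_frontmatter(path: Path) -> dict[str, str]:
    """Minimal flat 'key: value' frontmatter reader, enough to treat
    a markdown note as a task record. No nested YAML."""
    lines = path.read_text(encoding="utf-8").splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fm: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        if ":" in line:
            key, _, value = line.partition(":")
            fm[key.strip()] = value.strip()
    return fm

def pending_tasks(vault: Path) -> list[Path]:
    """Query the 'database': every note whose status field says pending."""
    return [p for p in vault.rglob("*.md")
            if read_frontmatter(p).get("status") == "pending"]
```

Slow by database standards, but every record is human-readable, diffable, and editable in Obsidian, which matters more at this scale.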

The whole system runs on a Windows machine with three GPUs (RTX 3070 + 5070 Ti + 5090) and a Claude Premium subscription. Total monthly cost: $20 for Claude.

The real insight

The pattern is not "AI replaces people." The pattern is "one person plus N agents replaces a team."

Karpathy doesn't have a team of grad students running 12 experiments an hour. He has a loop. Bridgewater doesn't have more forecasters. They have agents that disagree and reconcile. Stanford didn't find that AI is better than humans. They found that humans with AI are better than AI alone.

I don't have employees. I have 12 scheduled prompts, a knowledge wiki, and a morning coffee ritual where I read one file and make decisions.

The holding company is small. The ambition is not. The 90-day horizon runs through July 2026. By then, the trading bot will have 90 days of P&L history. The wiki will have hundreds of synthesized sources. The blog will have 30+ posts. The system will have compounded.

That's the point. Not a single big launch. Compounding across decades.


If you're running AI agents overnight and want to make sure they don't burn your budget while you sleep, check out AgentGuard. Runtime budget limits, token caps, and rate limiting for any AI agent. pip install agentguard47.

Patrick Hughes

Building BMD HODL — a one-person AI-operated holding company. Nashville, Tennessee. Twelve agents.
