2026-01-08 / slot 2 / DECISION

Why We Automated 3 Drafts/Day but Refused to Automate Publishing


Problem Statement#

We needed a way to produce consistent, evidence-backed technical blog drafts that reflect daily engineering work without sacrificing credibility. The goal was to translate repository activity into reproducible drafts that map to clear KPIs, while preventing low-signal or misinterpreted content from reaching publication. We also needed trust mechanisms to avoid noisy output on quiet or ambiguous days, and a guardrail so weak titles or misread diffs cannot slip into the public record. Environment remains unchanged (LOCAL_MODE=0; node=v22.16.0; platform=darwin). Today’s repo activity: 4 files changed, 276 insertions, 90 deletions.
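
For concreteness, the kind of evidence record a draft is grounded in can be pictured as a small typed structure. The sketch below is illustrative only; the field names are assumptions, not the actual ATBS schema.

    // Illustrative only: a hypothetical shape for the working-tree evidence a
    // draft is grounded in. Field names are assumptions, not the ATBS schema.
    interface RepoEvidence {
      date: string;            // e.g. "2026-01-08"
      filesChanged: number;    // 4 in today's run
      insertions: number;      // 276
      deletions: number;       // 90
      env: {
        localMode: boolean;    // LOCAL_MODE=0 -> false
        node: string;          // "v22.16.0"
        platform: string;      // "darwin"
      };
    }

    // Today's activity, expressed in that shape.
    const today: RepoEvidence = {
      date: "2026-01-08",
      filesChanged: 4,
      insertions: 276,
      deletions: 90,
      env: { localMode: false, node: "v22.16.0", platform: "darwin" },
    };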

Options Considered#

  • Automate generation and publishing end-to-end (rejected).
  • Automate generation only, with explicit human-in-the-loop publishing (chosen).
  • Skip policy: allow the system to produce zero drafts on low-signal days (chosen) vs always emit content (rejected).
  • Title quality gate: enforce a gate that blocks weak titles from becoming publishable (chosen) vs accept all titles (rejected).
  • Reproducibility artifacts: emit paired .md and .blog.json per draft (chosen) vs single-file output with less traceability (rejected).
  • KPI-to-article mapping and misinterpretation QA in spec v1.1: codify these checks (chosen) vs rely on ad hoc reviewer intuition (rejected).
  • Volume target: cap at 3 drafts/day for signal and reviewability (chosen) vs optimize for maximum number of posts/day (rejected).
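
The chosen options above reduce to a small configuration surface. The sketch below shows one plausible shape; the names and thresholds are assumptions, not the spec v1.1 text.

    // A minimal sketch of how the chosen options could surface as configuration.
    // All names and thresholds are illustrative assumptions, not spec v1.1 itself.
    const atbsConfig = {
      maxDraftsPerDay: 3,          // volume cap chosen for reviewability
      skipPolicy: {
        enabled: true,
        minFilesChanged: 1,        // hypothetical low-signal threshold
        minLinesChanged: 20,       // hypothetical; quiet days produce zero drafts
      },
      titleGate: {
        enabled: true,             // weak titles never reach the publishable queue
      },
      artifactsPerDraft: ["draft.md", "draft.blog.json"] as const,
      publish: "human-in-the-loop" as const,  // generation is automated, publishing is not
    };

Under a configuration like this, a quiet day that falls below the thresholds yields zero drafts rather than filler content.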

Decision#

  • Implemented an Automated Technical Blogging System (ATBS) inside MARIA OS that generates up to 3 reproducible drafts/day from repository change evidence in the current workspace.
  • Adopted v1.1 specification updates: skip policy, KPI-to-article mapping, title gate, and misinterpretation QA.
  • Separated responsibilities: the system generates drafts; humans approve and publish.
  • Enforced a title quality gate to prevent weak or misleading titles from entering the publishable queue.
  • Chose reproducibility artifacts (.md + .blog.json) per draft to preserve evidence, parameters, and traceability, despite the increased file count.
  • Explicitly did not implement automatic publication and did not optimize for “number of posts” over credibility. No runtime performance claims were made without benchmarks.
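
The paired-artifact decision is easiest to see as code: each draft writes a human-readable .md next to a machine-readable .blog.json that records the evidence and generation parameters. The function below is a hypothetical sketch, not the ATBS implementation.

    // A sketch of the paired-artifact idea: every draft emits a human-readable
    // .md alongside a machine-readable .blog.json capturing the evidence and
    // generation parameters. Function and field names are hypothetical.
    import { writeFileSync } from "node:fs";

    function emitDraft(slug: string, markdown: string, evidence: object, params: object): void {
      // The .md is what a reviewer reads; the .blog.json is what makes it reproducible.
      writeFileSync(`${slug}.md`, markdown);
      writeFileSync(
        `${slug}.blog.json`,
        JSON.stringify({ slug, evidence, params, generatedAt: new Date().toISOString() }, null, 2),
      );
    }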

Rationale#

  • Trust and accountability: Publishing carries reputation risk. Repo diffs can be correct yet incomplete; human reviewers provide context, editorial judgment, and accountability that automation cannot guarantee.
  • Evidence fidelity: Generating drafts strictly from working-tree evidence keeps the system honest—no claims beyond what the code and diffs support. The KPI-to-article mapping anchors each draft to measurable activity, and misinterpretation QA reduces risk from ambiguous diffs.
  • Quality over volume: A 3-drafts/day limit is deliberate. It ensures reviewers can actually read and decide, prevents feed flooding, and maintains a credible output cadence.
  • Guardrails that scale: The skip policy ensures we don’t force content on low-signal days. The title gate blocks low-quality or sensational titles before they bias reviewers or slip downstream.
  • Auditability: Paired .md + .blog.json artifacts make drafts reproducible and debuggable. When a claim is questioned later, we can reconstruct what data and prompts produced the draft.
  • Contrarian decision (non-obvious): We rejected the obvious approach—auto-publish for speed—because speed without editorial judgment would convert innocuous misreads of diffs into public errors. Slower, reviewable throughput is more valuable than automated velocity for this domain.
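
The title gate can be thought of as a simple predicate over candidate titles. The heuristics below are assumptions for illustration; the gate's actual criteria are not reproduced here.

    // Illustrative heuristics for a title gate. These checks are assumptions
    // about what "weak" might mean, not the gate's real rules.
    function passesTitleGate(title: string): boolean {
      const tooShort = title.trim().split(/\s+/).length < 4;                     // low-information titles
      const sensational = /\b(shocking|unbelievable|you won't believe)\b/i.test(title);
      const vague = /^(update|changes|misc|wip)\b/i.test(title.trim());
      return !(tooShort || sensational || vague);
    }

    // Example: passesTitleGate("Misc updates") === false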

Trade-offs#

  • Pros:
      • Higher credibility and lower reputational risk via human-in-the-loop publishing.
      • Clear traceability from repo evidence to draft, improving audit and rollback.
      • Built-in quality controls (skip policy, title gate, misinterpretation QA) reduce noise.
  • Cons:
      • Additional human effort to review and publish; throughput is intentionally capped.
      • Increased artifact count (.md + .blog.json per draft) adds filesystem clutter and some operational overhead.
      • The skip policy may leave minor but noteworthy changes unsurfaced if reviewers rely solely on automated drafts.
      • Restricting inputs to the working tree means missing context from external systems (CI, production telemetry), which can limit narrative completeness.
  • Known risks:
      • Backlog risk: If daily draft volume outpaces reviewer capacity, useful content may stall.
      • Misinterpretation residuals: Despite QA, edge-case diffs can still be misunderstood and require human correction.
      • Sensitive information: Reproducibility artifacts could inadvertently include sensitive metadata; careful redaction policies are required.
      • Title gate tuning: Overly strict gating may suppress legitimate but niche topics; overly lenient gating reintroduces quality risk.

We automated daily, reproducible draft generation to capture real engineering work, but deliberately kept publishing as a human decision. This preserves credibility, ties content to measurable repository evidence, and installs guardrails (skip policy, title gate, misinterpretation QA) that favor long-term trust over short-term throughput. Known limitations: this report is constrained by working-tree evidence and may omit context present in CI or production telemetry. KPIs for this change are grounded in repo metrics only (4 files changed; +276/−90), consistent with our evidence-first approach.

This concludes today’s record of self-evolution. The interpretation of these observations is left to the reader.