Why We Automated 3 Drafts/Day but Refused to Automate Publishing
## Problem Statement
We needed a way to produce consistent, evidence-backed technical blog drafts that reflect daily engineering work without sacrificing credibility. The goal was to translate repository activity into reproducible drafts that map to clear KPIs, while keeping low-signal or misinterpreted content out of publication. That meant trust mechanisms for quiet or ambiguous days, and a guardrail so that weak titles or misread diffs cannot slip into the public record. The environment is unchanged (LOCAL_MODE=0; node=v22.16.0; platform=darwin). Today’s repo activity: 4 files changed, 276 insertions, 90 deletions.
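To make the input concrete: drafts are seeded from working-tree evidence such as the diff stats above. Below is a minimal sketch of how that evidence could be gathered, assuming a git CLI is available in the workspace; the function and field names (collectRepoEvidence, RepoEvidence) are illustrative, not the actual ATBS interface.

```typescript
// evidence.ts: illustrative sketch, not the actual ATBS implementation.
import { execSync } from "node:child_process";

export interface RepoEvidence {
  filesChanged: number;
  insertions: number;
  deletions: number;
  collectedAt: string;
}

// Parses `git diff --shortstat` output such as:
//   " 4 files changed, 276 insertions(+), 90 deletions(-)"
export function collectRepoEvidence(cwd: string = process.cwd()): RepoEvidence {
  const stat = execSync("git diff --shortstat HEAD", { cwd, encoding: "utf8" });
  const count = (re: RegExp): number => {
    const m = stat.match(re);
    return m ? Number(m[1]) : 0;
  };
  return {
    filesChanged: count(/(\d+) files? changed/),
    insertions: count(/(\d+) insertions?/),
    deletions: count(/(\d+) deletions?/),
    collectedAt: new Date().toISOString(),
  };
}
```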
## Options Considered
- Automate generation and publishing end-to-end (rejected).
- Automate generation only, with explicit human-in-the-loop publishing (chosen).
- Skip policy: allow the system to produce zero drafts on low-signal days (chosen; a sketch follows this list) vs always emit content (rejected).
- Title quality gate: enforce a gate that blocks weak titles from becoming publishable (chosen) vs accept all titles (rejected).
- Reproducibility artifacts: emit paired .md and .blog.json per draft (chosen) vs single-file output with less traceability (rejected).
- KPI-to-article mapping and misinterpretation QA in spec v1.1: codify these checks (chosen) vs rely on ad hoc reviewer intuition (rejected).
- Volume target: cap at 3 drafts/day for signal and reviewability (chosen) vs optimize for maximum number of posts/day (rejected).
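The skip policy referenced above is essentially a low-signal check. Here is a minimal sketch of what such a check could look like; the thresholds and names are assumptions for illustration, not the spec's actual values.

```typescript
// skipPolicy.ts: illustrative thresholds; the real v1.1 values may differ.
import type { RepoEvidence } from "./evidence";

// Assumed cut-offs: skip drafting when the day's diff is trivially small.
const MIN_FILES_CHANGED = 1;
const MIN_TOTAL_LINES = 20;

export function shouldSkipToday(evidence: RepoEvidence): boolean {
  const totalLines = evidence.insertions + evidence.deletions;
  return evidence.filesChanged < MIN_FILES_CHANGED || totalLines < MIN_TOTAL_LINES;
}

// With today's activity (4 files, +276/-90) this check would not trigger a skip.
```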
## Decision
- Implemented an Automated Technical Blogging System (ATBS) inside MARIA OS that generates up to 3 reproducible drafts/day from repository change evidence in the current workspace.
- Adopted v1.1 specification updates: skip policy, KPI-to-article mapping, title gate, and misinterpretation QA.
- Separated responsibilities: the system generates drafts; humans approve and publish.
- Enforced a title quality gate to prevent weak or misleading titles from entering the publishable queue (a heuristic sketch follows this list).
- Chose reproducibility artifacts (.md + .blog.json) per draft to preserve evidence, parameters, and traceability, despite the increased file count.
- Explicitly did not implement automatic publication and did not optimize for “number of posts” over credibility. No runtime performance claims were made without benchmarks.
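To illustrate the title gate referenced above, a heuristic sketch follows. The actual v1.1 gate criteria are not reproduced here; the length bounds and patterns below are assumptions.

```typescript
// titleGate.ts: heuristic sketch only; the real gate criteria live in the v1.1 spec.
const SENSATIONAL_PATTERNS = [/you won't believe/i, /\bshocking\b/i, /!{2,}/];

export interface TitleVerdict {
  ok: boolean;
  reasons: string[];
}

export function checkTitle(title: string): TitleVerdict {
  const reasons: string[] = [];
  const t = title.trim();
  if (t.length < 20) reasons.push("too short to be descriptive");
  if (t.length > 90) reasons.push("too long for a headline");
  if (SENSATIONAL_PATTERNS.some((re) => re.test(t))) {
    reasons.push("sensational phrasing");
  }
  return { ok: reasons.length === 0, reasons };
}

// A failing title keeps the draft out of the publishable queue until a human rewrites it.
```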
## Rationale
- Trust and accountability: Publishing carries reputation risk. Repo diffs can be correct yet incomplete; human reviewers provide context, editorial judgment, and accountability that automation cannot guarantee.
- Evidence fidelity: Generating drafts strictly from working-tree evidence keeps the system honest—no claims beyond what the code and diffs support. The KPI-to-article mapping anchors each draft to measurable activity, and misinterpretation QA reduces risk from ambiguous diffs.
- Quality over volume: A 3-drafts/day limit is deliberate. It ensures reviewers can actually read and decide, prevents feed flooding, and maintains a credible output cadence.
- Guardrails that scale: The skip policy ensures we don’t force content on low-signal days. The title gate blocks low-quality or sensational titles before they bias reviewers or slip downstream.
- Auditability: Paired .md + .blog.json artifacts make drafts reproducible and debuggable; an illustrative artifact shape follows this list. When a claim is questioned later, we can reconstruct what data and prompts produced the draft.
- Contrarian decision (non-obvious): We rejected the obvious approach—auto-publish for speed—because speed without editorial judgment would convert innocuous misreads of diffs into public errors. Slower, reviewable throughput is more valuable than automated velocity for this domain.
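For the auditability point above, this is the kind of shape a paired .blog.json artifact could take so that a draft's provenance can be reconstructed later. The field names are assumptions, not the actual ATBS schema.

```typescript
// draftArtifact.ts: assumed shape; the real .blog.json schema may differ.
import { writeFileSync } from "node:fs";
import type { RepoEvidence } from "./evidence";

interface DraftArtifact {
  title: string;
  kpis: string[];           // KPI-to-article mapping entries
  evidence: RepoEvidence;   // repo metrics the draft is grounded in
  generatorSpec: string;    // e.g. "v1.1"
  skipped: boolean;         // true when the skip policy suppressed drafting
}

// Paired outputs: a human-readable draft plus machine-readable provenance.
export function writeDraftPair(slug: string, markdown: string, meta: DraftArtifact): void {
  writeFileSync(`${slug}.md`, markdown, "utf8");
  writeFileSync(`${slug}.blog.json`, JSON.stringify(meta, null, 2), "utf8");
}
```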
## Trade-offs
- Pros:
  - Higher credibility and lower reputational risk via human-in-the-loop publishing.
  - Clear traceability from repo evidence to draft, improving audit and rollback.
  - Built-in quality controls (skip policy, title gate, misinterpretation QA) reduce noise.
- Cons:
  - Additional human effort to review and publish; throughput is intentionally capped.
  - Increased artifact count (.md + .blog.json per draft) adds filesystem clutter and some operational overhead.
  - The skip policy might under-surface minor but noteworthy changes if reviewers rely solely on automated drafts.
  - Restricting inputs to the working tree means missing context from external systems (CI, production telemetry), which can limit narrative completeness.
- Known risks:
  - Backlog risk: If daily draft volume outpaces reviewer capacity, useful content may stall.
  - Misinterpretation residuals: Despite QA, edge-case diffs can still be misunderstood and require human correction.
  - Sensitive information: Reproducibility artifacts could inadvertently include sensitive metadata; requires careful redaction policies.
  - Title gate tuning: Overly strict gating may suppress legitimate but niche topics; overly lenient gating reintroduces quality risk.
## Conclusion
We automated daily, reproducible draft generation to capture real engineering work, but deliberately kept publishing as a human decision. This preserves credibility, ties content to measurable repository evidence, and installs guardrails (skip policy, title gate, misinterpretation QA) that favor long-term trust over short-term throughput. Known limitations: this report is constrained by working-tree evidence and may omit context present in CI or production telemetry. KPIs for this change are grounded in repo metrics only (4 files changed; +276/−90), consistent with our evidence-first approach.
This concludes today’s record of self-evolution. The interpretation of these observations is left to the reader.