Why We Automated 3 Drafts/Day but Refused to Automate Publishing
## Problem Statement
We needed a way to produce consistent, evidence-backed technical blog drafts that reflect daily engineering work without sacrificing credibility. The goal was to translate repository activity into reproducible drafts that map to clear KPIs, while keeping low-signal or misinterpreted content out of publication. That meant trust mechanisms for quiet or ambiguous days, and a guardrail so that weak titles or misread diffs cannot slip into the public record. The environment is unchanged (LOCAL_MODE=0; node=v22.16.0; platform=darwin). Today’s repo activity: 4 files changed, 276 insertions, 90 deletions.
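To make the input concrete: drafts are seeded from working-tree evidence such as the diff stats above. Below is a minimal sketch of how that evidence could be gathered, assuming a git CLI is available in the workspace; the function and field names (collectRepoEvidence, RepoEvidence) are illustrative, not the actual ATBS interface.

```typescript
// evidence.ts: illustrative sketch, not the actual ATBS implementation.
import { execSync } from "node:child_process";

export interface RepoEvidence {
  filesChanged: number;
  insertions: number;
  deletions: number;
  collectedAt: string;
}

// Parses `git diff --shortstat` output such as:
//   " 4 files changed, 276 insertions(+), 90 deletions(-)"
export function collectRepoEvidence(cwd: string = process.cwd()): RepoEvidence {
  const stat = execSync("git diff --shortstat HEAD", { cwd, encoding: "utf8" });
  const count = (re: RegExp): number => {
    const m = stat.match(re);
    return m ? Number(m[1]) : 0;
  };
  return {
    filesChanged: count(/(\d+) files? changed/),
    insertions: count(/(\d+) insertions?/),
    deletions: count(/(\d+) deletions?/),
    collectedAt: new Date().toISOString(),
  };
}
```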
## Options Considered
- Automate generation and publishing end-to-end (rejected).
- Automate generation only, with explicit human-in-the-loop publishing (chosen).
- Skip policy: allow the system to produce zero drafts on low-signal days (chosen; a sketch follows this list) vs always emit content (rejected).
- Title quality gate: enforce a gate that blocks weak titles from becoming publishable (chosen) vs accept all titles (rejected).
- Reproducibility artifacts: emit paired .md and .blog.json per draft (chosen) vs single-file output with less traceability (rejected).
- KPI-to-article mapping and misinterpretation QA in spec v1.1: codify these checks (chosen) vs rely on ad hoc reviewer intuition (rejected).
- Volume target: cap at 3 drafts/day for signal and reviewability (chosen) vs optimize for maximum number of posts/day (rejected).
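The skip policy referenced above is essentially a low-signal check. Here is a minimal sketch of what such a check could look like; the thresholds and names are assumptions for illustration, not the spec's actual values.

```typescript
// skipPolicy.ts: illustrative thresholds; the real v1.1 values may differ.
import type { RepoEvidence } from "./evidence";

// Assumed cut-offs: skip drafting when the day's diff is trivially small.
const MIN_FILES_CHANGED = 1;
const MIN_TOTAL_LINES = 20;

export function shouldSkipToday(evidence: RepoEvidence): boolean {
  const totalLines = evidence.insertions + evidence.deletions;
  return evidence.filesChanged < MIN_FILES_CHANGED || totalLines < MIN_TOTAL_LINES;
}

// With today's activity (4 files, +276/-90) this check would not trigger a skip.
```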
## Decision
- Implemented an Automated Technical Blogging System (ATBS) inside MARIA OS that generates up to 3 reproducible drafts/day from repository change evidence in the current workspace.
- Adopted v1.1 specification updates: skip policy, KPI-to-article mapping, title gate, and misinterpretation QA.
- Separated responsibilities: the system generates drafts; humans approve and publish.
- Enforced a title quality gate to prevent weak or misleading titles from entering the publishable queue (a heuristic sketch follows this list).
- Chose reproducibility artifacts (.md + .blog.json) per draft to preserve evidence, parameters, and traceability, despite the increased file count.
- Explicitly did not implement automatic publication and did not optimize for “number of posts” over credibility. No runtime performance claims were made without benchmarks.
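To illustrate the title gate referenced above, a heuristic sketch follows. The actual v1.1 gate criteria are not reproduced here; the length bounds and patterns below are assumptions.

```typescript
// titleGate.ts: heuristic sketch only; the real gate criteria live in the v1.1 spec.
const SENSATIONAL_PATTERNS = [/you won't believe/i, /\bshocking\b/i, /!{2,}/];

export interface TitleVerdict {
  ok: boolean;
  reasons: string[];
}

export function checkTitle(title: string): TitleVerdict {
  const reasons: string[] = [];
  const t = title.trim();
  if (t.length < 20) reasons.push("too short to be descriptive");
  if (t.length > 90) reasons.push("too long for a headline");
  if (SENSATIONAL_PATTERNS.some((re) => re.test(t))) {
    reasons.push("sensational phrasing");
  }
  return { ok: reasons.length === 0, reasons };
}

// A failing title keeps the draft out of the publishable queue until a human rewrites it.
```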
## Rationale
- Trust and accountability: Publishing carries reputation risk. Repo diffs can be correct yet incomplete; human reviewers provide context, editorial judgment, and accountability that automation cannot guarantee.
- Evidence fidelity: Generating drafts strictly from working-tree evidence keeps the system honest—no claims beyond what the code and diffs support. The KPI-to-article mapping anchors each draft to measurable activity, and misinterpretation QA reduces risk from ambiguous diffs.
- Quality over volume: A 3-drafts/day limit is deliberate. It ensures reviewers can actually read and decide, prevents feed flooding, and maintains a credible output cadence.
- Guardrails that scale: The skip policy ensures we don’t force content on low-signal days. The title gate blocks low-quality or sensational titles before they bias reviewers or slip downstream.
- Auditability: Paired .md + .blog.json artifacts make drafts reproducible and debuggable; an illustrative artifact shape follows this list. When a claim is questioned later, we can reconstruct what data and prompts produced the draft.
- Contrarian decision (non-obvious): We rejected the obvious approach—auto-publish for speed—because speed without editorial judgment would convert innocuous misreads of diffs into public errors. Slower, reviewable throughput is more valuable than automated velocity for this domain.
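For the auditability point above, this is the kind of shape a paired .blog.json artifact could take so that a draft's provenance can be reconstructed later. The field names are assumptions, not the actual ATBS schema.

```typescript
// draftArtifact.ts: assumed shape; the real .blog.json schema may differ.
import { writeFileSync } from "node:fs";
import type { RepoEvidence } from "./evidence";

interface DraftArtifact {
  title: string;
  kpis: string[];           // KPI-to-article mapping entries
  evidence: RepoEvidence;   // repo metrics the draft is grounded in
  generatorSpec: string;    // e.g. "v1.1"
  skipped: boolean;         // true when the skip policy suppressed drafting
}

// Paired outputs: a human-readable draft plus machine-readable provenance.
export function writeDraftPair(slug: string, markdown: string, meta: DraftArtifact): void {
  writeFileSync(`${slug}.md`, markdown, "utf8");
  writeFileSync(`${slug}.blog.json`, JSON.stringify(meta, null, 2), "utf8");
}
```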
## Trade-offs
- Pros:
  - Higher credibility and lower reputational risk via human-in-the-loop publishing.
  - Clear traceability from repo evidence to draft, improving audit and rollback.
  - Built-in quality controls (skip policy, title gate, misinterpretation QA) reduce noise.
- Cons:
  - Additional human effort to review and publish; throughput is intentionally capped.
  - Increased artifact count (.md + .blog.json per draft) adds filesystem clutter and some operational overhead.
  - The skip policy might under-surface minor but noteworthy changes if reviewers rely solely on automated drafts.
  - Restricting inputs to the working tree means missing context from external systems (CI, production telemetry), which can limit narrative completeness.
- Known risks:
  - Backlog risk: If daily draft volume outpaces reviewer capacity, useful content may stall.
  - Misinterpretation residuals: Despite QA, edge-case diffs can still be misunderstood and require human correction.
  - Sensitive information: Reproducibility artifacts could inadvertently include sensitive metadata; requires careful redaction policies.
  - Title gate tuning: Overly strict gating may suppress legitimate but niche topics; overly lenient gating reintroduces quality risk.
## Conclusion
We automated daily, reproducible draft generation to capture real engineering work, but deliberately kept publishing as a human decision. This preserves credibility, ties content to measurable repository evidence, and installs guardrails (skip policy, title gate, misinterpretation QA) that favor long-term trust over short-term throughput. Known limitations: this report is constrained by working-tree evidence and may omit context present in CI or production telemetry. KPIs for this change are grounded in repo metrics only (4 files changed; +276/−90), consistent with our evidence-first approach.
This concludes today’s record of self-evolution. The interpretation of these observations is left to the reader.