interactive April 5, 2026

Automated Content Pipeline

An end-to-end system where an AI agent orchestrates the full content lifecycle — from topic discovery to search indexing — across multiple publishing channels.

[Figure: 10-stage automated content pipeline visualization]
01

Topic Discovery

Multiple content sources — newsletter feeds, aggregator signals, trending topic APIs — funnel into a shared topic queue. Each candidate is scored by a combination of recency, estimated search demand, and gap analysis against the existing content library. Duplicate and near-duplicate topics are collapsed.

An agent session reads the ranked queue and selects one topic per run. The selection favors topics that complement recently published pieces — clustering related subjects over time to build topical depth rather than publishing isolated articles. The chosen topic, along with its scoring metadata, is passed downstream as the session's root context.
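The scoring and dedup step can be sketched roughly as follows. The weights and the token-set similarity key are illustrative assumptions — a real pipeline would tune the weights and use a proper near-duplicate check:

```python
from dataclasses import dataclass

@dataclass
class Topic:
    title: str
    recency: float   # 0..1, newer is higher
    demand: float    # 0..1, estimated search demand
    gap: float       # 0..1, how underserved in the existing library

def score(t: Topic, w_recency=0.3, w_demand=0.4, w_gap=0.3) -> float:
    """Weighted blend of the three signals; weights are illustrative."""
    return w_recency * t.recency + w_demand * t.demand + w_gap * t.gap

def dedupe(topics: list[Topic]) -> list[Topic]:
    """Collapse near-duplicates by normalized title token set (a crude
    stand-in for real similarity), keeping the higher-scoring candidate."""
    seen: dict[frozenset, Topic] = {}
    for t in topics:
        key = frozenset(t.title.lower().split())
        if key not in seen or score(t) > score(seen[key]):
            seen[key] = t
    return sorted(seen.values(), key=score, reverse=True)
```

The ranked queue is then the sorted output; the agent takes the top entry per run.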

02

Research

Before a word is written, a semantic search runs across a persistent memory store — pulling relevant fragments from past articles, saved notes, and experiment write-ups. This grounds the new piece in accumulated knowledge rather than starting cold. The retrieval surface is project-scoped, so the agent only pulls from the relevant site's history.

Retrieved context is ranked by relevance score and truncated to fit the model's context window. The agent also checks whether an article on this topic already exists, preventing near-duplicate publishing. If a close match is found, the topic is flagged for a refresh workflow instead.
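The rank-and-truncate step amounts to greedy packing under a token budget. A minimal sketch, using whitespace-split word counts as a stand-in for the model's actual tokenizer:

```python
def pack_context(fragments: list[tuple[float, str]], budget_tokens: int) -> list[str]:
    """Sort retrieved fragments by relevance score (descending) and keep
    as many as fit within the token budget. Token cost is approximated by
    whitespace splitting; a real pipeline would use the model tokenizer."""
    packed, used = [], 0
    for relevance, text in sorted(fragments, key=lambda f: f[0], reverse=True):
        cost = len(text.split())
        if used + cost > budget_tokens:
            continue  # skip fragments that would overflow the window
        packed.append(text)
        used += cost
    return packed
```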

03

Writing Engine

A language model receives the topic, research context, and site-specific style rules. It generates a structured draft: introduction, keyed sections with headings, and a conclusion. The output is validated against a minimum word count before being accepted — the agent expands and retries if the draft falls short.

For sites publishing in Czech, a diacritics integrity check runs on the output. The model has a tendency to drop accented characters under certain prompt conditions; validation catches this before the content reaches publishing. Each site's tone and content type (tutorial, review, roundup, opinion) are specified in a per-site config file loaded at session start.
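Both validation gates can be sketched together. The accent-ratio heuristic and the 2% floor are assumptions — natural Czech prose contains a noticeable share of accented letters, so a near-zero ratio in a long draft suggests the model dropped them:

```python
CZECH_ACCENTS = set("áčďéěíňóřšťúůýž")

def diacritics_ok(text: str, min_ratio: float = 0.02) -> bool:
    """Heuristic: flag drafts whose accented-letter ratio falls below a floor."""
    letters = [c for c in text.lower() if c.isalpha()]
    if not letters:
        return False
    accented = sum(1 for c in letters if c in CZECH_ACCENTS)
    return accented / len(letters) >= min_ratio

def validate_draft(text: str, min_words: int, czech: bool = False) -> list[str]:
    """Return a list of problems; an empty list means the draft is accepted."""
    problems = []
    if len(text.split()) < min_words:
        problems.append("too_short")          # triggers an expand-and-retry pass
    if czech and not diacritics_ok(text):
        problems.append("missing_diacritics")
    return problems
```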

04

Image Generation

A prompt is derived from the article headline and passed to an AI image generation service. The agent operates a fallback chain: if the primary service fails or times out, it retries against a secondary endpoint. If both fail, the pipeline continues without an image rather than blocking publication.

On success, the generated image is uploaded directly to object storage and served via CDN. The stored URL and alt text are attached to the article record. Image generation cost is logged per run alongside the model used, enabling cost analysis across different generation services over time.
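The fallback chain is a simple ordered-retry loop. A minimal sketch, where each service is assumed to be a callable that returns an image URL or raises on failure:

```python
def generate_image(prompt: str, services: list, timeout: float = 30.0):
    """Try each image service in order; return the first successful URL,
    or None so the pipeline can publish without an image."""
    for service in services:
        try:
            return service(prompt, timeout)
        except Exception:
            continue  # timeout, 5xx, rate limit — fall through to the next service
    return None       # every service failed: continue without an image
```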

05

Publishing

Article content and image metadata are pushed to the target CMS via its REST API. The slug is derived from the title with diacritics stripped for URL safety. Tags are normalized — deduplicated, lowercased, stripped of JSON artifacts that occasionally appear in model output. The API response status is validated before the step is marked complete.

Publishing is the most failure-prone stage: WAF rules, rate limits, and authentication drift all cause transient errors. The agent retries once on non-200 responses and verifies the article is publicly accessible before continuing. Soft failures (published but missing image) are logged as warnings rather than errors.
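The slug derivation and tag normalization described above can be sketched with the standard library. NFKD decomposition followed by an ASCII encode is one common way to strip diacritics; the JSON-artifact stripping is a rough approximation of the cleanup the text describes:

```python
import re
import unicodedata

def slugify(title: str) -> str:
    """Strip diacritics and non-alphanumerics for a URL-safe slug."""
    ascii_title = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode()
    return re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")

def normalize_tags(tags: list[str]) -> list[str]:
    """Lowercase, strip stray JSON punctuation the model sometimes emits,
    and deduplicate while preserving order."""
    cleaned, seen = [], set()
    for tag in tags:
        t = tag.strip().strip('[]{}"\'').lower()
        if t and t not in seen:
            seen.add(t)
            cleaned.append(t)
    return cleaned
```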

06

Social Distribution

Platform-specific post variants are generated from the article: a thread for long-form platforms, a short hook for character-limited feeds, a visual caption for image-first channels. Each variant respects the platform's content norms — what reads well as a tweet doesn't work as a LinkedIn post.

Posts enter an approval queue rather than publishing directly. A lightweight approval UI shows drafts grouped by platform; a human approves or edits before scheduling. Approved posts are staggered across channels to avoid a simultaneous burst — a posting pattern that most platform algorithms penalize with reduced reach. The staggering interval is configurable per site.
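The staggering itself is a fixed-interval offset from a start time. A minimal sketch, with the 45-minute default as an illustrative value:

```python
from datetime import datetime, timedelta

def stagger_schedule(approved_posts: list[str], start: datetime,
                     interval_minutes: int = 45) -> list[tuple[datetime, str]]:
    """Spread approved posts at a per-site interval instead of firing them
    all at once; returns (publish_time, post) pairs."""
    return [(start + timedelta(minutes=i * interval_minutes), post)
            for i, post in enumerate(approved_posts)]
```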

07

Notification

An async message fires to a chat channel with the published article URL, word count, estimated generation cost, and model version used. The message body embeds a session token — a short identifier that links back to the pipeline's state file.

Replying to the notification with a command resumes the workflow or triggers follow-up actions: re-generate the image, publish social posts immediately, refresh the article, or mark the run as failed. This turns a one-way status ping into a two-way control interface, without requiring a dedicated dashboard for basic operations.
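The reply-to-command handoff can be sketched as token-plus-command parsing against a session registry. The command names and session structure here are illustrative, not the actual protocol:

```python
def handle_reply(message: str, sessions: dict) -> str:
    """Parse a reply like 'abc123 refresh' into (session token, command)
    and queue the command against that session's state."""
    token, _, command = message.strip().partition(" ")
    if token not in sessions:
        return "unknown session"
    allowed = {"regen-image", "publish-social", "refresh", "mark-failed"}
    if command not in allowed:
        return f"unknown command: {command or '(none)'}"
    sessions[token]["pending"] = command  # picked up on the next agent run
    return f"queued {command} for {token}"
```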

08

Indexing

A programmatic indexing request is submitted to the search engine API immediately after the article is confirmed live. This cuts the typical content discovery lag from several days to a few hours for most pages. The request URL and response timestamp are stored alongside the article record.

Indexing is best-effort — the API quota is finite and failures are non-critical. If the request is rejected due to quota exhaustion, the URL is queued for retry in the next available slot. A daily quota monitor runs separately and alerts when the limit is consistently being hit, prompting a review of which content types to prioritize.
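The quota-aware deferral can be sketched as a counter plus a retry queue. The `send` callable stands in for the actual indexing API request:

```python
from collections import deque

class IndexingQueue:
    """Submit URLs for indexing until the daily quota runs out; anything
    rejected for quota reasons is deferred to the next run, not dropped."""
    def __init__(self, daily_quota: int):
        self.quota = daily_quota
        self.deferred: deque[str] = deque()

    def submit(self, url: str, send) -> bool:
        """`send(url)` is the real API call; returns True on acceptance."""
        if self.quota <= 0:
            self.deferred.append(url)  # retry in the next quota window
            return False
        self.quota -= 1
        return send(url)
```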

09

Logging

Every pipeline run writes a structured log entry to a central dashboard database: article title, site, published URL, content type, language model used, image generation service, estimated cost, word count, and publish timestamp. This single table covers all managed sites, making cross-site analysis straightforward.

The dashboard enables cost tracking over time, model comparison (which model produces better output per dollar), content calendar views, and underperformance detection when combined with search console data. Articles that attract little organic traffic within 30 days are flagged for a refresh pass — feeding back into the topic queue as high-priority rewrites.
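The single-table shape of the log can be sketched against SQLite. The column names follow the fields listed above; the exact schema and backing database are assumptions:

```python
import sqlite3

def log_run(conn: sqlite3.Connection, entry: dict) -> None:
    """Append one pipeline run to the shared cross-site runs table."""
    conn.execute("""CREATE TABLE IF NOT EXISTS runs (
        title TEXT, site TEXT, url TEXT, content_type TEXT,
        model TEXT, image_service TEXT, cost_usd REAL,
        word_count INTEGER, published_at TEXT)""")
    conn.execute(
        "INSERT INTO runs VALUES (:title, :site, :url, :content_type, "
        ":model, :image_service, :cost_usd, :word_count, :published_at)",
        entry)
    conn.commit()
```

Keeping every site in one table is what makes the cross-site cost and model comparisons a single query.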

10

State Management

A checkpoint file is written to disk at the completion of each major step. If the pipeline fails mid-run — API timeout, image generation error, rate limit, model refusal — the next invocation reads the checkpoint and resumes from the last successful stage. No manual intervention is needed for common transient failures.

Checkpoint files include the session topic, all intermediate outputs (draft text, image URL, publish response), and step completion flags. They're cleaned up automatically on successful pipeline completion and after a 48-hour TTL if the run was abandoned. The checkpoint pattern turns a brittle sequential script into a recoverable workflow — especially important for long runs that cross API rate-limit windows.

The pipeline is not code. It's a prompt — a set of numbered steps with rules — that an AI agent interprets and executes. No DAG definition, no workflow engine. The model reads the context, decides what to do next, calls the right tools, and handles edge cases inline. That's the experiment.