
AHD · Artificial Human Design

Make it specific.

A guardrail and evaluation layer for AI-generated design. A named taxonomy of thirty-nine slop tells, a token-driven brief compiler, a deterministic linter, and a reproducible raw-vs-compiled eval loop, across web UI and image generation.

npm install --save-dev @adastracomputing/ahd

Or try it without installing:

npx @adastracomputing/ahd lint page.html

Full setup, npm, source.


Measured · 24 April 2026 · cross-token triangulation

Same brief, different style token (post-digital-green), eleven models, n=30 per cell, six hundred sixty samples. The first triangulation surfaced a real limit in the rule design: editorially-opinionated rules fired on output that was correct for a token they were not written for. AHD shipped token-aware linting in response. Re-linting the same samples under the corrected ruleset moves six of eleven cells positive, with gpt-oss leading at 47.6 percent reduction. Click any row for the post-fix per-cell reading.
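
How the token-aware pass works, as a minimal sketch: an editorially-opinionated rule stays silent when the active style token declares the pattern it targets as a required quirk. The shapes below are illustrative only, not the package's actual rule or token schema.

interface Rule {
  id: string;
  editoriallyOpinionated: boolean;   // fires on taste, not on defects
  targets: string;                   // the pattern the rule penalises
}

interface StyleToken {
  name: string;                      // e.g. "post-digital-green"
  requiredQuirks: string[];          // patterns the token declares intentional
}

// Token-aware filtering: opinionated rules are dropped when the active
// token explicitly asks for the pattern they would otherwise flag.
function activeRules(rules: Rule[], token: StyleToken): Rule[] {
  return rules.filter(
    (rule) =>
      !rule.editoriallyOpinionated ||
      !token.requiredQuirks.includes(rule.targets)
  );
}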

gpt-oss 120B
48%↓

Best reduction in the run after token-aware lint. 1.40 raw mean tells dropped to 0.73 compiled, 30 of 30 scored. The cell led the pre-fix reading too (+19.8% pre-fix, +47.6% post-fix): the post-digital-green-correct output that the old ruleset under-credited now scores fully. Served by Cloudflare Workers AI.

Gemma 4 26B
50%↓

Flipped sign under token-aware lint: pre-fix −10.5%, post-fix +50.0%. 0.87 raw mean tells dropped to 0.43 compiled, 30 of 30 scored. Three of the four sign-flippers (gemma, kimi, gemini) cluster at the OSS frontier; silencing the editorially-opinionated rules unblocked their compiled output.

Kimi K2.6
30%↓

Flipped sign: pre-fix −14.4%, post-fix +30.5%. 1.97 raw mean tells to 1.37 compiled, 30 of 30 scored after the chat-template fix (Kimi ships with thinking-mode on by default, which exhausted the token budget). The serving fix lives in the serving-tells catalog.

Gemini 3.1 Pro Preview
26%↓

Flipped sign: pre-fix −9.9%, post-fix +26.3%. 1.20 raw mean tells to 0.88 compiled, 26 of 30 scored (the same intermittent four-of-thirty no-output behaviour observed on the 22 April run). Served via Gemini CLI.

Mistral Small 3.1
22%↓

Stays positive in both readings: pre-fix +34.7%, post-fix +21.7%. 1.53 raw mean tells to 1.20 compiled, 30 of 30 scored. The token-aware re-lint trims the headline margin but leaves the verdict unchanged: compilation helps this cell.

Claude Opus 4.7
regressed 68%↑

Pre-fix −172.9%, post-fix −67.6%: sixty percent of the regression closed by token-aware lint, but the cell still scores worse compiled than raw. 1.13 raw mean tells rose to 1.90 compiled. The remaining gap is rules outside the suppression list firing on Claude's compiled output; either the model or the compiled prompt has more work to do here. Served via Claude Code CLI.

gpt-5.5
regressed 9%↑

Released by OpenAI during the run window and tested inline. Pre-fix −135.5%, post-fix −9.1%: a near-closed regression. 0.37 raw mean tells to 0.40 compiled, 30 of 30 scored, the cleanest raw frontier baseline measured to date. The new model produces tighter HTML by default (~8 KB per sample versus gpt-5.4's 11 KB raw). Served via Codex CLI on ChatGPT auth.

gpt-5.4
regressed 36%↑

Pre-fix −78.6%, post-fix −36.4%: half the regression closed. 0.37 raw mean tells to 0.50 compiled, 30 of 30 scored. Same pattern as Claude: rules outside the suppression list still firing. Served via Codex CLI.

Eight cells shown. Three cells with very low absolute baselines (Llama 3.3, Llama 4 Scout, Qwen3 30B) show numerically large percentage moves on tiny absolute changes; full eleven-cell breakdown plus the pre-fix-versus-post-fix table lives at eval · 24 April 2026.
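
Each headline percentage is the relative change in mean tells from raw to compiled for one cell. A minimal sketch of that arithmetic; the second call uses made-up low-baseline numbers to show why tiny absolute baselines inflate the percentage move.

// Percent reduction for one model x token cell. Positive = compilation helped.
function percentReduction(rawMeanTells: number, compiledMeanTells: number): number {
  return ((rawMeanTells - compiledMeanTells) / rawMeanTells) * 100;
}

percentReduction(1.40, 0.73);   // about 48, on the rounded means from the gpt-oss cell above
percentReduction(0.10, 0.20);   // -100: illustrative low-baseline cell where a small rise reads as a huge move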


Measured · 22 April 2026 · single-token n=30

Same brief, raw versus AHD-compiled, ten models, n=30 per cell, six hundred samples. Eight of ten cells reduce tells. Median reduction 59 percent across the positive cells. Click any row for the per-model reading.

gpt-oss 120B
78%↓

Best reduction in the run. 3.50 raw mean tells dropped to 0.77 compiled, 30 of 30 samples scored in both conditions. The compiled prompt moves this OSS model decisively off its median without inducing new tells. Served by Cloudflare Workers AI.

Mistral Small 3.1
62%↓

3.47 raw mean tells dropped to 1.30 compiled, 30 of 30 scored in both conditions. Matches the n=5 signal at much tighter confidence. The bento-and-gradient raw output collapses toward the Swiss editorial token under the compiled system prompt.

Kimi K2.6
62%↓

2.67 raw mean tells dropped to 1.00 compiled, 30 of 30 scored. The cell required a chat-template fix first: Kimi K2.6 on Cloudflare ships with thinking-mode on by default, and a 9 KB system prompt exhausted the token budget before any visible output. After patching the runner to pass thinking: false, the cell ran clean. This is a serving-layer defect documented in the serving-tells catalog, separate from the design-slop taxonomy.
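
A sketch of what that runner patch can look like against the Workers AI REST endpoint. The model slug and the exact placement of the thinking flag in the request body are assumptions here; only the thinking: false override itself comes from the run.

const KIMI_MODEL = "@cf/placeholder/kimi-k2.6";   // placeholder slug, not the real catalog name

async function generateSample(
  systemPrompt: string,                  // the ~9 KB compiled prompt
  brief: string,
  env: { ACCOUNT_ID: string; API_TOKEN: string }
): Promise<unknown> {
  const url = `https://api.cloudflare.com/client/v4/accounts/${env.ACCOUNT_ID}/ai/run/${KIMI_MODEL}`;
  const res = await fetch(url, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${env.API_TOKEN}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      messages: [
        { role: "system", content: systemPrompt },
        { role: "user", content: brief },
      ],
      thinking: false,                   // the fix: stop the default thinking pass from eating the budget
    }),
  });
  return res.json();
}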

Gemini 3.1 Pro Preview
62%↓

2.97 raw mean tells dropped to 1.13 compiled, 30 of 30 scored. Served via Gemini CLI, which is the path most humans actually use for this model today, so the CLI measurement is more ecologically valid than the raw HTTP API would be.

Claude Opus 4.7
59%↓

1.80 raw mean tells dropped to 0.73 compiled, 30 of 30 scored. Served via Claude Code CLI. The n=30 number is tighter but lower than the n=5 reading, which reported 100 percent reduction at ±35-point uncertainty; 59 percent is the better-grounded figure. Same behaviour, measured to a resolution we can now trust.

Llama 3.3 70B
regressed 117%↑

0.28 raw mean tells rose to 0.60 compiled. This reproduces the same-direction regression measured at n=5 on both Cloudflare and Hugging Face serving paths in the 21 April cross-provider run. Llama 3.3's raw output is typographically thin; the compiled brief elicits a richer page with more decision surface, which trips more rules. Framework response: on an editorial brief, do not route to this model.
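
In a caller, that routing response reduces to a guard like the one below. The names and the avoid-list are illustrative; AHD does not ship this function.

// Hypothetical routing guard: keep models with a measured same-direction
// regression off editorial briefs.
const AVOID_ON_EDITORIAL = new Set(["llama-3.3-70b"]);

function routeModels(briefStyle: string, candidates: string[]): string[] {
  return briefStyle === "editorial"
    ? candidates.filter((model) => !AVOID_ON_EDITORIAL.has(model))
    : candidates;
}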

Full report with attempted-vs-scored counts, per-tell frequency table, serving paths and the run manifest: eval · 22 April 2026. Different-token follow-up: eval · 24 April 2026. Every run: /evals. How to read these numbers: the run's own reading guide, or the general methodology.


Four pieces

  1. Named taxonomy

    Thirty-nine concrete slop tells across web, graphic and typographic surfaces. Enforced by 35 HTML/CSS rules, 3 SVG rules, and 14 vision-critic rules on rendered pixels. Read the taxonomy.

  2. Style tokens

    Ten curated design directions spanning Swiss-Editorial, Manual SF, Neubrutalist-Gumroad, Post-Digital, Monochrome-Editorial, Memphis-Clash, Heisei-Retro, Bauhaus-Revival, Editorial-Illustration and Ad-Creative-Collision. Each declares its own forbidden list, required quirks and reference lineage.

  3. Brief compiler

    ahd compile takes a structured intent and emits a token-anchored system prompt for any LLM. Draft mode for exploration, final mode for single-shot output. See how.

  4. Empirical eval

    Raw-vs-compiled controlled comparison across Claude Opus 4.7, GPT-5, Gemini 3 Pro, Llama 3.3 70B, Llama 4 Scout, Mistral Small 3.1, Qwen 2.5 Coder, DeepSeek R1, and image generators FLUX.1 schnell, SDXL Lightning and DreamShaper. Attempted, extracted, scored counts published. Negative results first-class.
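
A hypothetical sketch of the per-cell bookkeeping those published counts imply; the field names are illustrative, not the report schema.

// Hypothetical shape for one published cell.
interface CellResult {
  model: string;
  token: string;                      // style token the compiled prompt targets
  condition: "raw" | "compiled";
  attempted: number;                  // prompts sent
  extracted: number;                  // responses that yielded renderable output
  scored: number;                     // samples the linter or vision critic scored
  meanTells: number;                  // mean slop tells per scored sample
}

// Negative results stay in the table: the attrition from attempted to
// extracted to scored is itself part of what each report publishes.
function attrition(cell: CellResult): string {
  return `${cell.model} ${cell.condition}: ${cell.attempted} attempted, ` +
         `${cell.extracted} extracted, ${cell.scored} scored`;
}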