AHD Artificial Human Design

AHD · Why

Why not just write a better prompt?

This is the strongest objection AHD has to answer, so we answer it at the front door. If a hand-crafted prompt can already produce good design output, why does any of this framework exist?

The short answer

A one-off prompt is tuned to one model, one author's taste, one brief — and it evaporates the moment any of those change. A named taxonomy plus a reproducible eval is the thing that stays true over time. A well-crafted single prompt is a local maximum; a declared ruleset plus measurements is a category. Concretely, AHD lets you:

  1. Verify that your output actually lacks the named failure modes. The 39-tell taxonomy is enforced by a 32-rule deterministic linter, not by "I think it looks OK." When your build passes, you can point at the manifest.
  2. Re-run the exact same comparison in six months, against new model releases. A taxonomy plus a stored manifest tells you which models have regressed when the industry inevitably ships a new batch of fine-tunes, and which have improved.
  3. Share the same controlled eval across a team, a studio, a community, or a large org. "Better-looking" stops being taste-laundering when everyone is scoring against the same named rules.
  4. Surface negative results a prompt cannot self-surface. The cross-provider run showed that Llama 3.3 70B actively regresses under the compiled brief across two independent inference providers. A hand-crafted prompt has no mechanism for discovering this. An eval loop does.
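To make "deterministic linter plus manifest" concrete, here is a minimal sketch of the shape. The rule names and patterns below are invented stand-ins — AHD's actual 32 rules and 39 tells are not reproduced here — but the mechanism is the point: each rule is a named, mechanical check, and the output is a manifest you can store and point at.

```python
import re

# Hypothetical rules, not AHD's real ruleset: each named rule is a
# deterministic pattern check, where a match means the tell is present.
RULES = {
    "no-generic-gradient-hero": re.compile(r"linear-gradient\(.*purple", re.IGNORECASE),
    "no-emoji-bullet-list": re.compile(r"^\s*[\u2728\U0001F680]", re.MULTILINE),
}

def lint(source: str) -> dict:
    """Run every named rule against the build output; True means the rule passed."""
    return {name: pattern.search(source) is None for name, pattern in RULES.items()}

def manifest(source: str) -> dict:
    """The storable artifact: per-rule results plus an overall verdict."""
    results = lint(source)
    return {"rules": results, "passed": all(results.values())}
```

Because every rule is named, a failing build tells you *which* tell it tripped, not just that someone disliked it — and the stored manifest is what makes the six-month re-run in point 2 possible.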

The longer argument

A prompt is a recipe. AHD is the kitchen's health inspector. A designer can absolutely get good output from a hand-crafted prompt. They cannot prove it, reproduce it, or catch themselves slipping into slop over time without something like AHD.

The same logic applies to any argument of the shape "why use a framework when a skilled individual can just do the thing." A skilled individual with a prompt is a point solution. A framework with a taxonomy and an eval is a commitment to a standard that other people can rely on, recreate, critique, and contribute to. The framework is only worth building if the problem is recurring enough to justify the commitment. AI design slop today is recurring enough.

What AHD does not claim

AHD does not claim the compiled prompt beats the raw prompt for every model — it doesn't, and the published runs say so. AHD does not claim that passing the linter means the design is good — a page that does nothing passes the linter. AHD does not replace a designer. It narrows the output distribution toward a named direction and tells you, measurably, which tools respect that narrowing and which don't.
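The regression-surfacing claim above is also mechanical rather than taste-based. As a sketch (again with hypothetical rule names, not AHD's real manifest schema): given two stored manifests mapping rule names to pass/fail, diffing them names exactly which rules regressed and which improved between runs — the comparison a one-off prompt has no way to make about itself.

```python
def diff_manifests(old: dict, new: dict) -> dict:
    """Compare two stored lint manifests (e.g. runs six months apart).

    Each manifest maps rule name -> bool (True = passed). A rule that
    passed before and fails now is a regression; the reverse is an
    improvement.
    """
    rules = set(old) | set(new)
    return {
        "regressed": sorted(r for r in rules
                            if old.get(r, False) and not new.get(r, False)),
        "improved": sorted(r for r in rules
                           if not old.get(r, False) and new.get(r, False)),
    }
```

A finding like "this model actively regresses under the compiled brief" is just a non-empty `regressed` list that shows up consistently across providers.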


Adjacent reading: positioning, the taxonomy, methodology, the cross-provider eval.