Alejandro Ashes

Launching December 2026

The Grimdark Lens

An evaluation instrument for grimdark fiction, and a transfer protocol for tacit creative judgment.


Most evaluations of creative writing collapse into taste. A reviewer reads an output, decides whether it landed, and moves on. That works fine for a single reader. It does not produce infrastructure anyone else can run.

The Grimdark Lens is a project about closing that gap in a domain where the gap is wide. Grimdark fiction has identifiable conventions and identifiable failure modes, and current language models miss the register in ways a working novelist can name on sight. This project takes that practitioner-level judgment and turns it into something a machine can apply, with the rubric, the anchors, the reliability study, and the reasoning all open to inspection.

How the lens works

01

Two layers

The lens works in two layers. The first is a binary error-code check: fifteen specific failure modes, each of which an output either contains or does not. The second is a gradient quality read, applied only to outputs that clear the first layer. Anything that trips an error code never reaches it.
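The gating between the two layers can be sketched as below. This is a minimal illustration, assuming each error code can be expressed as a yes/no detector and the quality read returns a single number; `ErrorCode`, `run_lens`, the toy codes, and the placeholder score are all hypothetical and are not the project's codebook or judge prompts.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ErrorCode:
    """An error code: an identifier plus a yes/no detector (hypothetical shape)."""
    code: str
    detect: Callable[[str], bool]


@dataclass
class LensResult:
    flagged: List[str]        # codes the output tripped in the first layer
    quality: Optional[float]  # gradient score; None when the first layer fails


def run_lens(text: str,
             codes: List[ErrorCode],
             quality_read: Callable[[str], float]) -> LensResult:
    # First layer: binary check, each code is either present or absent.
    flagged = [c.code for c in codes if c.detect(text)]
    if flagged:
        # Any flagged code ends evaluation; the gradient read never runs.
        return LensResult(flagged=flagged, quality=None)
    # Second layer: gradient read, reached only by outputs that cleared the first.
    return LensResult(flagged=[], quality=quality_read(text))


# Toy detectors for illustration only; they are not codes from the real codebook.
toy_codes = [
    ErrorCode("TOY_tidy_redemption", lambda t: "redeemed" in t.lower()),
    ErrorCode("TOY_villain_monologue", lambda t: "mwahaha" in t.lower()),
]


def toy_quality(text: str) -> float:
    return 0.5  # placeholder score standing in for the real gradient read


print(run_lens("In the end, he was redeemed.", toy_codes, toy_quality))
print(run_lens("The crows ate well that winter.", toy_codes, toy_quality))
```

The early return is the point of the design: a flagged output is never scored on quality, so a polished passage that still commits a named failure cannot earn a high gradient score.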

03

Reliability is measured, not assumed

Five trained annotators rate a hundred-passage subset against the codebook before launch. The launch report publishes per-code Fleiss' kappa, the disagreement cases, and the calibration changelog that produced the final codes. Where the codes hold up, the numbers show it. Where they do not, the report says so.
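Per-code Fleiss' kappa follows the standard formula: mean observed pairwise agreement across passages, corrected for the agreement expected from the overall category proportions. The sketch below assumes each code's ratings are stored as one list of 0/1 annotator judgments per passage; that layout, the function names, and the toy data are illustrative, not the project's actual data format.

```python
from typing import Dict, List, Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa from an N x k table where counts[i][j] is the number of
    raters who put item i in category j. Every item needs the same rater count."""
    N = len(counts)
    n = sum(counts[0])
    if any(sum(row) != n for row in counts):
        raise ValueError("every item needs the same number of ratings")
    k = len(counts[0])
    # Per-item agreement: the share of rater pairs that agree on item i.
    P_i = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]
    P_bar = sum(P_i) / N
    # Chance agreement from the overall share of ratings in each category.
    p_j = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    P_e = sum(p * p for p in p_j)
    if P_e == 1.0:
        raise ValueError("kappa is undefined when every rating falls in one category")
    return (P_bar - P_e) / (1 - P_e)


def per_code_kappa(ratings: Dict[str, List[List[int]]]) -> Dict[str, float]:
    """ratings[code][i] holds each annotator's 0/1 judgment for passage i
    (a hypothetical storage layout)."""
    kappas = {}
    for code, passages in ratings.items():
        # Binary codes give a two-column table: [absent, present].
        table = [[len(r) - sum(r), sum(r)] for r in passages]
        kappas[code] = fleiss_kappa(table)
    return kappas


# Toy data: five annotators, four passages, one hypothetical code.
toy = {"TOY_code": [[1, 1, 1, 1, 0], [0, 0, 0, 0, 0], [1, 1, 1, 1, 1], [1, 0, 0, 0, 0]]}
print(per_code_kappa(toy))
```

The undefined case matters in practice for binary codes: if annotators never mark a rare code present anywhere in the subset, chance agreement is total and kappa cannot be computed for that code.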

04

Triangulated reliability

Many inter-rater designs stop at two annotators on the same work. This project goes wider: five-annotator agreement on the binary codes, two judge models from different labs on the gradient layer, and a set of canonical literary anchor passages running underneath both. The structure is documented as a deliberate methodological choice.
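One way the anchor passages could support the gradient layer is as a calibration check: each canonical passage gets an expected score band, a judge that scores an anchor outside its band is flagged for drift, and outputs where the two judges diverge are pulled for review. This covers only the gradient side of the triangulation, and the 0 to 10 scale, the bands, the tolerance, and every name below are hypothetical. The project description does not specify how anchors are scored.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Anchor:
    passage_id: str
    low: float   # hypothetical lower bound of the expected score band
    high: float  # hypothetical upper bound of the expected score band


def anchor_drift(scores: Dict[str, float], anchors: List[Anchor]) -> List[str]:
    """Anchor passages a judge scored outside their expected band."""
    return [a.passage_id for a in anchors
            if not (a.low <= scores[a.passage_id] <= a.high)]


def judge_disagreements(a: Dict[str, float], b: Dict[str, float],
                        tolerance: float) -> List[Tuple[str, float]]:
    """Passages where the two judges' scores differ by more than `tolerance`."""
    shared = sorted(a.keys() & b.keys())
    return [(pid, abs(a[pid] - b[pid])) for pid in shared
            if abs(a[pid] - b[pid]) > tolerance]


# Toy scores from two hypothetical judge models on a 0-10 scale.
anchors = [Anchor("anchor_1", 8.0, 10.0), Anchor("anchor_2", 2.0, 4.0)]
judge_a = {"anchor_1": 9.0, "anchor_2": 3.5, "out_1": 6.0, "out_2": 7.0}
judge_b = {"anchor_1": 7.0, "anchor_2": 3.0, "out_1": 6.5, "out_2": 4.0}

print(anchor_drift(judge_a, anchors))
print(anchor_drift(judge_b, anchors))
print(judge_disagreements(judge_a, judge_b, tolerance=1.0))
```

Checking anchors separately from judge-to-judge agreement distinguishes two failure shapes: two judges can agree closely with each other while both drifting away from where the canonical passages should land.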

05

A transfer protocol, with a leaderboard downstream

What ships in December is the protocol: the codebook, the anchors, the judge prompts, the reliability study, and a set of worked demonstrations across frontier models. The leaderboard sits downstream of the protocol, not the other way around. Anyone with access to the relevant APIs should be able to run it, audit it, and argue with it.

At launch

The shipped artifact in December includes:

Code Inspector

Read each error code, toggle between the anchor passages that define it, and run your own passage against the rubric to see how the lens reads it.

Side-by-side output viewer

The same prompt rendered by four frontier models, with error codes highlighted inline where the lens flagged them.

Calibration story

A scrollable walk through how the codebook changed across calibration rounds, with the passages that forced each revision.

Reliability report

A full table of per-code Fleiss' kappa values from the five-annotator study, with a written walkthrough of the sharpest disagreements and what they revealed about the codes.

About the practitioner

Alejandro Ashes has written seven novels in and around the grimdark register. At alejandroashes.com he also writes the Eldritch Grimoire, a series on AI-powered dark fiction craft. He has twenty years of experience designing evaluation frameworks and currently does rubric work for frontier AI labs. The Grimdark Lens sits at the intersection of those two practices.

Follow the work in progress on GitHub

A launch note will go out in December. Watch the repo for now.