Rebuilding my blog with machines, for machines

I rebuilt this blog recently. Not a redesign. The old posts are still here, same URLs, same markup. What changed is that every page now carries a full structured data graph and every element is annotated with microformats.

Structured data that pulls its weight

Every page gets a JSON-LD block in the <head>. Not a plugin. A 120-line Hugo partial that builds a @graph depending on what kind of page it is. The partial is mostly a big conditional: .IsHome does one thing, .IsPage with .Section equal to "posts" does another, .IsSection or .Kind equal to "term" does a third. Each branch appends its nodes to a slice, then the whole thing gets serialized and dropped into a <script> tag.

For a blog post, that means four Schema.org nodes: WebSite, Person (me), WebPage, and BlogPosting with headline, dates, word count, tags as keywords, and a cover image. Breadcrumbs get walked from the page’s ancestors into a BreadcrumbList with position numbers. Tag pages emit a CollectionPage with an ItemList of the last 50 posts. The home page stays minimal: WebSite and Person only.
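
Roughly what those four nodes look like once they are linked together, sketched in Python rather than Hugo templating. The base URL, the @id fragments, the property choices (publisher, isPartOf, mainEntityOfPage), and the sample values are all placeholders for illustration, not the ones the real partial emits.

```python
import json

SITE = "https://example.org/"  # hypothetical base URL


def post_graph(url, title, published, modified, word_count, tags):
    """Build the four nodes for a post, linked to each other by @id references."""
    return {
        "@context": "https://schema.org",
        "@graph": [
            {"@type": "WebSite", "@id": SITE + "#website", "url": SITE,
             "publisher": {"@id": SITE + "#author"}},
            {"@type": "Person", "@id": SITE + "#author", "name": "Example Author"},
            {"@type": "WebPage", "@id": url + "#webpage", "url": url,
             "isPartOf": {"@id": SITE + "#website"}},
            {"@type": "BlogPosting", "@id": url + "#post",
             "headline": title,
             "datePublished": published,
             "dateModified": modified,
             "wordCount": word_count,
             "keywords": tags,
             "author": {"@id": SITE + "#author"},
             "mainEntityOfPage": {"@id": url + "#webpage"}},
        ],
    }


# Placeholder values standing in for .Date, .Lastmod, .WordCount and .Params.tags.
graph = post_graph(SITE + "posts/hello/", "Hello", "2024-01-01", "2024-01-02", 512, ["hugo"])
print(json.dumps(graph, indent=2))
```

Each node is defined once with a full set of properties and referred to everywhere else by a bare {"@id": ...} object. That split is what makes the graph checkable by a script.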

All of it is templated from Hugo’s page context. Dates come from .Date and .Lastmod, word count from .WordCount, tags from .Params.tags. If I add a tag to a post, it shows up in the JSON-LD without me touching the partial. No manual syncing, no forgetting to update the metadata block.

The markup is straightforward. What matters is what runs after:

def collect_types(node, out):
    if isinstance(node, dict):
        if "@type" in node:
            t = node["@type"]
            out.update([t] if isinstance(t, str) else t)
        for v in node.values():
            collect_types(v, out)
    elif isinstance(node, list):
        for v in node:
            collect_types(v, out)

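def collect_ids_and_refs(node, ids, refs):
    if isinstance(node, dict):
        # @id plus other keys defines a node; @id alone points at one.
        if "@id" in node and len(node) > 1:
            ids.add(node["@id"])
        if "@id" in node and len(node) == 1:
            refs.add(node["@id"])
        for v in node.values():
            collect_ids_and_refs(v, ids, refs)
    elif isinstance(node, list):
        # Without this branch the walk never descends into the @graph array.
        for v in node:
            collect_ids_and_refs(v, ids, refs)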

These two recursive walkers drive the whole thing. collect_types finds every @type in the graph so the validator can assert a BlogPosting exists on post pages and a CollectionPage on tag pages. collect_ids_and_refs is the clever one: a dict with @id plus other keys is a definition, a dict with @id as its only key is a reference. refs - ids at the end catches every dangling pointer.
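
A tiny worked case of the definition/reference rule. The walker is restated in compact form so the block runs on its own, and the graph is a made-up example in which the Person node has been dropped but is still referenced.

```python
def ids_and_refs(node, ids, refs):
    """@id plus other keys counts as a definition, @id alone as a reference."""
    if isinstance(node, dict):
        if "@id" in node:
            (ids if len(node) > 1 else refs).add(node["@id"])
        for value in node.values():
            ids_and_refs(value, ids, refs)
    elif isinstance(node, list):
        for item in node:
            ids_and_refs(item, ids, refs)


# Hypothetical graph: #author is referenced, but no node defines it.
broken = {
    "@context": "https://schema.org",
    "@graph": [
        {"@type": "WebSite", "@id": "#website", "name": "Example"},
        {"@type": "BlogPosting", "@id": "#post", "headline": "Hello",
         "author": {"@id": "#author"}, "isPartOf": {"@id": "#website"}},
    ],
}

ids, refs = set(), set()
ids_and_refs(broken, ids, refs)
dangling = sorted(refs - ids)
print(dangling)  # ['#author']
```

The #website reference resolves because a node with that @id and other keys exists. #author has no such node, so it is the only thing left after the set difference.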

The main loop reads every HTML file in public/, pulls out each JSON-LD block with a regex, checks that @context is present, then feeds everything through the walkers:

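# PUBLIC, LD_RE and errors are defined earlier in the script.
for html_path in PUBLIC.rglob("*.html"):
    rel = html_path.relative_to(PUBLIC)
    # Fresh sets per page, so a definition on one page can't mask a dangling ref on another.
    types_seen, ids_defined, refs = set(), set(), set()
    blocks = LD_RE.findall(html_path.read_text(encoding="utf-8"))
    for raw in blocks:
        doc = json.loads(raw)
        if "@context" not in doc:
            errors.append(f"{rel}: missing @context")
        collect_types(doc, types_seen)
        collect_ids_and_refs(doc, ids_defined, refs)
    dangling = refs - ids_defined
    if dangling:
        errors.append(f"{rel}: dangling @id refs: {sorted(dangling)}")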
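
The excerpt doesn't show how LD_RE is defined. One plausible shape, offered as an assumption rather than the actual pattern from the script, is a non-greedy match between the JSON-LD script tags. The HTML string is a stand-in for a generated page.

```python
import json
import re

# Assumed pattern: capture everything between the opening and closing JSON-LD script tags.
LD_RE = re.compile(
    r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.DOTALL | re.IGNORECASE,
)

# Stand-in for one generated page.
html = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@graph": [{"@type": "WebSite", "@id": "#website", "name": "Example"}]}
</script>
</head><body></body></html>"""

# findall returns the captured group for each match, i.e. the raw JSON text.
blocks = [json.loads(raw) for raw in LD_RE.findall(html)]
print(len(blocks), blocks[0]["@graph"][0]["@type"])  # 1 WebSite
```

A regex is normally the wrong tool for HTML, but here the input is the site's own generated output with a known shape, so the shortcut is defensible. Pages from an unknown source would call for a real parser.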

73 lines total. It runs in CI on every push. A BlogPosting referencing #author when the Person node got dropped during a refactor: caught. A tag page missing its CollectionPage type: caught. Broken structured data fails silently in a browser, so without this gate you’d never notice until Google shows bare URLs instead of article cards.
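
The per-page type assertion could be expressed as a lookup from section to required types. This is a sketch under assumptions: the posts/ and tags/ directory names, the REQUIRED table, and the missing_types helper are hypothetical, and it relies on Hugo's default pretty URLs (posts/slug/index.html).

```python
from pathlib import PurePosixPath

# Hypothetical layout: posts under posts/, tag pages under tags/.
REQUIRED = {"posts": {"BlogPosting", "WebPage"}, "tags": {"CollectionPage"}}


def missing_types(rel_path, types_seen):
    """Return the required types that a page lacks, keyed on its top-level section."""
    parts = PurePosixPath(rel_path).parts
    # Only section/slug/index.html counts; listing pages like posts/index.html are skipped.
    section = parts[0] if len(parts) > 2 else ""
    return REQUIRED.get(section, set()) - set(types_seen)


print(missing_types("posts/hello/index.html", {"WebSite", "Person", "WebPage"}))
print(missing_types("tags/hugo/index.html", {"WebSite", "CollectionPage", "ItemList"}))
```

The first call reports BlogPosting as missing and the second reports nothing. A non-empty result would become one more entry in the errors list.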

Why a personal blog needs infrastructure

Structured data sounds like SEO work. It’s not, or at least that’s not why I built it.

The same pages carry IndieWeb microformats. Every post is an h-entry with dt-published, p-category, p-name, u-url. The home page has a hidden h-card with rel="me" links to GitHub, GitLab, Codeberg, and Mastodon. An IndieAuth parser can walk those links and, as long as each profile links back here, verify that github.com/lvmbdv and blog.lvmbdv.dev belong to the same person. No central authority, no API key. Just URLs and <link> tags.
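
The first half of that verification, collecting the rel="me" links from a page, fits in a few lines of standard-library Python. The markup and profile URLs are invented for the example, and RelMeCollector is a hypothetical name. A real verifier would also fetch each profile and confirm it links back, which this sketch leaves out.

```python
from html.parser import HTMLParser


class RelMeCollector(HTMLParser):
    """Collect href values from <a> and <link> elements whose rel includes "me"."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attr = dict(attrs)
        # rel is a space-separated token list; valueless attributes arrive as None.
        rels = (attr.get("rel") or "").split()
        if "me" in rels and attr.get("href"):
            self.links.append(attr["href"])


# Invented h-card markup; the real one lives in the home page template.
page = """
<div class="h-card" hidden>
  <a class="u-url" rel="me" href="https://github.com/example">GitHub</a>
  <a class="u-url" rel="me noopener" href="https://codeberg.org/example">Codeberg</a>
  <a href="https://example.org/about">About</a>
</div>
"""

collector = RelMeCollector()
collector.feed(page)
print(collector.links)
# ['https://github.com/example', 'https://codeberg.org/example']
```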

The webmention endpoint link is in the template but commented out, because I haven’t set up the receiver yet. The rest of the markup is there, waiting, and when I do turn it on, every post published since this rebuild will support it retroactively.

None of this is visible if you’re reading normally. The JSON-LD sits in the <head>. The microformats live in class attributes on existing markup. The only time you think about them is when something breaks, and the validator makes sure you find out before you deploy.

I’m building infrastructure that will outlast whatever tool writes the markup.

Why the validator exists

Most of the template code was written by an AI.

It’s good at generating Hugo partials. It’s also good at silently breaking @id chains during a refactor. The dangling #author reference doesn’t throw an error. The browser renders the page exactly the same. The structured data is just quietly broken.

The CI gate exists because neither of us catches this stuff during review. I describe what I want, the agent writes a partial, I read the output. But I’m looking at the generated HTML to see if the page looks right. I’m not manually tracing every @id through a four-node graph. The validator does that for me.

I co-write posts with an AI too. That’s a separate problem with its own set of mistakes to catch, but the dynamic is the same: a tool produces output, a script checks the invariants, I handle what the script can’t. The infrastructure is the part I don’t have to think about again.

The structured data will still be there, working, long after I stop using whatever generated it. That’s the bet.