// field note·2026-02-28·3 MIN READ·626 WORDS//signal

Field note: February

Two outages, one apology email, and a surprisingly good week off.

February had two production incidents. Both were our fault. Both were fixable. One took eleven minutes to resolve; the other took two hours and cost us three customers we'll probably never get back. I'm more interested in the differences than the similarities.

The eleven-minute incident had three things going for it: a clear alert that named the problem, a runbook we'd written in December and actually maintained, and an on-call engineer who'd been through something similar before. The two-hour incident had none of those. The alert fired on a symptom, not a cause. The runbook was out of date by four months. The person paged had never seen this failure mode before. By the time we understood what was wrong, the damage was done.

The lesson isn't "write runbooks" — everyone knows that. The lesson is that runbooks decay and almost no one schedules the work to refresh them.
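One way to make that refresh hard to skip is a small check that lists every runbook past its review window. This is a minimal sketch, not the tooling described in this note: the 90-day window, the `stale_runbooks` helper, and the runbook names and dates are all invented for illustration.

```python
from datetime import date, timedelta

# Hypothetical review window: roughly one quarter.
MAX_AGE = timedelta(days=90)


def stale_runbooks(
    last_reviewed: dict[str, date],
    today: date,
    max_age: timedelta = MAX_AGE,
) -> list[str]:
    """Return runbook names whose last review is older than max_age, oldest first."""
    overdue = [
        (reviewed, name)
        for name, reviewed in last_reviewed.items()
        if today - reviewed > max_age
    ]
    return [name for _, name in sorted(overdue)]


# Illustrative data only; these names and dates are made up.
runbooks = {
    "db-failover": date(2025, 12, 10),
    "data-sync": date(2025, 10, 20),
    "cache-flush": date(2025, 9, 1),
}

print(stale_runbooks(runbooks, today=date(2026, 2, 28)))
```

Run on a schedule, a check like this turns "the runbook is four months out of date" into something that shows up before the page does, rather than during it.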

What I shipped

Trayd incident runbook refresh — after the February outages, I blocked a full day and went through every runbook we had. Rewrote four of them, deleted two that no longer applied, added three new ones. Set a quarterly calendar reminder to do it again.
Apology email to affected customers — I wrote this myself. Not a template, not marketing's version. Direct, specific, no passive voice, no "we apologize for any inconvenience." Here's what broke, here's how long it was broken, here's what we changed so it won't happen again. Six people replied to say thank you. That's not nothing.
A week off — I took a week off in the second half of February. First real week off in eighteen months. I didn't work. I read, walked, and ate breakfast at a normal pace. The product did not collapse.

What I read

The Phoenix Project — Gene Kim, Kevin Behr, George Spafford · The fiction format is dated but the incident response patterns in chapters 14–17 are still the most useful compressed version of "why fires keep happening" I've read.
Accelerate — Nicole Forsgren, Jez Humble, Gene Kim · Chapter on change failure rate. Our change failure rate in February was higher than it had been in six months. This book gave me the vocabulary to say why.
How to Take a Vacation as a Founder — not a book, a blog post I can't remember the URL for. The advice: tell your team three weeks in advance, write down the ten things only you know, give someone else the decision rights for the week. I did all three. It worked.

What I noticed

The two-hour incident happened because we had a dependency we didn't know was fragile. A third-party data sync had been running reliably for eight months, so we stopped watching it. The moment it broke, we had no context — no metrics, no history, no prior examples. Reliability without observability is just luck that hasn't run out yet.
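The cheapest form of watching a quiet dependency is a freshness check: record when the sync last succeeded and alert when that timestamp gets too old. The sketch below assumes the sync exposes a last-success time and runs on a known interval; neither detail comes from the incident itself, and `sync_is_stale` and its `grace` multiplier are hypothetical names.

```python
from datetime import datetime, timedelta, timezone


def sync_is_stale(
    last_success: datetime,
    now: datetime,
    expected_interval: timedelta,
    grace: float = 2.0,
) -> bool:
    """True when the sync has gone longer than grace * expected_interval without a success."""
    return now - last_success > expected_interval * grace


# Illustrative values: an hourly sync that last succeeded at midnight UTC.
last_ok = datetime(2026, 2, 1, 0, 0, tzinfo=timezone.utc)
hourly = timedelta(hours=1)

on_time = sync_is_stale(last_ok, datetime(2026, 2, 1, 1, 30, tzinfo=timezone.utc), hourly)
overdue = sync_is_stale(last_ok, datetime(2026, 2, 1, 3, 0, tzinfo=timezone.utc), hourly)
print(on_time, overdue)
```

The point of alerting on "no recent success" rather than on errors is that it fires on the cause side: a sync that silently stops produces no errors at all, only an aging timestamp.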

The week off confirmed something I'd been told but hadn't believed: the stuff I thought only I could handle mostly handled itself. The two things that actually needed me waited. The rest resolved without my involvement. I came back to a list of decisions I'd been holding up for want of focus, made them in an afternoon, and closed the backlog. The week off made the week after more productive than any week I'd had in the previous month.

The apology email is still the thing I'm proudest of from February. Transparency isn't a communications strategy. It's a choice about what kind of company you want to run.

February score: 6/10. Two incidents is two too many. The runbook work and the week off mean March will be better. The apology email was right.

// filed under //signal · field_note · 2026-02-28
