The story below is fictional. Of course, it would never happen on your team. Neither would it happen on my team. But we all know the teams where this can happen, don’t we?
A programmer sipped cold coffee while staring at a pull request. The code looked good, but something worried them. What was it, though? “I’m overreacting,” they thought. “Let’s get it deployed and go for lunch.” They clicked the merge button and left the room.
They ate lunch. In the meantime, the build server ran all the tests and deployed the code. When the programmer returned, they saw the project manager talking on the phone. The PM had a pale face. A colleague yelled, “Does anyone remember how to revert a deployment? The example in our documentation doesn’t work!”
The only team member who remembered it was on vacation. They weren’t happy when the project manager called them to ask what the team should do. That colleague’s complaint about an interrupted day off was the team’s smallest problem.
The change introduced by the hungry programmer exposed the private data of every customer to every other customer account. People saw other people’s files and weren’t happy because they knew that it meant others could see their data too.
It turned out that the programmer forgot to send a user id parameter to a service retrieving the data. Without the identifier, the service returned all available information.
When employees of the national Data Protection Regulator started their investigation, the Board of Directors wanted to scapegoat someone.
Was it the programmer’s fault? Was it the fault of the people who wrote the dependency service? Perhaps their code should return empty results when you don’t specify such a crucial parameter.
The programmers didn’t want to blame each other (imagine that). After many RCA meetings, they concluded that the error was caused by missing documentation. Nobody knew about the required parameter. The person who wrote the existing documents left months before the incident. So it was nobody’s fault. Right?
The CEO demanded a fix. The team must write documentation now. How should they do it?
What does good technical documentation look like?
Documentation is the thing we like to blame when something goes wrong. It is also the thing we don’t want to write or update.
I like to think that documentation is like a map of the system. Of course, “The map is not the territory.” Likewise, the documentation of a system is not the system itself. It would be unreasonable to expect every detail to appear in the technical documentation.
After all, we use different kinds of maps for various purposes. A map showing the roads between cities will be useless for navigating inside a city. It’s the wrong level of detail. A sewer system map is useless when you want to use public transport. It has a different purpose.
Also, a map doesn’t include every detail. Not every copse gets a green polygon on a map. Even if adding such details helps fulfill the purpose of a map (“Turn right at the crossroad with four oak trees on the left side”), we omit them because reading such a map would be overwhelming.
Similarly, your technical documentation doesn’t need to look like the Encyclopædia Britannica. It makes no sense to describe every loop and conditional statement in a text form.
The recommended types of technical documentation
The technical documentation shouldn’t be a translation of a programming language into a natural language. If you created such a monstrosity, you wasted a huge amount of time and money. Nobody is ever going to read it. Also, I firmly believe your documentation is wrong. There is no way you included every detail and kept the documents updated at all times.
Because of that, I will focus on the level of detail that makes sense to me.
A top-level view of the system is a short document describing input and output data. It briefly describes the data transformation within the system. Those descriptions shouldn’t be longer than 3-5 sentences.
Think of the top-level documentation as the “elevator pitch” for your software. It should take 20 seconds to read the entire description.
Why do you need it? It may be helpful when explaining what you do to the CEO, but also when programmers from other teams look for the right data producer for their new feature.
They shouldn’t need more than one minute to decide whether your code may help them or not. Of course, the other team will never know if they found the right program after reading a half-page summary, but it should be sufficient to decide whether they found the wrong one.
How will other teams learn how to use your code or the data you produce? Will they need to ask you? Read your code? Guess? They shouldn’t need to do any of this.
You always have an API. The input and output data is your API if you build a data pipeline. Because of that, the next document you need to write is the contract of your API.
In that document, you must describe, in detail, the format of the files you expect or produce. For example, if you use CSV files, the documentation should contain data types, column names, column order, and information on how you handle quotes, null values, etc.
Documenting the data gets a bit easier when you have Parquet or Protobuf files because the file format contains some of the information required to use it. But even in such cases, you need the contract documentation. How else will I know that the text parameter can’t contain any text I want but only one of three values because you use it as an enumeration? Parquet metadata won’t tell me that. You must.
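One way to keep such a contract honest is to express it as code that both the producer and the consumer can run. The sketch below is only an illustration: the column names, the three allowed status values, and the rule that an empty field means null are all assumptions invented for this example, not part of any real system.

```python
import csv
import io
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Column:
    name: str
    parse: Callable[[str], object]        # converts raw text to the documented type
    nullable: bool = False
    allowed: Optional[frozenset] = None   # set when a text column is really an enumeration


# Hypothetical contract: column order matters, an empty field means null.
CONTRACT = (
    Column("customer_id", int),
    Column("status", str, allowed=frozenset({"active", "suspended", "closed"})),
    Column("balance", float, nullable=True),
)


def validate(csv_text: str) -> list:
    """Parse CSV text and raise ValueError on the first contract violation."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    expected = [column.name for column in CONTRACT]
    if header != expected:
        raise ValueError(f"expected columns {expected}, got {header}")

    rows = []
    for line_no, raw in enumerate(reader, start=2):
        if len(raw) != len(CONTRACT):
            raise ValueError(f"line {line_no}: expected {len(CONTRACT)} fields, got {len(raw)}")
        row = {}
        for column, text in zip(CONTRACT, raw):
            if text == "":
                if not column.nullable:
                    raise ValueError(f"line {line_no}: {column.name} must not be empty")
                row[column.name] = None
                continue
            try:
                value = column.parse(text)
            except ValueError as exc:
                raise ValueError(f"line {line_no}: {column.name}: {exc}") from exc
            if column.allowed is not None and value not in column.allowed:
                raise ValueError(
                    f"line {line_no}: {column.name} must be one of {sorted(column.allowed)}"
                )
            row[column.name] = value
        rows.append(row)
    return rows
```

A check like this doesn’t replace the written contract. It only makes it harder for the text and the data to drift apart unnoticed.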
Don’t forget to mention what happens in case of errors. How would I know that something failed? Should I guess? Count the number of values? I hope we all agree it would be ridiculous to detect failures in your dependencies by counting the size of returned data.
We all agree, but sometimes you see data pipelines producing empty files when they fail, and nobody thinks it’s weird. Don’t do that. Make the failures explicit.
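What “explicit” could mean in practice is sketched below, under a few assumptions: the pipeline writes JSON Lines files, an empty result counts as a failure, and the names `PipelineError` and `publish` are made up for this illustration. The idea is that a failed run raises an error and leaves no output file behind, so consumers never have to guess.

```python
import json
import os
import tempfile
from pathlib import Path


class PipelineError(Exception):
    """Raised when a run fails; no output file is published in that case."""


def publish(records: list, target: Path) -> None:
    """Write records to `target` in one step, or raise without creating it."""
    if not records:
        # Assumption: for this hypothetical pipeline an empty result is an error.
        raise PipelineError("no records produced; refusing to publish an empty file")

    # Write to a temporary file first so readers never see a half-written output.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            for record in records:
                handle.write(json.dumps(record) + "\n")
        os.replace(tmp_name, target)  # swap the finished file into place
    except Exception as exc:
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise PipelineError(f"publishing {target} failed") from exc
```

Whatever convention you choose, write it into the contract: which error the caller sees, and what the absence of a file means.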
Operational and troubleshooting documentation
Do you want to get woken up at 2 am on Saturday because your data pipeline doesn’t work and the on-call person cannot fix the issue? They will call you if you don’t tell them how to debug and fix the problems.
Every production system should have a troubleshooting checklist. In the checklist, we tell the unlucky on-call people what they need to check, where they can find the relevant information, and how to access it. They should get step-by-step instructions! Include screenshots of monitoring pages and highlight the relevant charts. Sleepy people make mistakes. Help them as much as you can.
Don’t include too much information in the checklist! The people executing the instructions may make mistakes if the instructions overwhelm or confuse them. If your troubleshooting manual confuses the on-call people, they should call you at night every time something happens until you fix the problem with your pipeline and update the documents.
Also, if your on-call instruction says, “Switch it off and don’t worry. It can wait until Monday morning,” the on-call people shouldn’t get a notification about the problem. They won’t be happy if you wake them up to tell them they can go back to sleep. Would you like that?
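The rule about alerts that can wait is easy to encode next to the checklist. The following is a minimal sketch, not a real alerting configuration: the `Alert` type, its `can_wait_until_working_hours` flag, and the two actions are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    PAGE = "page the on-call person now"
    TICKET = "create a ticket for the next working day"


@dataclass(frozen=True)
class Alert:
    name: str
    # Taken from the troubleshooting checklist: does the fix have to happen tonight?
    can_wait_until_working_hours: bool


def route(alert: Alert) -> Action:
    """Only wake somebody up when the checklist says the problem cannot wait."""
    if alert.can_wait_until_working_hours:
        return Action.TICKET
    return Action.PAGE
```

If the checklist and the routing rule disagree, one of them is wrong, and the on-call person is the one who finds out.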
You need on-boarding documentation even if you aren’t hiring new people right now. No, you won’t write it later. You must prepare such documents before you need them. When a person quits, they won’t do a good job writing down everything they know to teach their replacement. They won’t care.
What should you include in the on-boarding manual? I would focus on describing the domain language and pointing to the relevant parts of code while explaining the domain. It’s sufficient to write only about the top-level abstractions. You don’t want to scare the new programmer.
The on-boarding documentation should help them when they wonder what you are talking about during meetings or when they don’t understand a domain concept. I assume you use the terms from the business domain in your code, so finding the relevant parts should be pretty easy. If people need a translation between business terms and code abstractions, you have a bigger problem than missing documentation.
Architecture Decision Records
Finally, we need the architecture decision records. They seem unnecessary. After all, we see the system’s current state in the code. Why do we need them?
We need them because the code doesn’t explain why we have made a decision and what alternatives we considered. What criteria did you use while making a decision?
Years ago, it was common to split all applications into microservices. Now, many teams spend time bundling the microservices back together. Who is right? We don’t know and will never know if the teams don’t document their decisions and what they want to achieve. Maybe both teams were right because they had different goals, and both teams achieved those goals. How can we know if they didn’t bother writing it down?
You need ADRs to communicate with people who join your team years after you have left the company. If you do a poor job documenting your decisions, the new joiners will look at your carefully crafted solution, and… they will refer to it as “the old crap that needs to be replaced.” Do you want that?
How to write technical documentation?
Writing seems easy. You sit down, and you write. However, to ensure that what you write is helpful to other people, I suggest following a few steps:
Why are you writing? Let’s write down the document’s purpose and what you want to achieve. What’s the scope? What do you want to include? What will you omit on purpose? Write it down. The document’s first one or two paragraphs should explain its purpose and scope. The readers must know what they should expect.
Decide how to achieve your goal. Is it better to write a text document or prepare a diagram? Knowing what you want to achieve, you can decide what form works best. You don’t need to be consistent. If all your documents are long articles, it doesn’t mean you can’t draw a diagram. Also, you can mix the content within one document.
Write the outline, fill the gaps. For technical people, the easiest way to write articles is to start with an outline of what you want to say and add the text with relevant details. An outline item doesn’t need to become the header of a subsection. A header is a different thing. You can add headers later or turn the paragraphs into a list. Removing the outline after you finish writing or moving the parts around is okay. It’s supposed to help you, not constrain you.
Stop when you have achieved the goal. Stop when you have finished. There’s nothing worse than an author rambling because they feel they must tell every detail or hit a word count target. Have you finished? Stop.
Remove the clutter. Go back to the first paragraph, where you described the goal and the scope. Read it. Now, reread the rest of the text. Does every paragraph, every sentence contribute to your goal? Did you get sidetracked and write about irrelevant things? Remove it. Remove everything that doesn’t contribute to the intent of the document.
If you feel bad about removing the content, move it to another document and link to it. If it doesn’t fit anywhere, but you feel proud of what you have written, consider publishing it on your company tech blog.
When should you remove documentation?
Should you remove outdated documents? No. Update them. If you can’t update them, mark them as obsolete and move them to an archive section. Those documents tell the story of your project. They explain the decisions you have made. Don’t remove them yet. Try to fix them first.
If you have moved something to an archive and nobody needed that for a year, feel free to remove such documents.
When should you update documentation?
You won’t do it every week. You won’t. Even if you think you can update all documents every week. You won’t do it even if you promise that to someone or yourself.
It is sufficient to update the documents when you make a significant code change or modify the architecture. In addition, I suggest scheduling documentation updates every three months. It’s good enough. Not perfect, but not terrible either.
You should also update the documents when a new person joins the team and you must explain to them the differences between the documentation and reality. Take some time to fix the documents when it happens.
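A quarterly review is easier to keep when something reminds you which documents are overdue. The sketch below assumes that your documents are Markdown files in one directory and that the file modification time reflects the last real update. Both are assumptions. If your documents live in a wiki or in version control, you would need a different source for the date.

```python
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Optional

REVIEW_INTERVAL = timedelta(days=90)  # roughly three months


def stale_documents(docs_dir: Path, now: Optional[datetime] = None) -> list:
    """Return documents whose last modification is older than the review interval."""
    now = now or datetime.now(timezone.utc)
    stale = []
    for path in sorted(docs_dir.rglob("*.md")):
        modified = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
        if now - modified > REVIEW_INTERVAL:
            stale.append(path)
    return stale
```

The list is a reminder, not a verdict. A document that nobody touched for a year may still be correct, and one edited yesterday may still be wrong.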
Who should update the documentation?
That’s a tricky question. Many programmers won’t want to write text. Many programmers are such terrible writers that they shouldn’t write the documentation. But don’t use this as an excuse. Every person on the team should write. Otherwise, they will never learn.
However, there should be only one person responsible for the final editing. Their job is to keep the documentation style consistent and easy to read. It’s better for the reader when the documents don’t look like multiple fragments copy-pasted from dozens of places.
The principles to follow while writing technical documentation
Finally, let’s set some rules to make your documentation the best:
Don’t blame the documentation. The documentation doesn’t get outdated. You failed to update it. Take some responsibility for your actions and negligence.
“Nobody wants to read your shit” (Steven Pressfield). People don’t want to read. You can’t force them. Keep your texts short, focus on the problems, and avoid drifting off. Write things you would like to read!
Don’t explain the same thing many times. If you need to repeatedly explain the same terms during meetings, create a documentation page about them. Include a link in the on-boarding documentation and tell people about the document during meetings where you explain the business domain.
Good is better than perfect. Don’t waste time looking for the perfect documentation format. You can change it later. You write on a computer; you don’t carve letters in stone. Don’t overthink it. Documentation is just a tool. If it fulfills its purpose, it’s good, even if you made many grammar errors.