On Documentation

September 29, 2023

In my career I’ve seen lots of teams struggling, not only to get their software out of the door, but much more often (even if successful in the previous step) to have a decent level of documentation next to it.

In this article I’ll enumerate some misconceptions and myths that I’ve seen far too many times in the past 26 years I’ve been in this industry.

I’ll also go through the various levels of documentation that software development teams should keep on their radar at all times. All of these levels are more or less important depending on the context of the software project; building embedded software for a pacemaker is not the same as building a web application using some fancy enterprise framework. The size of the team also influences the types and depth of documentation that teams should focus on.

Also, this is all based on my admittedly limited experience; I haven’t worked at NASA, CERN, nor at a corporation with more than a thousand people in it. There are much bigger software endeavors out there that I haven’t been a part of.

Principles

The following are personal beliefs around documentation born out of my own experience; you might agree with them, or not. If some or many of the points below sound controversial to you, so be it. I stand by them.

Bad documentation, including bad code comments, is far better than no documentation at all. This is not because of the quality of the documentation per se (after all, “bad” is quite a subjective and time-limited idea that doesn’t mean anything at all in the context of software artifacts). This criterion stems from the following fact: when there is no documentation at all, there is no yardstick to evaluate the quality of the next iteration. What matters is not whether the documentation is “good” or “bad”, because there’s no such thing; what there is, always, is room for improvement: if you don’t like it, change it. Otherwise, don’t complain about it. And if you are in an organization that does not allow you to change it, then move away.

Repeat yourself. The famous “DRY” or “Don’t Repeat Yourself” principle is great for your code, but terrible when it comes to documentation. Redundancy in documentation is good, simply because humans learn by repetition, and the only reason you have documentation is because humans must read it to learn about your software. That’s why you have it. Hence, yes, repeat yourself, and a lot. Repeat the same concepts again and again, in different places.

It’s everyone’s task to write docs. This one goes without saying, but for many engineers, “documentation” is a dirty word. Their arrogance makes them consider documentation writing as a lowly kind of work. It is not, and if you have engineers that take this stance, no matter how good they are at writing code, they should either change their attitudes or GTFO. Writing is thinking, and as a software engineer you’re paid to think; so, write. I’ve written about the aversion to writing on De Programmatica Ipsum:

Around 90% of the teams I have worked with in the past 22 years have never, ever, documented anything. Not a single wiki page, not a README file on top of a repository, not a single PDF file for the end users, not even a single UML diagram. Were they successful? Hardly.

Adopt DocOps. Documentation delivery should be automated, just like all other parts of your system. I’m a big fan of systems like Antora that can be stored on your favorite source code repositories, and then seamlessly integrated into any CI/CD system, including GitHub Actions or GitLab pipelines, so that your documentation is versioned and published automatically after every build of your software. Also, separate content from look and feel (another thing that Antora does superbly well.)

Don’t fear drift. It’ll happen anyway. Documentation drift (that is, documentation becoming outdated compared to the actual implementation) will happen whether you like it or not. There’s no automation that can solve this for you. If you modified the code, you must also modify the corresponding docs. And every so often you even have to re-read and re-edit the whole documentation package. Yes, it’s hard work. Stop whining and get used to it.

Not everything can be automated. This is of course related to the previous point. Very often in my career I’ve seen teams spending weeks or months building some contraption that somehow should “automate documentation completely,” because humans can’t be trusted to fix the drift between code and docs. Such an idea costs more time and money than it would have cost to just sit down and correct the documentation in the first place. Let’s be clear here: such engineers have the wrong priorities. Their motivations are understandable, but misguided, and become a net liability for their organization. Most of the time, you just need to sit down and write the documentation. The rest is just brain masturbation that has no room in companies that aren’t the size of Google or Meta, which is most organizations, anyway.

“When you can’t create, you can work.” This is a quote from American novelist Henry Miller¹. In the case of software engineers, it means that when no more algorithms come to their minds after hours of being in the so-called “zone”, it’s time to document what they’ve written during the day. Writing is a muscle; get into the habit of writing down your work every day, and your documentation will naturally flow after a while.

Documentation Supports

Let’s talk about the various shapes in which your documentation can come to life.

README Files

This is the first documentation support each and every one of your projects should feature. In 2008, GitHub invented the concept of displaying the README file of a project on its webpage, and since then this practice has spread to GitLab, Bitbucket, Gitea, and many other similar systems. This is, without any doubt, one of the biggest contributions of GitHub to the world of software development.

READMEs are the way projects show politeness; they introduce themselves, and they should include at least the following items:

Purpose: what’s this project for?
IDE-specific instructions: which one to use, how to open the project, which knobs and switches to use to get started.
Build guidelines, including:
- Mandatory dependencies.
- Versions required.
- Optional tools.
- Command sequences.
Quick user’s guide: how to use this software?
How to run tests (on the terminal and on the IDE).
Any required legal information (licenses, rights, obligations, etc.)

For the sake of brevity, README files can point to wiki pages (more on this later) that expand on these subjects.
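
As a rough illustration, the checklist above can be turned into a quick sanity check. This is only a sketch: the section names in REQUIRED_SECTIONS and the function missing_sections are hypothetical choices of mine, and a check like this tells you that a heading exists, not that the text under it is any good.

```python
import re

# Hypothetical section keywords, loosely mapped from the checklist above.
REQUIRED_SECTIONS = ["Purpose", "IDE", "Build", "Usage", "Tests", "License"]


def missing_sections(readme_text: str) -> list[str]:
    """Return the required keywords that appear in no README heading."""
    # Collect Markdown ("# Title") and AsciiDoc ("= Title") headings.
    headings = re.findall(r"^(?:#+|=+)\s+(.+)$", readme_text, flags=re.MULTILINE)
    lowered = [heading.lower() for heading in headings]
    # Simple case-insensitive substring match against every heading.
    return [
        section
        for section in REQUIRED_SECTIONS
        if not any(section.lower() in heading for heading in lowered)
    ]


sample = """# My Project
## Purpose
## Build
## Tests
"""

print(missing_sections(sample))
```

With the sample above, the function reports the IDE, usage, and license sections as missing. A human still has to write them.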

Code Comments

This is another fundamental element of software documentation. There are lots of schools of thought around this subject, including a whole chapter (32) in Steve McConnell’s “Code Complete, Second Edition.” I refer the reader to his exposition, which is clearly worthy of praise.

As a personal trait, I tend to go overboard with comments, usually directing their tone and contents to a future version of myself. I do not think there are useless comments. And if I don’t like them for any specific reason, I change them (hopefully for the better) even if I’m not the original author.

There is a particular situation, however, where code comments have become life savers for this author: platform-specific quirks. Every time a piece of software required any type of customization for the different platforms where the code had to run (operating systems, browsers, mobile devices, etc.), comments have always been the best way to document such deviations from the norm.

Examples include embedded assembly code in C++ applications, the use of #ifdef-style macros in various programming languages, and of course, specific web browser quirks, particularly during the times of the battle between Internet Explorer and Netscape, and later Firefox.
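
To make this concrete, here is a minimal sketch of what such comments can look like, written in Python rather than C++. The function config_dir is a hypothetical example of mine; the point is that each branch says why it differs from the others.

```python
import os
import sys


def config_dir(app_name: str) -> str:
    """Return the per-user configuration directory for app_name."""
    home = os.path.expanduser("~")
    if sys.platform == "win32":
        # QUIRK (Windows): per-user settings belong under %APPDATA%, not in
        # the home directory. The variable can be missing in stripped-down
        # environments, so fall back to the home directory.
        base = os.environ.get("APPDATA", home)
    elif sys.platform == "darwin":
        # QUIRK (macOS): the platform convention is
        # ~/Library/Application Support.
        base = os.path.join(home, "Library", "Application Support")
    else:
        # Linux and other Unix systems: follow the XDG Base Directory
        # convention, where an unset or empty $XDG_CONFIG_HOME means ~/.config.
        base = os.environ.get("XDG_CONFIG_HOME") or os.path.join(home, ".config")
    return os.path.join(base, app_name)


print(config_dir("demo"))
```

Without those comments, the next maintainer sees three unexplained branches and has to rediscover each platform convention on their own.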

Wikis

I remember back in 2004 I shocked my (admittedly conservative) peers when I proposed to set up a wiki for our team; to say that I got quite a bit of pushback is the understatement of the year. I persevered, and we finally got a Confluence license… and within hours everybody had poured tons of knowledge into it. It was one of those “why didn’t we do this before” kind of moments, and ever since, I could tell whether a company was going to succeed or fail by a simple fact: do they have a wiki or not? It’s like an IQ test for organizations.

These days, there is no need to pay for a Confluence license; a wiki comes bundled with most project management SaaS and self-hosted software: GitHub, GitLab, Gitea, Redmine, they all have an integrated wiki, usually based on Markdown or some other text-based markup language. So no need to say anything else; just start using it.

Unit Tests

Unit tests are a great documentation mechanism, one of the best, actually, particularly for low-level components in an architecture or for particular APIs: you can understand how these work just by running the tests.

And if the unit tests have code comments on them, that’s even better. And if there are wiki pages talking about the tests, now that’s fantastic. Remember, redundancy is your friend.
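
Here is a small sketch of tests that read as documentation, using Python’s standard unittest module. The slugify function is a hypothetical example of mine; what matters is that each test name states one behavior a reader can rely on.

```python
import unittest


def slugify(title: str) -> str:
    """Turn a title into a lowercase, hyphen-separated URL slug."""
    # Replace every non-alphanumeric character with a space, then join
    # the remaining words with single hyphens.
    words = "".join(c if c.isalnum() else " " for c in title.lower()).split()
    return "-".join(words)


class SlugifyBehaviour(unittest.TestCase):
    """Each test name documents one guarantee of slugify()."""

    def test_repeated_spaces_become_a_single_hyphen(self):
        self.assertEqual(slugify("On   Documentation"), "on-documentation")

    def test_punctuation_acts_as_a_separator(self):
        # The apostrophe splits "don't" into two words; this is documented
        # here so that nobody is surprised by it later.
        self.assertEqual(slugify("Don't fear drift!"), "don-t-fear-drift")

    def test_empty_title_gives_empty_slug(self):
        self.assertEqual(slugify(""), "")
```

Saved in a test module, these run with `python -m unittest`, and the list of test names alone already reads like a short specification.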

API Documentation Comments

Most mainstream programming languages offer some kind of API documentation system, used to decorate functions, classes, and whatever constructs you need. These can later be easily extracted using some command-line tool, and put together into a nice website.

The quintessential tool at the origin of such an idea was Doxygen, but pretty much every mainstream programming language allows you to do this nowadays (either as a built-in feature, or with a community-provided tool). If your ecosystem offers such a thing, you should definitely add those API comments and then set up a delivery pipeline to publish them for the required teams to see and use.
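
As an illustration, here is what this looks like in Python, where the documentation comment is a docstring. The area function is a hypothetical example of mine, and the Args/Returns/Raises layout is just one common docstring convention, not the only one.

```python
import inspect


def area(width: float, height: float) -> float:
    """Compute the area of a rectangle.

    Args:
        width: Horizontal size, in metres.
        height: Vertical size, in metres.

    Returns:
        The area, in square metres.

    Raises:
        ValueError: If either dimension is negative.
    """
    if width < 0 or height < 0:
        raise ValueError("dimensions must be non-negative")
    return width * height


# Documentation tools read the docstring from the function object;
# inspect.getdoc() returns it with the indentation cleaned up.
print(inspect.getdoc(area))
```

For a module saved on disk, the standard library’s `python -m pydoc -w <module name>` writes an HTML page from the same docstrings, which is the kind of artifact a delivery pipeline can publish.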

User-level Documentation

In order to decide the contents and format of this piece of documentation, you should ask yourself: who are the users of your software? All of the conversations, processes, formats, and anything else required to create user-facing documentation derive from that initial question.

The important parts of this type of documentation are, in my opinion, the following:

Content: talk to your users in their language; make it easy for them to read your text.
Format: provide this documentation in the canonical formats: HTML, PDF, and EPUB are the bare minimum. man pages are also a nice touch, depending on your target audience. Good news: Asciidoctor generates all of these formats.

Maintenance Guide

This is a kind of documentation that I used to provide to my customers during the days of my company akosma software (2008-2013). With every application I delivered, I provided a standalone documentation bundle (in PDF and EPUB formats) explaining in detail various aspects of the project (architecture, code organization, IDE settings, etc.) and how to maintain it in the future. Needless to say, I haven’t seen anything similar anywhere else, neither before nor after I had my own business.

The idea was for my customers to be completely and totally able to maintain the application after I was gone. Some of them only hired me for the first release, some would come back for subsequent versions; in any case, the maintenance guide was always useful².

The inspiration for this maintenance guide came from my father, who is an architect in Buenos Aires specializing in the design and construction of single-family houses. For every one he builds, he provides his customers with a maintenance guide (in paper format, in the shape of a large binder with typed or hand-written pages) explaining all kinds of details, from the color hues of wall paint, to the recommended brands of plumbing and electricity supplies to be used in case of repairs.

Formats

In what formats should teams write their documentation? First of all, documentation should be stored in textual formats, to enable DocOps workflows; this excludes any use of binary supports such as LibreOffice, Word, or other atrocities.

Most teams start with Markdown, and that’s usually fine. It’s supported everywhere and there are excellent tools to work with it (I’m particularly fond of Typora, for example.)

But Markdown is not suitable for complex documentation projects. I strongly recommend using a more evolved markup language, in particular AsciiDoc, and more specifically, the Asciidoctor toolchain.

Asciidoctor provides documentation authors with various invaluable features that Markdown doesn’t support out of the box:

Built-in page includes.
Syntax highlighting for code blocks.
Integration of mathematical equations.
Support for text-based diagrams.
Generation of HTML, PDF, EPUB, and man pages.

It is also natively supported on GitHub and GitLab, which means that even your README files can be written in AsciiDoc format; just use the README.adoc filename and you’re done. The Antora documentation system I’ve mentioned previously in this article also stems from the same team that brought us Asciidoctor.

Last but not least, I tend to avoid Textile, reStructuredText, and other formats, and if all else fails, use Pandoc to convert everything to AsciiDoc.

Diagrams

If at all possible, I recommend avoiding proprietary tools and binary formats for your diagrams and using either (or both) of the following formats instead:

Scalable Vector Graphics, or SVG; for this, Draw.io and its associated Visual Studio Code extension are priceless. Store your files as filename.drawio.svg and the Visual Studio Code plugin will load them for you in the editor, automatically.
Text-based formats, for which I recommend using a self-hosted copy of the Kroki application, which supports pretty much every text-based format you can imagine: BlockDiag, BPMN, Bytefield, C4, D2, DBML, Ditaa, Erd, Excalidraw, GraphViz, Mermaid, Nomnoml, Pikchr, PlantUML, Structurizr, SvgBob, Symbolator, TikZ, UMLet, Vega, Vega-Lite, WaveDrom, WireViz, …

Text-based diagrams can be trivially versioned on any source version control system, and the SVG format guarantees beautiful rendering and printing in every medium, at every resolution.
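
To show how little is needed to go from text to picture, here is a sketch that builds a Kroki URL for a GraphViz diagram, following the encoding Kroki documents for GET requests (zlib compression followed by URL-safe Base64). The base URL is an assumption: it stands for wherever your self-hosted copy listens. The function name kroki_url is my own.

```python
import base64
import zlib

# Assumption: a self-hosted Kroki instance reachable at this address.
KROKI_BASE_URL = "http://localhost:8000"


def kroki_url(diagram_source: str, diagram_type: str = "graphviz",
              output_format: str = "svg") -> str:
    """Build the GET URL that asks Kroki to render diagram_source."""
    # Kroki expects the diagram text deflated with zlib and then encoded
    # as URL-safe Base64.
    compressed = zlib.compress(diagram_source.encode("utf-8"), 9)
    encoded = base64.urlsafe_b64encode(compressed).decode("ascii")
    return f"{KROKI_BASE_URL}/{diagram_type}/{output_format}/{encoded}"


source = "digraph G { docs -> build -> publish }"
url = kroki_url(source)
print(url)
```

The diagram source stays in your repository as plain text, next to the documentation that uses it, and the rendered SVG is produced on demand.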

Fancy Delivery

Of course, most of the documentation tools I’ve mentioned in this article are web-based. The web offers a fantastic delivery medium for documentation.

These days, however, some new options are appearing, such as the integrated Backstage platform, adopted by Red Hat in their Red Hat Developer Hub system. It offers a powerful mechanism to deliver documentation to your users, and it even considers “technical writers” as an integral part of software development teams.

Conclusion

This was a very long article, so thanks for sticking with me until the end. There’s a lot more that could be said, but this article summarizes my thinking around documentation. I hope these ideas will be useful to you too.

¹ I wrote an article on De Programmatica Ipsum with that title related to freelancing.
² Yes, as counterintuitive as it may sound, my objective was not to lock my users down in working with me. Some of my customers understood the benefits and hired me anyway; those were the best customers I’ve ever had.