With self-documenting code, do you still need documentation?

TLDR: The answer is yes.

But let me elaborate further for you.

What is self-documenting code for?

The first argument for self-documenting code is simply that some developers use obscure names for their functions, variables, and even files. Many of them hide behind the infamous "two hard things in computer science": (1) cache invalidation and (2) naming things. But if you think about it carefully, naming things isn't that hard. Use the words your users use: if they call it a "product bundle", write "product bundle" in your code. It's that simple, and now you probably understand what ubiquitous language is. If you still don't know what self-documenting code is, see this Stack Overflow post.

Arguments that self-documenting code is enough

A question that arises once people adopt self-documenting code is: "Do we still need to write documentation?" This naturally starts an argument. I'd like to quote an argument someone made to show that self-documenting code is enough:

public class TwilioSMSSender : ISMSSender {
  public async Task SendSMSAsync(
    PhoneNumber to,
    string text,
    CancellationToken cancellationToken
  ) {
    if (text.Length > 140) throw new MessageTooLongException();

    // ...
  }
}

According to that argument, the code above already tells us:

  • From the method name, it's for sending an SMS (text message)
  • From the class name, it'll be sent through Twilio
  • From the interface name, it is used the same way as other SMS senders and is expected to have the same black-box behavior
  • From the PhoneNumber class parameter, the destination number must be a valid phone number
  • From the method guards, the maximum text length is 140 characters
  • One of the possible exceptions is MessageTooLongException
  • If self-documenting code is not enough, maybe your code is not self-documenting, or you only wrote happy-path code.

They also noted that there are many other specifications that could be documented there, such as:

  • How to set the Twilio API key
  • Possible error messages from Twilio's API
  • Request timeout
  • Idempotency handling
  • How the service will scale

There are some other arguments in the Stack Overflow post I linked above:

1) Only write comments for code that's hard to understand.
2) Try not to write code that's hard to understand.

By Loofer

What seems trivial for you to understand at the time of writing the code may in fact be very hard for someone else to understand later, even if that someone else is in fact yourself in a few months/years.

By Anders Sandvig

Why you still need to write some documentation

I would argue that self-documenting code is not enough, and not because I only code happy paths or always write code that's hard to understand. In some programming languages, there are times when we have to sacrifice readability to gain performance. Let's take a dumb example: finding the absolute value of a number. The language I'm using is Go.

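package main

import "fmt"

// Abs returns the absolute value of x by flipping the sign of negative inputs.
func Abs(x float64) float64 {
  if x < 0 {
    return x * -1
  }

  return x
}

func main() {
  fmt.Println(Abs(-3.5)) // 3.5
}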

It's easy to understand, right? To find the absolute value of a number, all we need to do is check whether it's lower than zero, and if it is, multiply it by -1. So why does the official Go implementation of the absolute value look like this?

// Copyright 2009 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

package math

// Abs returns the absolute value of x.
//
// Special cases are:
//
//  Abs(±Inf) = +Inf
//  Abs(NaN) = NaN
func Abs(x float64) float64 {
    return Float64frombits(Float64bits(x) &^ (1 << 63))
}

Link to source code

I assume you have no idea what Float64bits(x) &^ (1 << 63) does, and look: apart from the one-line summary and the list of special cases, nothing in the comments or the function declaration explains how the code works. You'd have to ask ChatGPT and hope it gives you a correct answer.

If your argument is that "we won't even see this code in our day-to-day work, and even if we do, it should be abstracted away", well, I hate to break it to you, but the people who wrote the code above are the ones who build the standard library you use. If you were in their shoes, what would you do? Would you expect people reading the code to magically understand its entire context, what it does, and what you were thinking when you wrote it?

At this point, I'd argue that comments are still needed to:

  1. Pass on your knowledge and thoughts to future code readers
  2. Provide context and prerequisite knowledge to understand the code

Let me show you one more example from Salvatore Sanfilippo, the creator of Redis. Go on and read this code snippet.

# Send packets waiting in the send queue. This function, right now,
# will just send every packet in the queue. But later it should
# implement percentage of channel usage to be able to send only
# a given percentage of the time.
def send_messages_in_queue(self):
    if self.lora.modem_is_receiving_packet(): return
    send_later = [] # List of messages we can't send, yet.
    while len(self.send_queue):
        m = self.send_queue.pop(0)
        if (time.ticks_diff(time.ticks_ms(),m.send_time) > 0):
            # If the radio is busy sending, waiting here is of
            # little help: it may take a while for the packet to
            # be transmitted. Try again in the next cycle. However
            # check if the radio looks stuck sending for
            # a very long time, and if so, reset the LoRa radio.
            if self.lora.tx_in_progress:
                if self.duty_cycle.get_current_tx_time() > 60000:
                    self.serial_log("WARNING: TX watchdog radio reset")
                    self.lora_reset_and_configure()
                    self.lora.receive()
                # Put back the message, in the same order as
                # it was, before exiting the loop.
                self.send_queue = [m] + self.send_queue
                break

            # Send the message and turn the green led on. This will
            # be turned off later when the IRQ reports success.
            if m.send_canceled == False:
                encoded = m.encode(keychain=self.keychain)
                if encoded != None:
                    self.set_tx_led(True)
                    self.duty_cycle.start_tx()
                    self.lora.send(encoded)
                    time.sleep_ms(1)
                else:
                    m.send_canceled = True

            # This message may be scheduled for multiple
            # retransmissions. In this case decrement the count
            # of transmissions and queue it back again.
            if m.num_tx > 1 and m.send_canceled == False and not self.config['quiet']:
                m.num_tx -= 1
                next_tx_min_delay = 3000
                next_tx_max_delay = 8000
                m.send_time = time.ticks_add(time.ticks_ms(),urandom.randint(next_tx_min_delay,next_tx_max_delay))
                send_later.append(m)
        else:
            # Time to send this message yet not reached, send later.
            send_later.append(m)

    # In case of early break of the while loop, we have still
    # messages in the original send queue, so the new queue is
    # the sum of the ones to process again, plus the ones not
    # yet processed.
    self.send_queue = self.send_queue + send_later

Link to source code

I love showing this code to people without any context. It's Python, arguably the language with the easiest syntax, and it looks like what most people would write. A name like send_messages_in_queue is so common that it obviously sends messages in a queued fashion. If you haven't read the code yet, please read it at least once.

But not every case of sending messages from a queue is the same: the code above drives a LoRa radio, the SX1276 transceiver. From the code, you can follow how the author thinks: how he handles an early break while messages are still waiting to be sent, and how he handles the case where the radio is busy and we can't continue. The code doesn't expect you to understand what LoRa is, but the comments help us understand the pipeline and the thought process behind it.
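The shape of that logic, stripped of the LoRa specifics, can be sketched in Go. Message, drainQueue, radioBusy and send are hypothetical names, and the retransmissions, LEDs, TX watchdog and encoding are left out. The sketch keeps the same ordering rule as the original: on an early break, unprocessed messages stay in front of the deferred ones.

```go
package main

import "fmt"

// Message is a hypothetical, simplified stand-in for a queued packet.
type Message struct {
	ID       int
	SendTime int64 // the message is due once now is past this time (ms)
}

// drainQueue sends every due message, defers the ones that are not due
// yet, and stops early if the radio is busy. It returns the new queue.
func drainQueue(queue []Message, nowMs int64, radioBusy func() bool, send func(Message)) []Message {
	var sendLater []Message // messages we can't send yet
	for len(queue) > 0 {
		m := queue[0]
		queue = queue[1:]
		if nowMs <= m.SendTime {
			// Not due yet: try again on a later cycle.
			sendLater = append(sendLater, m)
			continue
		}
		if radioBusy() {
			// Waiting here won't help. Put the message back at the
			// front so the original order is kept, and stop.
			queue = append([]Message{m}, queue...)
			break
		}
		send(m)
	}
	// After an early break, unprocessed messages go first,
	// followed by the deferred ones.
	result := make([]Message, 0, len(queue)+len(sendLater))
	result = append(result, queue...)
	return append(result, sendLater...)
}

func main() {
	queue := []Message{{1, 0}, {2, 500}, {3, 0}, {4, 0}}
	calls := 0
	busy := func() bool { calls++; return calls > 2 } // busy from the third check on
	var sent []int
	queue = drainQueue(queue, 100, busy, func(m Message) { sent = append(sent, m.ID) })
	fmt.Println(sent, queue) // [1 3] [{4 0} {2 500}]
}
```

The comments in this sketch are paraphrases of the original ones, which is rather the point: the code alone would not tell you why a busy radio means "stop" rather than "wait".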

Comments are also useful for documenting a hack for a specific edge case in your code, and again, they're meant for whoever reads the code in the future. Here's one of my own examples:

await using NpgsqlConnection connection = await _dbConnection.OpenConnectionAsync(cancellationToken);

IEnumerable<ChartCandle>? result =
    await connection.QueryAsync<ChartCandle>(template.RawSql, template.Parameters);

// Data with an interval of one day or longer (one week, one month, and so forth) has a bigger chance
// of being wrong because of how it is stored in the database. Adding 7 hours solves the issue.
//
// NOTE: At this point, I still don't know whether the database server uses the UTC+0 or the UTC+7 (Asia/Jakarta)
// timezone. Different SQL clients provide different results. Try running `SHOW TIMEZONE` or `SELECT NOW()` on the
// database server to see for yourself.
if (interval >= ChartInterval.OneDay)
    return result?.Select(candle => new ChartCandle(candle.Date.AddHours(7), candle.Open, candle.High,
        candle.Low, candle.Last, candle.Volume)) ?? [];

return result ?? [];

Imagine I hadn't put that comment block there, and you're the one maintaining the code long after I've resigned from the company. What would you do? Would you do a full rewrite, since the code looks janky and the logic is obscure?

What comments should I write

If writing comments that explain your code sounds dumb to you, try writing something else:

  • State who in your company asked for this change, and link to the Jira or GitHub issue.
  • State why you wrote the implementation this way. Explain your thought process.
  • State the prerequisite knowledge needed to implement the code.

In essence, you are expanding on the 5W1H method (who, what, when, where, why, how). If you cover all of them, you provide a comprehensive explanation of what you wrote. That helps communication, leaves room for improvement (for you and future maintainers), and, as I said before, passes on knowledge to future maintainers.

Conclusion

Not having self-documenting code harms your entire codebase, and therefore your legacy. Having no comments at all is also acceptable, but then you have to expect everyone to understand you. If knowledge ever has to be handed over from you to the next person working on your code, remember that you're handing them a nightmare. Adding comments does not make your code worse; written with the right purpose, they will certainly help others.

And of course, I'd advise you to write comments: put your thoughts out there and help others understand your code.



More from Reinaldy Rafli