Some Thoughts on Readability

Communication is everything in software development.

Unless you work on a team of one, it’s important that your code communicates easily to your team members. Even if you do work on a team of one, it’s important to communicate to your future self. Yeah, everyone has to read their own code once in a while, and unless your brain is a steel trap, you’ll forget stuff.

So do yourself and your team a favor. Choose names carefully. Here are some examples.

Example 1: Use enumerations instead of booleans

This example is C++. Let’s pretend you’re writing a parse() function that looks like this:

bool parse(const char *file_name, employee *e, bool validate_only)

That seems pretty clear, doesn’t it? Let’s break it down:

Inputs:

  • The path to the file to parse (called file_name)
  • The employee struct to populate with the parsing results
  • A flag to control whether to actually parse the file, or just dry-run (validate_only)

Outputs:

  • A true or false return value to indicate success or failure
  • The out structure, populated on success

So far so good? Yeah, probably.

Reading the code so far, the purpose of each parameter is very clear, but don’t make people read your header files if they don’t have to. Consider the code that actually calls this function. It would look something like this:

employee e;
bool ret = parse("/tmp/foo.txt", &e, true);
if (ret) {
   ...

Without looking back at the header file, can you tell me what this function does? For the most part, it’s obvious, but that true parameter is not clear. I’m also not happy with the boolean return value (though it is the lesser of the two offenders).

Don’t make me think so hard. If I stumble across the code above, I shouldn’t have to go hunt down a header file just to figure out what it does. We can make this easier on developers by changing those bools to enums, like so:

enum parse_result { success, failure };
enum parse_mode { validate_only, full_parse };
parse_result parse(const char *file_name, employee *e, parse_mode mode);

By making these changes, callers would see this instead:

employee e;
parse_result result = parse("/tmp/foo.txt", &e, validate_only);
if (result == parse_success) {
   ...

Instead of true, we read validate_only, and it’s immediately clear what this function is doing. The returned value is also very clear. There is no need to consult documentation or other header files to figure it out.

Example 2: Mutants

In Python, you can remove white space from strings with the strip() function. However, if I stumbled across code like this, I probably wouldn’t realize that it had a bug:

s = "   foobar   "
s.strip()
print s # What does this print?

That’s because the strip() function doesn’t actually strip whitespace from the string, but rather it returns a new string with the whitespace stripped.

In this case, I would recommend that strip() be renamed stripped() or tostripped() to make it more clear that it returns a new string.

For methods that mutate their instances, I recommend present-tense verbs like strip(), rename(), or clear(). But for methods that return modified copies of the instance, I recommend past-participle names like stripped(), renamed(), or cleared().

Qt4 got this right with their changes to QString. Under Qt3, QString had a method called stripWhiteSpace(), so bugs like this were common:

QString str = "   foobar   ";
str.stripWhiteSpace();
// BUG: Is str actually stripped now?

In version 4, they renamed this method to trimmed(), so now there’s no question:

QString str = "   foobar   ";
str.trimmed();
// The bug is more obvious now

Example 3: Be specific

When reading others’ code it is common to come across words like data or ret or node. While there are exceptions to this rule, these words usually aren’t specific enough to communicate your intent.

For example, let’s say I’m writing a Python program to run Linux commands and store their results in a map. Perhaps I make a function like this:

def run_commands(commands):
    data = {}
    for command in commands:
        output = subprocess.Popen(command,
            stdout=subprocess.PIPE).communicate()
        data[command] = output[0]
    return data

This just runs each command given to the function and puts the command’s output into a dictionary. Obviously, this is a tiny function, and as such, it may be an exception to the rule. However, if this were a more complicated function, perhaps one that executed each command in a separate thread and aggregated the results, data would not be very clear. Instead, I’d recommend renaming data to command_results. I might even go so far as to name it command_results_dict so it could be perfectly clear. Obviously, this rule can be taken too far. I would not recommend, for example, using a variable name like command_subprocess_stdout_result_dict. That would be insane.

Conclusion

So there you have it. Just a few pointers that are probably obvious to most of you already. We can all stand to clean up our communication now and then, so take this as a chance clean up the code you write today. I know I will.

Happy hacking!