Introduction

Overview

Teaching: 5 min
Exercises: 0 min

Questions

What is software design?

What is technical debt?

How are software design and technical debt related?

Objectives

At the end of this module you should understand what we mean when we talk about software design and why good software design is important.

What is Software Design?

Before we start talking about software design, let’s clarify what that actually means and when you should start thinking about it. You might have written a couple of Jupyter notebook cells before, or a script or two, or maybe you’ve developed a package, or even a web application. All of these things need to be “designed” in the sense that you need to make decisions about which code goes where, how the code flows, or how the user will use your code. While there will likely be fewer design considerations going into creating a Jupyter notebook than into writing a complete web application, that doesn’t mean you don’t need to think about design when creating a few Jupyter cells. If your code is well-designed, it is easier to read, easier to maintain, and easier to sustain in the long run, either by you or by someone else.

There are many aspects that feed into designing software and your design choices can affect other components of the software development life cycle. For instance, if you don’t modularize your code, it is a lot harder and potentially impossible to write well-defined tests. Throughout this module, there will be references to other modules with more specific details on a topic. Not every aspect is relevant to every project, but that doesn’t mean they shouldn’t all be considered at some point (even if it’s just to decide to focus on something else).

Technical Debt

“In software development and other information technology fields, technical debt […] is the implied cost of future reworking required when choosing an easy but limited solution instead of a better approach that could take more time.”

Source: Wikipedia - Technical Debt

Basically, there is no time like the present. If one chooses saving time over creating a better solution that would require a greater time investment, one accumulates technical debt. For example, it could be faster to hard-code the paths to files required by a script at the locations where they are needed rather than defining variables at the beginning of the script and using those variables throughout the code. While the first approach would likely be easier and faster at the time of writing the script, it would make it harder to change the file paths for the next project or dataset and would require extra work to rewrite the code to make it more flexible.
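As a minimal sketch of the more flexible approach, the script below defines its locations once at the top. The directory names, file names, and helper functions are hypothetical and only serve as an illustration.

```python
from pathlib import Path

# Hypothetical locations, defined once at the top of the script.
DATA_DIR = Path("data/project_a")
OUTPUT_DIR = Path("results/project_a")


def input_file(name):
    """Build the path to an input file from the single DATA_DIR definition."""
    return DATA_DIR / name


def output_file(name):
    """Build the path to an output file from the single OUTPUT_DIR definition."""
    return OUTPUT_DIR / name


# Every later use goes through the variables, so switching to another
# project or dataset means editing only the two lines at the top.
temperatures_path = input_file("temperatures.csv")
summary_path = output_file("summary.csv")
```

Moving to a new dataset then means changing DATA_DIR and OUTPUT_DIR rather than searching the script for every path.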

Technical debt can accumulate at every level of a piece of software from hard-coding strings, to modularizing code, to choosing a database that’s faster to set up but has limited storage capabilities. An important skill for a developer is to recognize when technical debt is accrued and to understand if the effort necessary to pay it down in the future is worth the time saved at present.

Technical debt and software design are interlinked as the better a piece of software is designed, the lower the technical debt typically is. Often a more flexible design takes more time to implement and can be more complex. Hence, it can be tempting to choose a simpler design or to choose to postpone reworking code in favor of quickly finishing a feature. In many cases, however, this leads to a greater time investment in the future.

The Different Layers of Software Design

Software can consist of many different layers. From writing a couple of lines of code to architecting distributed systems, in each layer the relationships between the different components need to be defined and implemented. For the purpose of this lesson, we will take a holistic perspective on software design and group the concepts and best practices into three layers as follows.

Layer 1: Instructions

This is the smallest unit of coding: writing instructions that the computer understands. In this context, software design refers to how “things” are named, the code style employed, and how code is broken up into functions and methods. While not everybody might agree that this falls under software design, the decisions you make on this level can have consequences for the whole software.

Layer 2: Structure and Organization

In this layer, we need to think about where we put the different parts of our code. Questions such as “who can access which functions” or “where does certain functionality belong” are considered here. Classes or modules can be used to organize code, and considerations about their definition are made here.

Layer 3: Components and Services

On the highest level, decisions about how different applications and systems talk to each other and how they exchange information are being made. It can also include how the different components of an application are connected. We will not talk much about this layer, but if the applications you develop get to a certain point it should be something you think about.

Key Points

Software design can refer to two things: the structure and implementation of a piece of software, and the plan for how to structure and implement a piece of software.

Technical debt refers to the future cost of reworking a solution that was chosen because it was faster and easier to implement instead of making it more flexible from the beginning.

When software is not well-designed, technical debt accumulates.

Instructions

Overview

Teaching: 5 min
Exercises: 0 min

Questions

What are some best practices and recommendations to write high-quality code?

Objectives

At the end of this module you should have a better understanding of best practices for writing readable and maintainable code one instruction at a time.

Layer 1: Instructions

In 2008, Robert C. Martin published his book “Clean Code: A Handbook of Agile Software Craftsmanship”. In this book he lists a set of best practices for writing maintainable and sustainable code. We will talk about some of these best practices that are generally very useful. There are a lot of resources out there on clean code if you want to learn more. However, keep in mind that everything has a context and not every recommendation or best practice fits every situation.

Naming Things

Geek & Poke - What every good system needs

Naming things is one of the hardest things in programming: you want to use names that are descriptive but not too long. It’s an art. What you should absolutely avoid are variable names like a, b, or c and function names like myfunction or do_this. There are a few exceptions, for example, it can be ok to call your counter in a for-loop i but even in those cases there is usually a better name.

Another good rule to follow is to use verbs for functions and methods, for example name your function calculate rather than calculation, or retrieve_data instead of data. Variable names on the other hand are usually nouns potentially modified by an adjective, such as language or color. If you have boolean variables, those are often adjectives or adjectives with the “is”-prefix such as blue or is_blue (this in part depends on the programming language you are using).

Lastly, avoid reusing names to which the programming language already gives a meaning. There are a few exceptions where reusing such a name is acceptable, but those are few and far between. Reserved keywords (such as if or for) usually cannot be used as names at all, but many languages, Python being one of them, let you reuse the names of built-in functions and types. For instance, in Python if you name a variable type, then that variable shadows the built-in function type(), which is no longer available under that name in that scope. In the best-case scenario this leads to your program erroring out, and in the worst case it produces incorrect results.
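The effect of reusing the name type can be sketched as follows. The two function names and the example value are made up for illustration.

```python
import builtins


def describe(value):
    # Shadowing: inside this function, `type` now refers to a string,
    # not to the built-in function.
    type = "temperature"
    try:
        return type(value)
    except TypeError:
        # A string is not callable, so the call above fails.
        return "type() is shadowed and cannot be called"


def describe_fixed(value):
    # A more descriptive name leaves the built-in untouched.
    value_kind = "temperature"
    return "{} of {}".format(value_kind, type(value).__name__)


print(describe(3.5))
print(describe_fixed(3.5))
# The built-in is always still reachable through the builtins module.
print(builtins.type(3.5).__name__)
```

In this sketch the failure is loud (a TypeError). In other situations the shadowed name may silently be used as data, which is harder to notice.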

Searchability

A consideration that should go into naming things should be how well the name would be found if someone searches your code. If you choose function names that have nothing to do with what the functions are doing, it is difficult to find the relevant pieces of code when one is not familiar with it. For instance, if a function that queries a list of temperatures and calculates their average is called return_value then searching for “average” or “temperature” won’t bring up that method. Similarly, if there are spelling mistakes in names, searches cannot find those either (e.g. if you call a method calculate_averge instead of calculate_average).

Consistency

Be consistent! Once you decide on how to do something, stick to it. For example, if you decide to call functions that retrieve values get_value (where “value” stands for the name of the variable the function retrieves) then don’t start calling them retrieve_value half-way through your program. This applies to all levels of development. For instance, if you choose to use type-hints in Python use them consistently throughout your project, not just in one file. Or if you choose to have one file per component then apply this strategy to your whole project. Inconsistent code is much harder to understand, which makes it a lot harder to maintain.

Indentation

Indentation is an important factor in code readability. It is so important that in Python, it is part of the syntax of the language. No matter what programming language you use, choose how you indent your blocks (e.g. will you use tabs or spaces, and how many spaces will make up one level of indentation) and then stick to it. This code:

function hello() {
console.log("Hello World!");
saidHello()
if (saidHelloTwice()) {
console.log("Goodbye")}
}

is much less readable than:

function hello() {
    console.log("Hello World!");
    saidHello()
    if (saidHelloTwice()) {
        console.log("Goodbye")
    }
}

If you use an integrated development environment (IDE) such as VS Code, it will come with some functionality to help you with that. It pays off to take the time to figure out how to set it up at the beginning of your project.

Magic Numbers

Try to avoid magic numbers. Magic numbers are typically hard-coded numbers (or strings) that you use throughout your code, e.g. you might use Pi by putting 3.14 in multiple places of your code. These values should be put into constants, preferably at the beginning of your code or in a separate file. If you or someone else wants to change one value they should be able to go to one place and change it only once. You don’t want to hunt through all code files to find all the places that a certain number is used. Most languages have conventions on how to format constant names (e.g. all uppercase letters) to make it visually clear that a variable is a constant that shouldn’t be changed.
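A minimal sketch of replacing magic numbers with named constants could look like this. The constant GRAVITY_M_PER_S2 and both functions are hypothetical examples; math.pi comes from the Python standard library.

```python
import math

# Constants are defined once, in one place, in uppercase by convention.
GRAVITY_M_PER_S2 = 9.81  # assumed value for this example


def circle_area(radius):
    # math.pi replaces a hand-typed 3.14 scattered through the code.
    return math.pi * radius ** 2


def falling_speed(seconds):
    """Speed of an object in free fall after `seconds`, ignoring drag."""
    return GRAVITY_M_PER_S2 * seconds


print(circle_area(2.0))
print(falling_speed(2.0))
```

If a more precise value is needed later, only the constant definition has to change.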

The Principle of Least Astonishment

The principle of least astonishment (or least surprise) basically says that a user (or another developer in the case of code) should not be astonished (or surprised) when using a specific functionality. For programming specifically, this means that, for example, a function should behave according to its name or a variable should hold a value that makes sense given its name. To name a few examples, if a function is called calculate_average() it should actually calculate the average and not the median, or if it is called get_age() then it should return the age and not the name of a person. Similarly, a variable called temperature_celsius should hold the temperature in Celsius, not in Fahrenheit.

One Line Should Not Do Too Many Things

Try to keep your lines short(-ish) and let one line of code not do too many things. Except in the case of method chaining (obj.method1().method2().method3()), a line should typically do only one or two things. For instance:

door = street.houses[0].open() if street.houses[0] and (street.houses[0].is_inhabited() or street.houses[0].is_empty()) else street.move_in().door()

There is a lot going on in this one line, which makes it hard to understand. The following is much easier to read:

if street.houses[0] and (street.houses[0].is_inhabited() or street.houses[0].is_empty()):
   door = street.houses[0].open()
else:
   door = street.move_in().door()

Commenting and Documentation

Commenting and documentation are their own topic and deserve a lot more attention than this one short paragraph, but for completeness, here are a few recommendations.

Don’t comment the obvious. For instance, the following comment is not particularly helpful:

  # initializing all the lists
  cars = []
  drivers = []

It’s clear from the code what’s happening. The following edited version is better.

  # The following two lists keep the currently available cars 
  # and the drivers who need a car.
  cars = []
  drivers = []

Explain the why rather than the how. For example, instead of commenting that you are adding one to the counter, say why you are doing this (unless it’s obvious from your code).
Explain the things you might not understand in the future (an hour, a day, a month… from now) or that might not be clear to other people who read your code.
Explain your design decisions if it’s not obvious from your code (e.g. why did you use an integer flag and not a boolean).
Explain why other obvious solutions do not work or why you didn’t choose them. Keep in mind that your code may be read by people who are not professional developers, so slightly more detailed comments might be justified.
Delete old code, do not just comment it out. When you comment out code that you don’t need anymore instead of removing it, you clutter your code making it harder to pick out the important comments.
Adhere to the standards of the programming language. This will on the one hand make it easier for other people who are used to the standard to read and understand your code. On the other hand, this will allow you to leverage tools such as the ones that are included with your IDE of choice.
Make sure your comments are correct. This might seem obvious but make sure that if you change your code after you commented it that you also adjust your comments. There is nothing more confusing than a comment that the next line does X when it actually does Y.

A word on self-documenting code

You might find statements like the following on websites and in other resources on programming best practices:

“Self-documenting code is code that doesn’t require code comments for a human to understand its naming conventions and what it is doing.”
Source: Hackbright Academy

“In theory, the code of a good engineer should be so clear and readable that he simply does not need any documentation.”
Source: multi-programming.com

“Self documenting code is defined as code that explains itself without the need of extraneous documentation.”
Source: anthonysciamanna.com

While it is true that your code should ideally tell the reader something about its purpose through its structure, and variable and function naming, there still need to be comments and documentation to make it understandable and maintainable. The intention of the programmer and the answer to the question “why was this code written the way it is” can in most cases not be conveyed through naming and structure alone.

Key Points

There are a lot of easy-to-follow recommendations that make your code instantly easier to read and understand (and thereby maintain).

Writing high-quality code is the first step towards good software design.

Organizing Code

Overview

Teaching: 10 min
Exercises: 0 min

Questions

What does it mean to organize code?

What is a function?

Objectives

At the end of this module you should have a better understanding of what spaghetti code is and how to organize it better.

From Instructions to Structure: Organizing Code

What’s a Function?

A function is a set of commands that has a name by which the set of commands can be executed. For instance, let’s say you repeatedly want to print three variables called person_1, person_2, person_3 like this:

print("Hello ", person_1)
print("Hello ", person_2)
print("Hello ", person_3)

Instead of copying these three lines and pasting them everywhere you want to use them, you can create a function for that and then just call the function.

def print_hellos():
   print("Hello ", person_1)
   print("Hello ", person_2)
   print("Hello ", person_3)

Whenever you want to print the hellos, you can now simply call print_hellos(). Functions are a great way to structure your code. They let you keep code together that belongs together and name it, which not only helps make clear what the code is for but also makes it easily reusable.

When writing functions, however, keep the following guidelines in mind:

Keep them small and simple: the shorter and simpler a function is, the easier it is to maintain and to understand.
One function, one purpose: a function should only do one thing, e.g. don’t write a function that reads from a file, analyzes it, and then writes to a file. Instead, have one function to read from a file, one function to analyze the data, and one function to write to a file.
Prefer fewer arguments: the more arguments a function has, the harder it is to understand and maintain it.
Avoid side effects: your functions should be structured and written in a way that they do not produce side effects, meaning if you change a function only the function should be impacted. For instance, the use of global variables is notorious for creating side effects. If your function changes a global variable, then it becomes likely that changes in the function impact other places in your code where you use the global variable.
Avoid flag arguments: flag arguments are arguments that hold either true/false values or integers that control the flow of the function. If you pass a parameter (e.g. isRetired) to a function and the only point of the parameter is to use it in an if/else block to control what logic should be executed (e.g. get retirement rate or get the salary of a person) then there is usually a better approach (e.g. creating one function for retirees and one for people still working).
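The last guideline can be sketched as follows. The function names, the dictionary keys, and the example record are hypothetical and chosen only to mirror the retirement example above.

```python
# With a flag argument: one function hides two unrelated lookups.
def get_payment(person, is_retired):
    if is_retired:
        return person["pension"]
    return person["salary"]


# Without the flag: each function has one purpose and a telling name.
def get_pension(person):
    return person["pension"]


def get_salary(person):
    return person["salary"]


# Hypothetical record used only for this illustration.
employee = {"name": "Ada", "salary": 5000, "pension": 3000}
print(get_salary(employee))
print(get_pension(employee))
```

A call such as get_payment(employee, True) does not tell the reader what True means, whereas get_pension(employee) states its purpose in its name.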

Keep it DRY

The DRY principle stands for “Don’t Repeat Yourself.” It means that if you have multiple lines of code that are duplicates of each other then they should be replaced by, for instance, functions. A good indicator that your code could be “DRYer” is if you copy-paste code from one place to another. Using the same code in multiple places by duplicating it means that if you find a bug you have to hunt down all the duplicates and fix them as well. If you use a function instead, you only have to fix the function (in one place).
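As a small illustration, assume a script that converts several temperature readings. The function name and the readings are made up for this sketch.

```python
def celsius_to_fahrenheit(celsius):
    """Single place that holds the conversion formula."""
    return celsius * 9 / 5 + 32


# Instead of repeating `value * 9 / 5 + 32` for every measurement,
# the function is called wherever the conversion is needed.
morning_f = celsius_to_fahrenheit(12.0)
noon_f = celsius_to_fahrenheit(25.0)
evening_f = celsius_to_fahrenheit(18.0)
```

If the formula turns out to contain a mistake, it is corrected once inside celsius_to_fahrenheit and every caller benefits.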

Spaghetti Code

Spaghetti code is code without any structure or modularization. Often that means there are no or few functions or classes to structure your code, but more importantly your code has no discernible structure. Spaghetti code is almost impossible to maintain once it reaches a certain length.

Configurations

Configurations should all be kept in one place. For instance, if a program needs a few paths to file directories and some numbers (e.g. model parameterizations), then those should ideally be kept in one file named in a way that makes it obvious what it contains (e.g. “config” or “configuration”). This way a programmer only needs to find one file to set up a program rather than hunting through the code to find all hard-coded paths and other configurations.
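One possible way to do this in Python is a JSON file that is read once at start-up. The sketch below keeps the file contents inline so that it runs on its own; the keys, values, and the load_config helper are hypothetical.

```python
import json

# Contents of a hypothetical config.json, shown inline so the sketch is
# self-contained; in a real project this text would live in its own file.
CONFIG_TEXT = """
{
    "input_dir": "data/raw",
    "output_dir": "results",
    "learning_rate": 0.01,
    "max_iterations": 500
}
"""


def load_config(text):
    """Parse the configuration once and hand it to the rest of the program."""
    return json.loads(text)


config = load_config(CONFIG_TEXT)
print(config["input_dir"], config["max_iterations"])
```

JSON is only one option; other formats work the same way as long as all settings live in one clearly named place.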

Keep It Together

Keep the parts of your code that belong together in the same place. Depending on your programming language that can be the same file, or the same class, the same module, or the same directory. You should use whatever means the programming language of your choosing provides to keep code that logically belongs together together. This makes it easier to make changes later on, as you don’t have to look through all the code to find the relevant pieces that need to be changed.

Whitespace

Whitespace is an often underestimated tool to create structure in code. Lines of code that belong together should be visually together. Similarly blocks of code that address different issues should be set apart. However, one empty line is usually enough to achieve a visual indicator for different blocks of code. More than one empty line typically makes it more difficult to read.

Know What’s Being Offered

Most programming languages offer a lot of utility functions and solutions for commonly encountered problems. While it takes a lot of time to learn every feature of a programming language, you should first check what solutions are part of the basic functionality of the programming language you are using before implementing your own solution. For instance, if you need to solve a common math problem, first check if your programming language has a math package and if it includes a function that does what you need before implementing your own.
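In Python, for example, the statistics module of the standard library already provides an average. The sketch below compares it with a hand-written version; the temperature values and the calculate_average function are made up for illustration.

```python
import statistics

temperatures = [18.5, 21.0, 19.5, 23.0]


# Hand-written version: works, but it is extra code to test and maintain.
def calculate_average(values):
    return sum(values) / len(values)


# The standard library already covers this and related functions.
print(calculate_average(temperatures))
print(statistics.mean(temperatures))
print(statistics.median(temperatures))
```

Both approaches give the same mean here, but the library version comes with related functions such as median() that would otherwise have to be written and tested by hand.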

Key Points

Functions can be used to structure code.

Code that is better organized is easier to understand and maintain.

Duplicate code should be avoided.

Structure and Organization

Overview

Teaching: 30 min
Exercises: 0 min

Questions

How can code be best organized to make it easier to understand and maintain it?

Objectives

At the end of this module you should have a better understanding of how the different parts of code can be connected so that your code is more maintainable.

You will understand what coupling means in software development and how it relates to maintainable code.

Layer 2: Structure and Organization

The Dependency Hell

Dependency hell refers to the situation in which your code depends on a number of other packages (dependencies) that themselves depend on other packages that depend on other packages, and so on. Some of your dependencies might use the same packages as some other dependencies but a different version of them. Or maybe the dependencies you are using are only available for an outdated version of the programming language you are using and that keeps you from upgrading your code. The more dependencies your code has, the more complex and harder to update it becomes. Therefore, you should carefully choose any package your code depends on.

When you choose to use another package, first check who maintains that package and when it was last updated. Is it actively maintained and has a community that reports bugs and makes pull requests? Is it potentially backed by a company? How long has it been since the last release and the last commit? Does the maintainer respond to issues created? By answering these questions, you will get a better sense of how actively maintained the package is. If there hasn’t been much activity in the last few weeks or months, it might be an indicator that the package has been abandoned or will be soon. In that case, you want to be careful about depending on it in your code. If there is not much activity but you still want to use a certain package, then you should be prepared to maintain a fork of the package yourself should it become necessary.

Code Coupling

In software development, “coupling” refers to how two pieces of code are connected with each other. Coupling happens on a scale from tight to loose. The tighter two parts of a system are coupled, the more they need to know about each other.

Cleancommit.io defines loose coupling like this:

“In a loosely coupled system, the components are independent of each other. Each component has its own well-defined interface and communicates with other components through standardized protocols. Changes to one component do not require changes to other components, making the system more flexible and easier to maintain.”

Source: cleancommit.io

Tight coupling is the opposite of loose coupling, which means that when you change one component you need to change the other one as well to adapt it to the changes of the first component.

Coupling happens everywhere: between modules, between classes, between layers, between components, and between whole systems. In a well-designed system, things that logically belong together will be kept together (high cohesion) and the different components don’t know much about the inner workings of each other but communicate over clearly defined interfaces (loose coupling).

As an example, look at the following code and think about why this is tightly coupled. How could it be more loosely coupled? Then check the discussion below.

def addition():
    num_1 = float(input("Enter Number One"))
    num_2 = float(input("Enter Number Two"))
    total = num_1 + num_2
    print(total)

Discussion

The function first asks the user to input two numbers. It then adds the two numbers and prints the result. The means of retrieving the two numbers and adding them are tightly coupled, as we would have to change this function if we wanted the numbers to come from somewhere else (e.g. a different calculation). This function can only be used in one scenario: when the user is supposed to enter two numbers that are then added. The following code achieves the same but decouples the input retrieval from the adding step.

def addition(num_1, num_2):
    total = num_1 + num_2
    print(total)

num_1 = float(input("Enter Number One"))
num_2 = float(input("Enter Number Two"))

addition(num_1, num_2)

In this code, the function addition can be called with any two numbers, independent of where those numbers are coming from. If we want to change the code so that the numbers are read from a file, all we need to change are the two lines in which num_1 and num_2 are defined.
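To make that last point concrete, here is a sketch in which the two numbers come from a file-like object instead of the keyboard. io.StringIO stands in for a real file so that the example runs anywhere, and returning the result in addition to printing it is an addition of this sketch.

```python
import io


def addition(num_1, num_2):
    result = num_1 + num_2
    print(result)
    return result  # returned as well so that callers can reuse the value


# io.StringIO stands in for a real file; only these lines differ from
# the keyboard version, the function itself is untouched.
number_file = io.StringIO("2.5\n4.0\n")
num_1 = float(number_file.readline())
num_2 = float(number_file.readline())

total = addition(num_1, num_2)
```

Only the lines that define num_1 and num_2 changed; addition itself did not need to be edited.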

How to know if something is tightly coupled?

You might be wondering how you can know if something is tightly coupled. In theory, it is fairly easy to find out. Just ask the following question: If I want to replace component A, do I need to change component B a lot?

If the answer is yes, then your two components are tightly coupled. If the answer is no, then they are loosely coupled. Coupling comes in degrees: some components are more tightly coupled than others, and some are more loosely coupled than others. You won’t always achieve the loosest coupling possible, nor is it always advisable. This is because, in many cases, coupling two components more loosely comes at the cost of more complexity. And the more complexity there is, the harder it can be to maintain your code.

For instance, look at the following piece of code, does it look familiar?

def addition(numbers):
    total = 0
    for number in numbers:
        total = total + number
    return total

def get_number(question):
    response = input(question)
    try:
        num = float(response)
    except ValueError:
        # Fall back to 0 if the input is not a number.
        num = 0
    return num

# range() needs an integer, so the count is converted explicitly.
num_numbers = int(get_number("How many values would you like to add? "))
numbers = []
for i in range(num_numbers):
    numbers.append(get_number("Please enter a number: "))

answer = addition(numbers)
print("The answer to the addition is: {}".format(answer))

This is the same code from before that adds two numbers. However, it is as loosely coupled as it can be. The different pieces of code are not only loosely coupled to each other, but the code is also loosely coupled to its purpose. It doesn’t have to be two numbers anymore that are being added but can be however many numbers need to be summed up. However, the code got considerably longer and it is a lot harder to understand its purpose. Therefore, you should always ask yourself, do I need my code to be more loosely coupled or more generalized or is this the right balance between loose coupling and understandability?

Means of Decoupling

The goal of many software engineering best practices is to loosely couple different parts of a system. The following is a (non-exhaustive) list of techniques that aim to decouple code (make it more loosely coupled).

Information Hiding
Abstraction
Single Responsibility Principle
Modularization
Software Design Patterns
Well-defined APIs
Configuration Management
Dependency Injection
Microservices

In the following, we will talk about the first three in a little bit more detail.

Information Hiding

Information hiding, sometimes also referred to as encapsulation, describes the practice of “hiding” the inner workings of a piece of code from whoever calls it. The goal of this technique is to avoid having to change the caller when the callee is changed. If calling function A requires knowledge about how A is implemented, it not only tightly couples function A with the code that calls it, it also makes the code more difficult to understand, as looking at the function name and its parameters does not provide all the details needed to use function A.

Some programming languages (like Java or C#) provide means of hiding information that are part of the standard curriculum when the language is taught. Other languages (like Python) offer no or only limited support. Especially in these cases, it is important to learn the standard coding conventions of the language and follow them.

As an example of information hiding, let’s look at the following piece of code. There are two files, one that defines the function say_something and one that uses it. If you want to change the animal that makes the sound, you need to know that there is a global variable in sound.py that you need to change.

# sound.py
animal_type = "cat"

def say_something(sound):
    print("The {} says {}.".format(animal_type, sound))


# script.py
import sound

sound.animal_type = "dog" # violates information hiding principle
sound.say_something("bark")

The following piece of code hides the information about the global variable. Instead, a function parameter is added that you can use to set the animal. If the implementation of say_something_animal changes, for example to set a global variable to the passed-in value, then the call in script.py does not have to change at all. The internal workings of say_something_animal stay completely hidden.

# sound.py
def say_something_animal(sound, animal_type="cat"):
    print("The {} says {}.".format(animal_type, sound))


# script.py
import sound

sound.say_something_animal("bark", animal_type="dog")

Object-oriented Information Hiding

More often, information hiding is talked about in the context of object-oriented programming. In the following example, knowledge about the internal workings of the class is required to use the class:

class Animal:

    animal_type = "cat"
    sound = "meow"

    def make_sound(self):
        print("The {} says {}.".format(self.animal_type, self.sound))


a = Animal()
a.sound = "hiss"
a.make_sound()

As a result, we would not be able to rename the variable sound without having to change the code that uses the Animal class. Instead, we can add a constructor parameter, which would allow us to rename the internal variable as much as we like.

class Animal:

    def __init__(self, animal_type="cat", sound="meow"):
        self._animal_type = animal_type
        self._sound = sound

    def make_sound(self):
        print("The {} says {}.".format(self._animal_type, self._sound))



a = Animal(sound="hiss")
a.make_sound()
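To see the benefit, here is a sketch in which the internal attribute is renamed. The new name _noise is arbitrary, and returning the message in addition to printing it is an addition of this sketch.

```python
class Animal:

    def __init__(self, animal_type="cat", sound="meow"):
        self._animal_type = animal_type
        # Internal attribute renamed from _sound to _noise; callers that
        # only use the constructor and make_sound() are not affected.
        self._noise = sound

    def make_sound(self):
        message = "The {} says {}.".format(self._animal_type, self._noise)
        print(message)
        return message  # returned only to make the sketch easy to check


a = Animal(sound="hiss")
a.make_sound()
```

The two lines that use the class are identical to the ones above, even though the inside of the class has changed.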

Abstraction

Abstraction is an omnipresent topic in software development. Very broadly speaking, abstraction refers to the result (or sometimes the action itself) of making something less detailed. In programming specifically it refers to removing implementation specific details. For instance, take a math package that offers a function or method to calculate the average of a list of numbers. First it needs to sum up all the numbers, then count how many numbers there are, before dividing the sum by the count. When you use the average function, you are using an abstraction, because all you do is call average(). Well, in theory at least, but we’ll get to that in a minute.

First, let’s talk about why it is important to talk about abstraction. There are really two reasons. First, every time you program, you create abstractions. The better defined these abstractions are, the easier it will be to maintain the code. You should have a plan for what functions or methods your code will provide. Ideally, the abstractions you create will be consistent and make logical sense (principle of least astonishment!). For instance, if you develop a package to provide domain-specific math functions, then that package should not contain functionality to do string manipulations.

Second, you are using abstractions every time you program. When you use a module provided by your language of choice or developed by someone else, you are using their abstractions. This means that there is a lot going on under the hood. When using an abstraction, you are trusting that the abstraction is doing what it should be doing and what you think it should be doing. And here is the thing: the earlier claim that you don’t need to know the implementation details, and that they do not matter to you, is not the complete truth. In fact, they do matter because a) you depend on the abstraction doing what you expect it to do and b) abstractions can leak.

Leaky Abstractions

The “Law of Leaky Abstractions” was coined by Joel Spolsky and states that:

All non-trivial abstractions, to some degree, are leaky.

Spolsky was a program manager for Microsoft Excel in the 90s, co-created Stack Overflow and Trello, and wrote about programming on his blog Joel on Software. Some of his blog posts were also published as a book and are still worth reading.

But what does it mean that all abstractions are leaky? What Spolsky is describing with this law is the fact that as soon as an abstraction is complex enough, the complexity of certain implementation details that the abstraction hides bubbles up to the higher layers. To use a very simple example, consider the following function that divides two numbers and subtracts one from the result:

def divide_minus_one(num_1, num_2):
    return num_1 / num_2 - 1

If you use this function and pass in 0 as the second argument you will get a ZeroDivisionError. This means that the implementation specific details (that num_1 is divided by num_2) cause an error that is passed on to the next layer (your code). The abstraction is leaky. To be able to avoid this error, you need to know the implementation details of the function to then handle the division by 0 case in your code.
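
A minimal sketch of what handling this in the calling code could look like is shown here. The wrapper `safe_divide_minus_one` and the choice to return `None` are assumptions for illustration; whether to return a default, skip the value, or stop execution depends on your use case.

```python
def divide_minus_one(num_1, num_2):
    return num_1 / num_2 - 1


def safe_divide_minus_one(num_1, num_2):
    """Hypothetical caller that knows about the division hidden inside divide_minus_one."""
    try:
        return divide_minus_one(num_1, num_2)
    except ZeroDivisionError:
        # The implementation detail leaked through, so the caller has to deal with it.
        return None


print(safe_divide_minus_one(6, 3))
print(safe_divide_minus_one(6, 0))
```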

Examples of Leaky Abstractions

Scientists in Hawaiʻi have uncovered a glitch in a piece of code that could have yielded incorrect results in over 100 published studies that cited the original paper.

Source: Vice.com

In this case, a Python script used for chemistry calculations yielded different results depending on the operating system. Operating systems sort files differently by default (some use alphabetic order, some use the creation date, etc.). The Python script did not take this into account and just used the list of files returned by Python’s functions to get a list of all files in a directory without any further sorting. Since the algorithm relied on a particular order of the files, it returned different results when run on an operating system that sorted files differently. The specifics of the operating system leaked through the Python abstraction and into the script.
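
A minimal sketch of a defensive fix is to sort the file list explicitly before using it. The helper name `list_input_files` and the `*.out` pattern are hypothetical and not taken from the original script; the demonstration creates throwaway files in a temporary directory.

```python
import glob
import os
import tempfile


def list_input_files(directory, pattern="*.out"):
    """Return matching file paths in a fixed order (hypothetical helper)."""
    # The order returned by glob.glob is not guaranteed, so sort explicitly.
    return sorted(glob.glob(os.path.join(directory, pattern)))


# Small demonstration with throwaway files created in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["b.out", "c.out", "a.out"]:
        open(os.path.join(tmp, name), "w").close()
    names = [os.path.basename(path) for path in list_input_files(tmp)]
    print(names)
```

With the explicit `sorted` call, the order no longer depends on the operating system or file system.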

Another example is floating point arithmetic. When we are using Python, we typically use the decimal system for calculations. However, numbers are internally stored in a binary system (consisting of 1s and 0s). Not all decimal numbers can be represented exactly as binary numbers. This is similar to not being able to represent certain fractions like ⅓ as a decimal number. 0.1 is a decimal number that can’t be represented exactly as a binary number (it would be a never-ending number). On the one hand, this leads to rounding errors when your numbers get too small; on the other hand, you might encounter some special cases like the following:

>>> 0.1+0.1+0.1 == 0.3
False

In this leaky abstraction, how floating point numbers are represented in the system leaks through to your Python code.
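
One common way to deal with this in calling code is to compare floating point numbers with a tolerance instead of testing for exact equality, or to use the standard library's `decimal` module when exact decimal arithmetic is needed. A minimal sketch:

```python
import math
from decimal import Decimal

total = 0.1 + 0.1 + 0.1

# Exact comparison fails because 0.1 has no exact binary representation.
print(total == 0.3)

# Comparing with a tolerance accepts values that are close enough.
print(math.isclose(total, 0.3))

# Decimal works in base 10; note that the values are created from strings.
print(Decimal("0.1") + Decimal("0.1") + Decimal("0.1") == Decimal("0.3"))
```

Which option fits depends on the use case: a tolerance is usually enough for measurements, while exact decimal arithmetic matters for things like currency.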

The Issue with Leaky Abstractions

Leaky Abstractions are really tricky. First of all, you need to know how an abstraction works to be able to handle a leaky abstraction. Often leaky abstractions lead you down the rabbit hole of implementation details in the search for the root cause of an issue. More importantly though, it can be near impossible to know when an abstraction might leak unless you know the implementation specifics of the abstraction. In the case of the chemistry script, unless you have run into the issue of file sorting before or are a very careful reader of documentation (and even that won’t always save you), there is no way of predicting that there might be an issue. This problem is compounded by the fact that the number of existing abstractions increases constantly. The more packages and frameworks there are to make the life of a developer easier, the more abstractions there are, which means the greater the potential for a leaky abstraction.

Single Responsibility Principle (SRP)

“A class should have only one reason to change.”

Robert C. Martin (2003), Agile Software Development: Principles, Patterns, and Practices

Here, you could replace “class” with “module” or “function.” This principle is also expressed as: “Gather together the things that change for the same reasons. Separate those things that change for different reasons.” Basically, this means that the things that belong together should be together. For example, if your module is reading and writing CSV files, then it should not also do math calculations. The only reason for your module to change would then be, for instance, a change in the format of the CSV files.

While this in theory might seem straightforward, it can get pretty tricky when you have functionality that depends on many different interconnected factors. Sometimes, the question of what qualifies as a “reason to change” needs to be reframed to not introduce too many new complexities in an effort to reduce the “responsibility” of a component.

Example of Applying the SRP

Look at the following example and think about what the different responsibilities of the class are.

class Student:

    def register_student(self):
        # some logic
        pass

    def calculate_student_results(self):
        # some logic
        pass

    def send_email(self):
        # some logic
        pass

In the above example, the Student class has three different responsibilities: registering students, calculating the results for a student, and sending an email to the student. This means that it also has at least three reasons to change: if students need to be registered differently, if the results of a student need to be calculated differently, or if emails need to be sent out differently.

The following code separates out those responsibilities into different classes.

class Student:
    def set_address(self, address):
        # do logic
        pass

    def set_email_address(self, email):
        # do logic
        pass

class Registrar:
    def register_student(self, student):
        # do logic
        pass

class EmailService:
    def send_email(self, student):
        # do logic
        pass

Now, each class has only one responsibility. The Student class is used to manage student data. The Registrar class’ responsibility is to register students (and potentially unregister them). The EmailService sends emails. A side effect of applying the SRP is that each class can now potentially be used in a different context (e.g. the EmailService could be extended to send emails to faculty).
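
To make the separation more concrete, here is a minimal runnable sketch of how these classes could work together. The method bodies, the attributes, and the message format are assumptions for illustration; they are not part of the original example, and the `EmailService` only records messages instead of sending them.

```python
class Student:
    """Holds student data only."""

    def __init__(self, name):
        self.name = name
        self.email = None

    def set_email_address(self, email):
        self.email = email


class Registrar:
    """Keeps track of which students are registered."""

    def __init__(self):
        self.registered = []

    def register_student(self, student):
        self.registered.append(student)


class EmailService:
    """Records emails in an outbox so the sketch has no external side effects."""

    def __init__(self):
        self.outbox = []

    def send_email(self, student):
        message = "To {}: Welcome, {}!".format(student.email, student.name)
        self.outbox.append(message)
        return message


student = Student("Ada")
student.set_email_address("ada@example.org")

registrar = Registrar()
registrar.register_student(student)

email_service = EmailService()
print(email_service.send_email(student))
```

If the way emails are sent changes, only `EmailService` has to change; `Student` and `Registrar` stay untouched.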

Key Points

Dependencies need to be managed as well as the code itself.

Coupling refers to the extent to which the components of a piece of software are connected. Ideally, the components are loosely coupled so that if one component is changed, the others do not have to be changed as well.

There are many techniques and best practices to achieve loose coupling; information hiding, abstraction, and the Single Responsibility Principle are some of them.

Components and Services

Overview

Teaching: 5 min
Exercises: 0 min

Questions

How does software design apply to components and services?

Objectives

At the end of this module you should have a basic understanding of how to design components and services.

Layer 3: Components and Services

We will not talk much about this layer as it highly depends on the type of software you are developing. However, the basic principles we’ve talked about before apply here as well.

The responsibilities of your components or services should be clearly defined. Each component or service should ideally handle a specific task (e.g. analyze something or handle notifications, but not do both). However, especially if you opt to set up distributed services (which means that you have multiple applications running to handle different aspects of your overall goal), it will greatly increase the complexity of your applications. Sometimes, it’s the better (or only) choice, but sometimes keeping it all in one application is preferable to avoid too much complexity.
There should be clearly defined interfaces between components and services. This means on the one hand that they should be well documented, but on the other hand also that there is a limited number of ways in which a component or service can be interacted with. If there are 5 different ways of calling a service with only slight or no obvious differences, it is a lot harder to understand what the service does and how it is best used.
Input and output formats should be clearly defined and ideally use common standards. What this means is that if there is a widely-accepted format that you could use for your data, then use it rather than creating your own. It will make it easier for other people to use your code because they might already have data in the required format or at least are familiar with it. It also means that the output of your code will be easier to use in other programs.
Code should not be copy-pasted from one component to another. If you find that two components need the same code, then a third component or package should be extracted with the common code. Otherwise, the likelihood of creating inconsistencies is a lot higher when you for example change one component but don’t apply the change to the other.
Your components or services should be loosely coupled, which means pretty much the same as it means when we talk about loosely coupled classes or modules. The components or services should know as little as possible about each other’s implementation, and if service A is changed, service B should not be required to change as well. Depending on the type of application you are developing and your use case, there are different technologies and concepts to achieve that such as event buses, microservices, web services, or Functions as a Service.
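
As a minimal sketch of a narrow, clearly defined interface between two loosely coupled components, the following uses `typing.Protocol` from the standard library. All names (`Notifier`, `RecordingNotifier`, `AnalysisService`) are hypothetical, and a real system might put an event bus or a web service between the two parts instead.

```python
from typing import List, Protocol


class Notifier(Protocol):
    """The only thing the analysis component knows about notifications."""

    def notify(self, message: str) -> None:
        ...


class RecordingNotifier:
    """One possible implementation; it could be swapped for email, chat, etc."""

    def __init__(self) -> None:
        self.sent: List[str] = []

    def notify(self, message: str) -> None:
        self.sent.append(message)
        print(message)


class AnalysisService:
    """Analyzes data and delegates notifications to whatever Notifier it is given."""

    def __init__(self, notifier: Notifier) -> None:
        self._notifier = notifier

    def analyze(self, values: List[float]) -> float:
        result = sum(values) / len(values)
        self._notifier.notify("Analysis finished: mean = {}".format(result))
        return result


notifier = RecordingNotifier()
service = AnalysisService(notifier)
mean = service.analyze([1.0, 2.0, 3.0])
```

The analysis component handles one task and knows only the single `notify` method, so the notification side can be replaced without changing it.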

As a general rule, it’s typically a good strategy to start with the KISS principle (keep it simple, stupid) and then increase the complexity as it becomes necessary. For instance, if you build a web application to present data and you don’t know yet how many users you will have or how much data, start with one app on one server. If it turns out that there are too many requests for your application to handle, you can then start thinking about extracting parts of the code into their own services. However, if you already know that you’ll have enough requests to potentially overwhelm one application, then you should design your system in a way that will make it easy to scale it when needed.

Key Points

The basic principles discussed before apply to components and services as well.

Loosely couple your components and services.

Use standards when possible.

General Recommendations

Overview

Teaching: 15 min
Exercises: 0 min

Questions

What are some general recommendations to write better-designed software?

Objectives

At the end of this module you should be able to describe some general recommendations to consider when designing software and writing code.

Some General Recommendations

The following lists some general recommendations to achieve well-designed software. They are rather broad recommendations and can be applied to all layers and many scenarios.

Code with Intentionality

What this means is that every line of code you write should be there because it is needed, not because that’s how you have always done it, that’s how you found it somewhere, or that’s how it is in the example. You should understand each line you write and also understand the consequences of it. To give a few scenarios that often result in coding without intentionality:

We all find solutions and examples on the internet and like to copy-paste those because they solve a problem we encounter. However, make sure you understand what you copy-paste. What does the code actually do that you integrate into your code base? Do you understand what happens in each line and in any given scenario?
Similarly, if you use an LLM to generate code, make sure you understand the generated code. LLMs are a great tool but they are a recipe for disaster if you blindly copy their solution without understanding it.
IDEs are also a great tool that can generate code for you. But like all the other scenarios, don’t just use IDE generated code without understanding it.

As a concrete example, consider exception handling. If you catch an exception in your code, why are you doing that? What happens after you catch an exception and does that make sense? Is the rest of the code still executed and if so, would it still be possible to get sensible results? If not, then maybe you should stop the execution at that point. In contrast, if you decide not to catch an exception, what does that mean for your code? Will it error out? If so, does that make sense or would it still be possible to get valid results if you caught the exception and continued?
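
A minimal sketch of the two choices is shown here. The function names and the sample data are hypothetical; which behaviour is right depends on whether the remaining results are still meaningful.

```python
def parse_measurements_lenient(raw_values):
    """Skip values that cannot be parsed, but keep track of what was skipped."""
    parsed, skipped = [], []
    for raw in raw_values:
        try:
            parsed.append(float(raw))
        except ValueError:
            # Deliberate choice: one bad value should not discard the rest.
            skipped.append(raw)
    return parsed, skipped


def parse_measurements_strict(raw_values):
    """Let the ValueError propagate: a bad value invalidates the whole run."""
    return [float(raw) for raw in raw_values]


data = ["1.5", "n/a", "2.5"]
print(parse_measurements_lenient(data))

try:
    parse_measurements_strict(data)
except ValueError as error:
    print("Stopping:", error)
```

Both versions are defensible; the point is that the choice is made consciously and is visible in the code.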

Another example is dates. Dates are often used without considering all the intricacies that they come with. When you display a date, which timezone do you display? UTC? Local time? Your time? And do you show the timezone when displaying it? And if you store dates, do you store which timezone they are in? And does it matter?
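
One possible set of answers, sketched with the standard library's `datetime` module, is to store timezone-aware values in UTC and convert them only for display. The date and the fixed UTC-10 display offset are assumptions for this sketch.

```python
from datetime import datetime, timedelta, timezone

# Store timestamps as timezone-aware values in UTC.
recorded_at = datetime(2024, 3, 1, 18, 30, tzinfo=timezone.utc)

# Convert only for display; the fixed UTC-10 offset is an assumption for this sketch.
display_zone = timezone(timedelta(hours=-10))
local_time = recorded_at.astimezone(display_zone)

# isoformat() includes the offset, so the reader knows which timezone is meant.
print(recorded_at.isoformat())
print(local_time.isoformat())

# Both values describe the same moment in time.
print(recorded_at == local_time)
```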

Consistency is Key

No matter what aspect of your code, consistency is always a good thing. Pick a code style and stick to it. Some languages like Python come with a recommended code style, other languages have a number of code styles in use. Google has their code style guidelines for different languages published if you want to see some examples. Be consistent in how you name variables, functions, classes, methods, etc. If you deviate from your naming patterns make sure it’s a conscious decision and has a good reason. It will make understanding and navigating your code a lot easier.

Consistency also applies to the structure of your application. The more patterns repeat, the easier it is to understand an application. For example, if you use a manager class to manage one type of object, make sure to also create manager classes to manage other types of objects. If you do it one way in one scenario and then a different way in another scenario, it easily becomes confusing.

Documentation

There are many different types of documentation. Not all documentation is or has to be written by the developer and not all documentation is for the developer. Therefore, the requirements for documentation might differ from project to project and change over time. The following list shows some common types of documentation.

Requirements documentation
Architecture/design documentation
Technical documentation
- In-code documentation
- API documentation
- Development setup
- Test documentation
Installation documentation
End-user documentation

For many if not most researchers, providing all these types of documentation for a project is not realistic. And often also not necessary! If the code is meant for other researchers to run on their data, we might not need elaborate test documentation or API documentation. However, often at least some aspects of each type of documentation should be written down. Maybe there is no full-blown development setup documentation needed, but at least it should be documented what the dependencies are that need to be installed before the code can be run. Similarly, a full requirements documentation might not be needed but it should be documented what problem the code is supposed to solve.

One way to decide what documentation to write and how much is based on the following two questions:

What is most helpful to yourself?
If someone else would help you, what would be most helpful to them?

If you take these questions as the basis for what documentation to write, you’ll likely end up with a list that looks something like this:

Technical documentation
- In-code documentation
- Development setup
- API documentation
- Architecture/design documentation
Test documentation
Requirements documentation
Installation documentation
End-user documentation

First you want to make sure you understand your own code. In-code documentation (via comments) is your most powerful tool for this task. Next, you probably want to document how you set up your development environment: which programming language and version you use and what other dependencies are needed. We have discussed tools that can make this easier (e.g. using Docker containers). A formal API documentation that describes how the different functions and/or classes and methods can be used will probably be most useful after that, along with an overview of the architecture, that is, the different components of your software. The order of the last four will likely depend on the type of project and who the main users of your code are. Maybe one sentence like “run the test in this way…” will be enough as test documentation and maybe you already have a paper that documents the requirements. Adjust the list above as needed to your particular situation.
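
As a minimal sketch of in-code documentation, the following function combines a docstring, which describes how to use the function, with a comment that explains a decision inside the code. The function and its parameters are hypothetical.

```python
def normalize(values, lower=0.0, upper=1.0):
    """Rescale values linearly to the range [lower, upper].

    Parameters:
        values: a non-empty list of numbers.
        lower: the value the smallest input is mapped to.
        upper: the value the largest input is mapped to.

    Returns:
        A new list of floats; the input list is not modified.
    """
    smallest, largest = min(values), max(values)
    if smallest == largest:
        # All inputs are equal, so there is no spread to rescale;
        # mapping everything to `lower` avoids a division by zero.
        return [float(lower) for _ in values]
    spread = largest - smallest
    return [lower + (value - smallest) / spread * (upper - lower) for value in values]


print(normalize([10, 15, 20]))
```

The docstring answers what the function does and how to call it, while the comment records why a special case exists.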

In summary, documentation enables yourself as well as other people to reuse your code even when you forgot the details or if you have moved on to different projects. It is a crucial part of software development.

Conventions

Many communities have certain programming conventions, for example regarding which code style they typically use, how they test software, or how code is documented. Communities can, for example, be domain specific, programming language specific, or technology specific. Often you will be part of several communities, e.g. the Python community and the computational biology community. Make sure you know the conventions employed by the communities you are a part of and follow the conventions unless you have a good reason not to (intentionality!). It will make it a lot easier for other people in your community to understand the code you write. Sometimes, there might be conflicting conventions in the communities you are part of, in that case it makes sense to consider which community is more likely to read your code or reuse it and use the conventions of that community.

Let’s illustrate this point with the beloved car example. There are certain things when driving a car that, once you have learnt them, you won’t need to have explained to you again, because conventions ensure that they work the same way in all cars. You won’t need to read up on how to open a door, how to use the turn signal, or how to use the brake. This is the same for all cars. No documentation is needed (although there probably is some in the manual). Other things however, like the entertainment system, vary from brand to brand, or from model to model. For those, you need documentation to ensure users can use them. And if you choose to deviate from the convention (let’s say you design a Tesla and think how doors open needs to be redesigned), then you definitely need documentation.

Key Points

Every line of code you write should have an intention, a reason it is being written and not just copy-pasted without being understood.

Be consistent in every aspect of your programming.

Document what is most important to you first and then go from there.

Follow the conventions of your community.