Python Coding Standards


Introduction

The Sibros Python coding standards serves as a general programming guideline for a variety of aspects such as style guide, conventions, and design. The guidelines extend beyond Python's PEP 8 style guide, and just like PEP 8, the guidelines only serve as recommendations.


Why Python?

Python is an excellent programming language that has multiple advantages over other languages.

Easy to learn

Python was designed to be written closer to English by simplifying the overall syntax; it generally takes less lines of code to achieve similar functionality in other languages like C or C++.

Great for prototyping

Because of Python's simplistic nature, it has a quick turnaround time to create programs for a wide range of applications ranging from internal/external standalone tools to simple scripts for automation.

Object-oriented

Python is an object-oriented language, so developers can take advantage of object-oriented design to create complex yet elegant designs.


General Philosophy

Readability vs. performance

Python should be written with readability as the utmost priority; optimization is secondary. Optimization should only be done if the problem being solved is proven to be a problem in the first place. See Premature Optimization for more details.

For example, let's say algorithm A has an execution time of 50 ms, and algorithm B uses some obscure logic to improve the execution time to 1 ms. Assuming algorithm B's obscure logic compromises readability, then algorithm B is not justifiable because a 50x performance gain in the order of tens of milliseconds is negligible relative to human perception.

The same argument can be used for memory optimization.

Nested indentations

Try to ensure blocks of code have a maximum of three (3) levels of indentation if possible (indentations from functions and classes do not count). If an algorithm requires an excessive amount of indentations, then consider breaking up the algorithm into sequential chunks. For example, instead of having an algorithm that does everything in a single for loop, consider dividing the algorithm into sequential, smaller steps: step 1: do this, step 2: do that, finally step 3: combine results.

Modularity

Modularity is a necessary design principle to writing complex yet maintainable programs. For example, if a file contains 500+ lines of code, then that's a good sign the file needs to be divided into modules. The same can be said for functions/methods, control flow (e.g. if/elif/else, for loops, etc.), and classes.

An extreme example of unmaintainable code is when a function has many nested if/elif/else blocks of code where each condition contains a significant amount of code (e.g. 100+ lines). At this point, the code can be compared to abstract art; readers will find it to be very difficult to identify the logical boundaries among the many lines of code. This scenario typically happens when modularity was not incorporated into the design and the program has to handle frequent requirement changes or support multiple variations of something.

Unit testing

Try to unit test all code. Any untested code might have bugs regardless of how trivial the logic is. The last thing you want is to have customers discover bugs in your program. Overall, unit testing is an essential discipline that has multiple advantages.

  1. Unit testing encourages modular design

Unit testing encourages modular design by encouraging developers to divide large programs into small building blocks which ultimately makes the overall program more testable. It's more preferable to test small, focused components (functions, classes, modules, etc.) over a large monolithic component full of complexities.

  1. Unit testing encourages developers to design testable APIs

In order to test an API, the API has to be testable. In other words, an API can be wrongfully designed such that it is not easily testable.

Example

Imagine that you are writing a unit test for a progress bar class that is responsible for displaying progress percentages to a GUI widget.

class ProgressBar: def display_progress(self): """ Bad: This method is not easily testable """ # Displays progress percentage to a GUI widget # Progress depends on an external source progress_percent = self._external_source.progress self._view.set_value(progress_percent)

The problem is that the class is not easily testable. How do you validate the behavior of the class ProgressBar for percentages ranging [0, 100]? How do you simulate the progress percent? What if there is a corner case bug that happens when the percentage is exactly 100 percent?

To make the class more testable, the user should be able to easily provide inputs (to simulate cases) and read the outputs (for validating the behavior for each case). Mocks and stubs can also help test behaviors that cannot be easily simulated.

class ProgressBar: def display_progress(self, progress_percent: int) -> bool: """ Good: This method is easier to test :param progress_percent: Progress percent to display; valid values: [0, 100] :return: True if successful """ # Displays given progress percentage to a GUI widget success = self._view.set_value(progress_percent) return success from dataclasses import dataclass def test_display_progress_succeeds_within_0_to_100(): progress_bar = ProgressBar() @dataclass class TestData: progress_percent: int expected_result: bool test_data = [ TestData(-1, False), TestData(0, True), TestData(100, True), TestData(101, False), ] for data in test_data: assert data.expected_result == progress_bar.display_progress(data.progress_percent)
  1. Unit testing is automation friendly

By definition, unit tests is defined as testing code modules only. And with the simplicity of creating mocks and stubs, it completely eliminates any hardware dependencies or manual steps. Properly designed unit tests can be executed by a CI/CD system (e.g. Jenkins) on every code change pushed to a source code management server (e.g. GitHub, GitLab, Bitbucket). With this strategy, regression testing can be ran on every code change autonomously to catch bugs early.

References

More details on Sibros's philosophy of unit testing can be found in the article Unit Testing for C.


PEP 8 Highlights

In general, developers should follow the PEP 8 style guide. The style guide is not perfect, however, and developers may prefer to slightly deviate from the style guide.

The sections below are intended to highlight and correct common PEP 8 deviations.

Import style

Import statements should be grouped to easily identify modules that are standard, third-party, and local. Import statements should also be alphabetical ordered. These details are further elaborated in PEP 8 import.

Layout

# Standard modules # Third-party modules # Local modules

Example

If there are no third-party modules to import, then leave it blank.

Naming convention

Developers should follow the naming convention described in PEP 8 naming convention

The sections below will attempt to summarize the details described in the PEP 8 article.

Variables

Variable names should be lower_snake_case and global constants should be UPPER_SNAKE_CASE

Example

Class

Class and type names should be UpperCamelCase. If there is an acronym in the name, then it should be all UPPERCASE. If the UPPERCASE acronym compromises readability, then it is fine to make the acronym UpperCamelCase instead.

Example

Functions / methods

Function and method names should be lower_snake_case and should start with a verb to describe the action performed.

Example

Packages / modules

Package and module names should be lower_snake_case. Use underscores (_) to separate all words.

Note that our recommendation deviates from PEP 8 where PEP 8 discourages the use of underscores. We feel that it's better to be consistent and use underscores unconditionally.

Example


General Coding Standards

The sections above attempts to highlight common mistakes that deviate from the PEP 8 style guide. This section will describe guidelines derived from previous experiences of continuous improvements and many iterations of trial and error.

Variable naming and duck typing pitfalls

A common mistake in Python is naming variables ineffectively. The biggest disadvantage of Duck Typing is that it is easy to lose context if data types are not clear.

The problem

Imagine that you are a new developer working on an existing project, and you are tasked to implement the new function display_current_position given the argument coordinate. The problem is that it is not clear on how to use the given argument.

You may have many questions, and your best approach to answering those questions, unfortunately, maybe through reverse engineering. Some questions you may have are:

  • What is the data type of coordinate? Is it an integer, list, string, or an object?

  • If it is an object, what type of class is it? What public methods and attributes are accessible?

And to your surprise, it turns out the original developer decided to represent coordinate as a tuple of floats (<float>, <float>). At this point, you may assume the elements represents (latitude, longitude). But of course, your assumption may be wrong.

A solution

One solution to avoiding the duck typing pitfalls is to follow these naming conventions:

  • Names should closely resemble the data type (use suffixes if it helps)

  • Names should be plural to denote an iterable (preferably a list)

  • Use Type Hinting (Python 3.5+ only) to document parameter and return value types

Example

Sometimes it is fine to assume the name is implicitly clear if there is enough context surrounding the names (this is mostly applicable for well-known conventions or is typically the case when the readers have domain knowledge).

Filesystem

Python developers will often deal with the filesystem in some form; some applications that rely on the filesystem include general automation, file/directory manipulation, and command line tool.

Fundamental concepts

It is important to understand the fundamentals of filesystem such as:

Here are some guidelines:

  • Programs should assume the current directory can be anywhere (i.e. do not assume current directory will be in the same directory as the entry point file)

  • Programs should generally work regardless if given absolute paths or a relative paths (it depends on how the paths are provided)

  • Programs should be OS agnostic and work regardless if dealing with POSIX paths or Windows paths

Current Directory

From a program’s perspective, the current directory can be anywhere in the filesystem; it is up to the caller of the program.

It is good practice to ensure your program is agnostic to the location of the current directory. Some good ways to make your program agnostic of the current directory is to:

  1. Derive the parent directory of the program file, then assemble paths relative to the location of the program file itself

  2. Let the caller determine the location of a directory/file by taking path(s) as commend line option(s). This avoids having your program make assumptions of the host’s filesystem layout

Changing the Current Directory

Your program (whether it’s Python or a shell script) should avoid changing the caller’s current directory if possible. In Python, changing the current directory could potentially affect code further in the code stack.

If the current directory needs to be changed for any reason, ensure it gets reverted by the end of the code block that needs the current directory to be at the specific location.

Naming convention

Similar to the general naming convention in the earlier section, filesystem paths should have its own naming convention. The name of the variable should represent a filesystem node file vs. directory and should be explicit about the representation name vs. path vs. name with no extension.

Here are some naming guidelines:

  • Variable representing a directory path should be suffixed with dirpath or dirname

  • Variable representing a file path should be suffixed with filepath or filename

  • Variable representing any path should be suffixed with nodepath or nodename

Example

Other

Apart from precondition checks at the beginning of a function/method, each function/method should only have one return statement and that should be at the end of the function. This is to prevent unintentional side-effects where a developer adds some code thinking it will be run, but the function exits out before it gets to that piece of code.


File layout

Defining a convention for file layouts is important for consistency. All Python files should have a defined purpose, and Python files should not be created simply for the sake of dividing a program into different modules.

For example, if a Python file contains a wide assortment of unrelated functions (excluding a generalized utility module), then that should be an indication that the file is responsible for too many different things.

Entry point

The purpose of an entry point file is to act as the starting point of a program; in other words, this is the file that the user invokes to run a program. By convention, the entry point file should contain the function main and should have a consistent layout.

First, the command line interface (if any) should be the first thing the reader sees. The command line arguments should be defined in a get_args() function at the top of the file.

Second, the main() function should follow the command line interface definition and contain the application logic such as:

  1. Get command line arguments

  2. Do some application logic (invoke helper functions if any)

  3. Return exit code

Lastly, define helper functions below main(). The reason for having helper functions at the bottom is because readers should not care about the implementation details; the main() function should sufficiently describe what the program does.

Example

Command line interface

Python programs should be configurable through a command line interface. Command line arguments should follow the POSIX utility convention.

  • Single dash for short option (single character): -o, -i, -I

  • Double dash for long options (dashes to separate words): --output, --ignore-all-warnings

In addition to having a command line interface, Python programs should always return an Exit Code at the end of execution where zero (0) represents success and nonzero represents an error.

Module layout

Python modules should be categorized into multiple types:

  • Class definition and constants

  • Functions and constants

The motivation for categorizing modules into different types is to ensure that each module has a specific responsibility. Constants may freely reside in any module type. It's technically possible to mix class definitions and functions all into a single module, but it's better to categorize the types of modules to avoid creating a monolithic module.

Class definition and constants

A module containing a class should ideally contain exactly one class definition, and the file name should reflect that class name. This makes it easier for readers to find the class definition.

It's also fine to have multiple private classes in a class module, but remember that readers should not care about the implementation details, so try to hide private classes at the bottom of the file.

Example

Example usage

Functions and constants

Modules containing functions and constants should ideally be associated to some responsibility. Sometimes it is fine to create a generic helper.py module that contains generic helper functions, but try to avoid that.


Object-Oriented Programming (OOP) Coding Standards

This section will discuss conventions related to object oriented programming which will go over topic related to class design and layout.

Data model fundamentals

In Python, almost everything is an object. Built-in types like string, integer, list, etc. are all objects. Each of these objects have their own set of methods. For example, two lists can be added together using the + operator list + [0, 1, 2]. You cannot, however, add a list and an integer. list + 0 will result in a type error.

Developers should be aware of Python Data Models which can be used to model custom classes based on built-in types by overriding special methods (a.k.a double underscore methods or dunder methods).

Example

Class responsibilities

Ideally, a class should have a focused responsibility. A class can be an entity that has some functional responsibility, a simple data container, or both. With a focused purpose, it's methods and attributes should also be relevant to a class's responsibility. For example, let's say we have an HTTPClient class which, implied by the name, functions as an HTTP client. The HTTPClient should not have methods to parse and process application-specific response payloads from an HTTP server. Rather, the HTTPClient should pass response payloads to the caller where the caller should have context on how to use the response payload.

Example

Public / private / accessors / mutators design

A class design should be simple to read and use. Similar to a restaurant menu, you generally want to avoid overloading customers with unnecessary internal details like who prepares the food or where does the supply come from. And even if the menu exposed internal details, it's not like customers can use that information in any meaningful way. Rather, you want to make the menu straightforward; the customer wants to purchase food and drinks, and the menu should clearly highlight possible options to the customer.

Class design is very similar to the restaurant menu analogy above. A class should make it explicitly clear on what are the methods, accessors, and mutators that are at the user's disposal. A general rule is to put the important stuff at the top of the file and internal details at the bottom of the file. Readers tend to read from top to bottom.

To make classes aesthetically simple, classes should follow this order:

  1. Special methods

  2. Public methods

  3. Private methods

  4. Accessors

  5. Mutators

Notice that private methods should ideally be thrown at the bottom since it's internal details. Instead, it's placed underneath public methods for consistency; methods should be packed together to avoid mixing accessors and mutators in between methods.

Example

Note that a leading single underscore (_) can be prefixed to identifiers to denote private. Users should not use double underscores (__) since that mechanism is reserved for Name Mangling.

Data flow best practices

A common problem in other languages like C is that global variables can easily be abused. Imagine that you are a C developer and are given a C library that relies heavily on internal global variables.

When the API relies too much on global variables and internal states, the application code will begin to look very confusing to other readers.

The problem in the C code above also exists in class design as well. In class design, attributes act similarly to global variables, that is, all methods can freely access and modify attributes, and therefore, this makes methods have internal states in some form.

A solution to the problem of obscure internal state changes is to ensure that the API has proper data flow. Functions and methods should have inputs and outputs defined, and APIs should avoid retaining unnecessary states. A general rule is that if an API has multiple functions/methods that take zero inputs and zero outputs, then that is a good sign that the data flow will be very obscure to the user.

Example


Package / Module Structuring

Like classes, package and module design follow a similar paradigm.

Namespace

First off for package and module naming, try to avoid redundant names.

Example

  • /can

    • /__init__.py

    • /can_pcan.py

    • /can_value_can.py

    • /can_vector.py

In the above example, the package name can already makes it explicitly clear that modules in the package are related to CAN, so the module name does not need to be prefixed with that name; the same applies to identifiers within the modules.

Public / private imports

Packages and modules should also follow the same public and private conventions as classes. Users should certainly be discouraged from importing private modules and identifiers. Like classes, the same leading single underscore (_) can be used to denote private.

Example

To control the public / private scope of packages, use the __init__.py file that resides in the root of the package directory. When a user imports a package, the __init__.py file actually gets executed on import; developers can take advantage of this behavior by placing import statements in the __init__.py file.

Example


OOP Abstract Base Class (ABC) Coding Standards

One of the challenges of creating large program is that developers will likely encounter the scenario where many variations of something needs to be supported. For example, CAN nodes fundamentally works the same. Functionally, a CAN node can initialize, uninitialize, send message, and receive message. However, the implementation of CAN varies widely across different hardware. A PCAN USB-CAN device works differently than Vector USB-CAN device despite having the same exact behavior from a high level perspective.

To support both devices in code, you certainly want to avoid creating multiple implementations that have different APIs each with slight deviations.

Example

Abstract Base Classes (ABC) are designed to solve the problem of variations. ABC allows developers to define a common interface which acts as a contract that classes must conform to. The USB standard is a perfect example. Think about how there are many USB devices out there: keyboard, mouse, flash drive, and may other types. There isn't a specific connector for keyboard vs. flash drive; you can just plug any USB connector into a USB port and expect it to work. That seamless abstraction is all thanks to the USB standard that all USB devices must conform to. For instance, All USB devices shall support 5 V input voltage, shall communicate using the USB communication protocol, and shall identify itself to the host on connection.

ABC allows developers to define a standard interface by defining common method signatures.

Example

In the example above, developers should suffix the class name with ABC to make it explicitly clear that the class is an abstract base class. For each method, developers should clearly document the expected behavior (in a docstring) and type hint the parameters and return value.

The ABC class should document exactly how implementation classes should behave. In other words, other developers who are responsible for writing implementation classes should not have to find the original developer to seek information about the parameters, expected behavior, and the return value of all abstract methods.


Conclusion

Python is an excellent programming language that is both easy to learn and flexible for a wide range of applications. And with it's object oriented nature, it allows developers to write code ranging from very basic scripts to large, complex yet elegant programs. Sibros's coding standards were derived from PEP 8 and extended based on past experiences. Programming is more than just putting instructions together in sequential order, but is rather an art that should focus on the readers. More time is spent reading code than writing, so it is very important to ensure that readers can easily understand the intentions of the writers.

SIBROS TECHNOLOGIES, INC. CONFIDENTIAL