Python Coding Standards
- 1 Introduction
- 2 Why Python?
- 3 General Philosophy
- 4 PEP 8 Highlights
- 4.1 Import style
- 4.2 Naming convention
- 4.2.1 Variables
- 4.2.2 Class
- 4.2.3 Functions / methods
- 4.2.4 Packages / modules
- 5 General Coding Standards
- 5.1 Variable naming and duck typing pitfalls
- 5.2 Filesystem
- 5.2.1 Fundamental concepts
- 5.2.2 Current Directory
- 5.2.2.1 Changing the Current Directory
- 5.3 Naming convention
- 5.4 Other
- 6 File layout
- 6.1 Entry point
- 6.1.1 Command line interface
- 6.2 Module layout
- 6.1 Entry point
- 7 Object-Oriented Programming (OOP) Coding Standards
- 8 Package / Module Structuring
- 9 OOP Abstract Base Class (ABC) Coding Standards
- 10 Conclusion
Introduction
The Sibros Python coding standards serves as a general programming guideline for a variety of aspects such as style guide, conventions, and design. The guidelines extend beyond Python's PEP 8 style guide, and just like PEP 8, the guidelines only serve as recommendations.
Why Python?
Python is an excellent programming language that has multiple advantages over other languages.
Easy to learn
Python was designed to be written closer to English by simplifying the overall syntax; it generally takes less lines of code to achieve similar functionality in other languages like C or C++.
Great for prototyping
Because of Python's simplistic nature, it has a quick turnaround time to create programs for a wide range of applications ranging from internal/external standalone tools to simple scripts for automation.
Object-oriented
Python is an object-oriented language, so developers can take advantage of object-oriented design to create complex yet elegant designs.
General Philosophy
Readability vs. performance
Python should be written with readability as the utmost priority; optimization is secondary. Optimization should only be done if the problem being solved is proven to be a problem in the first place. See Premature Optimization for more details.
For example, let's say algorithm A has an execution time of 50 ms, and algorithm B uses some obscure logic to improve the execution time to 1 ms. Assuming algorithm B's obscure logic compromises readability, then algorithm B is not justifiable because a 50x performance gain in the order of tens of milliseconds is negligible relative to human perception.
The same argument can be used for memory optimization.
Nested indentations
Try to ensure blocks of code have a maximum of three (3) levels of indentation if possible (indentations from functions and classes do not count). If an algorithm requires an excessive amount of indentations, then consider breaking up the algorithm into sequential chunks. For example, instead of having an algorithm that does everything in a single for loop, consider dividing the algorithm into sequential, smaller steps: step 1: do this, step 2: do that, finally step 3: combine results.
Modularity
Modularity is a necessary design principle to writing complex yet maintainable programs. For example, if a file contains 500+ lines of code, then that's a good sign the file needs to be divided into modules. The same can be said for functions/methods, control flow (e.g. if/elif/else, for loops, etc.), and classes.
An extreme example of unmaintainable code is when a function has many nested if/elif/else blocks of code where each condition contains a significant amount of code (e.g. 100+ lines). At this point, the code can be compared to abstract art; readers will find it to be very difficult to identify the logical boundaries among the many lines of code. This scenario typically happens when modularity was not incorporated into the design and the program has to handle frequent requirement changes or support multiple variations of something.
Unit testing
Try to unit test all code. Any untested code might have bugs regardless of how trivial the logic is. The last thing you want is to have customers discover bugs in your program. Overall, unit testing is an essential discipline that has multiple advantages.
Unit testing encourages modular design
Unit testing encourages modular design by encouraging developers to divide large programs into small building blocks which ultimately makes the overall program more testable. It's more preferable to test small, focused components (functions, classes, modules, etc.) over a large monolithic component full of complexities.
Unit testing encourages developers to design testable APIs
In order to test an API, the API has to be testable. In other words, an API can be wrongfully designed such that it is not easily testable.
Example
Imagine that you are writing a unit test for a progress bar class that is responsible for displaying progress percentages to a GUI widget.
class ProgressBar:
def display_progress(self):
"""
Bad: This method is not easily testable
"""
# Displays progress percentage to a GUI widget
# Progress depends on an external source
progress_percent = self._external_source.progress
self._view.set_value(progress_percent)
The problem is that the class is not easily testable. How do you validate the behavior of the class ProgressBar
for percentages ranging [0, 100]
? How do you simulate the progress percent? What if there is a corner case bug that happens when the percentage is exactly 100
percent?
To make the class more testable, the user should be able to easily provide inputs (to simulate cases) and read the outputs (for validating the behavior for each case). Mocks and stubs can also help test behaviors that cannot be easily simulated.
class ProgressBar:
def display_progress(self, progress_percent: int) -> bool:
"""
Good: This method is easier to test
:param progress_percent: Progress percent to display; valid values: [0, 100]
:return: True if successful
"""
# Displays given progress percentage to a GUI widget
success = self._view.set_value(progress_percent)
return success
from dataclasses import dataclass
def test_display_progress_succeeds_within_0_to_100():
progress_bar = ProgressBar()
@dataclass
class TestData:
progress_percent: int
expected_result: bool
test_data = [
TestData(-1, False),
TestData(0, True),
TestData(100, True),
TestData(101, False),
]
for data in test_data:
assert data.expected_result == progress_bar.display_progress(data.progress_percent)
Unit testing is automation friendly
By definition, unit tests is defined as testing code modules only. And with the simplicity of creating mocks and stubs, it completely eliminates any hardware dependencies or manual steps. Properly designed unit tests can be executed by a CI/CD system (e.g. Jenkins) on every code change pushed to a source code management server (e.g. GitHub, GitLab, Bitbucket). With this strategy, regression testing can be ran on every code change autonomously to catch bugs early.
References
More details on Sibros's philosophy of unit testing can be found in the article Unit Testing for C.
PEP 8 Highlights
In general, developers should follow the PEP 8 style guide. The style guide is not perfect, however, and developers may prefer to slightly deviate from the style guide.
The sections below are intended to highlight and correct common PEP 8 deviations.
Import style
Import statements should be grouped to easily identify modules that are standard, third-party, and local. Import statements should also be alphabetical ordered. These details are further elaborated in PEP 8 import.
Layout
# Standard modules
# Third-party modules
# Local modules
Example
from argparse import ArgumentParser
import os
import re
from subprocess import Popen
import sys
import requests
import yaml
from httptransport import HTTPTransport
If there are no third-party modules to import, then leave it blank.
import os
from httptransport import HTTPTransport
Naming convention
Developers should follow the naming convention described in PEP 8 naming convention
The sections below will attempt to summarize the details described in the PEP 8 article.
Variables
Variable names should be lower_snake_case
and global constants should be UPPER_SNAKE_CASE
Example
MY_GLOBAL_CONSTANT = 10
for index in range(MY_GLOBAL_CONSTANT)
message = "Index: {}".format(index)
print(message)
Class
Class and type names should be UpperCamelCase
. If there is an acronym in the name, then it should be all UPPERCASE
. If the UPPERCASE
acronym compromises readability, then it is fine to make the acronym UpperCamelCase
instead.
Example
class MyCustomClass:
pass
class HTTPClient:
pass
class UDSServer:
pass
class UdsServer: # <== Maybe more readable to others, but prefer `UDS` over `Uds` if possible
pass
Functions / methods
Function and method names should be lower_snake_case
and should start with a verb to describe the action performed.
Example
class HTTPClient:
def upload(self, url: str, filepath: str):
raise NotImplementedError
def download(self, url: str, filepath: str):
raise NotImplementedError
def _resume_download(self, url: str, offset: int) -> int:
raise NotImplementedError
def get_current_timestamp(): -> float:
raise NotImplementedError
def compute_next_offset(current_offset: int) -> int:
raise NotImplementedError
Packages / modules
Package and module names should be lower_snake_case
. Use underscores (_
) to separate all words.
Note that our recommendation deviates from PEP 8 where PEP 8 discourages the use of underscores. We feel that it's better to be consistent and use underscores unconditionally.
Example
http_package
constants.py
custom_json_parser.py
my_module.py
uds_server.py
import constants
from custom_json_parser import CustomJSONParser
from http_package import HTTPClient
import my_module
from uds_server import UDSServer
General Coding Standards
The sections above attempts to highlight common mistakes that deviate from the PEP 8 style guide. This section will describe guidelines derived from previous experiences of continuous improvements and many iterations of trial and error.
Variable naming and duck typing pitfalls
A common mistake in Python is naming variables ineffectively. The biggest disadvantage of Duck Typing is that it is easy to lose context if data types are not clear.
The problem
Imagine that you are a new developer working on an existing project, and you are tasked to implement the new function display_current_position
given the argument coordinate
. The problem is that it is not clear on how to use the given argument.
def display_current_position(coordinate):
raise NotImplementedError # <== Implement this function given a coordinate
You may have many questions, and your best approach to answering those questions, unfortunately, maybe through reverse engineering. Some questions you may have are:
What is the data type of
coordinate
? Is it an integer, list, string, or an object?If it is an object, what type of class is it? What public methods and attributes are accessible?
And to your surprise, it turns out the original developer decided to represent coordinate
as a tuple of floats (<float>, <float>)
. At this point, you may assume the elements represents (latitude, longitude)
. But of course, your assumption may be wrong.
A solution
One solution to avoiding the duck typing pitfalls is to follow these naming conventions:
Names should closely resemble the data type (use suffixes if it helps)
Names should be plural to denote an iterable (preferably a list)
Use Type Hinting (Python 3.5+ only) to document parameter and return value types
Example
from coordinate import Coordinate
# Good: Name is derived from the type Coordinate
coordinate = Coordinate()
# Good: Name is suffixed with the type and overall has more context (represents coordinate of a vehicle)
vehicle_coordinate = Coordinate()
# Bad: Name deviates from the type Coordinate; this will confuse other readers
location = Coordinate()
# Good: Name denotes that the variable is a pair (tuple) and is explicit about the data structure
latitude_longitude_pair = (coordinate.latitude, coordinate.longitude)
# Debatable: Sometimes names may be too long or too detailed. It may already be implied that latitude and longitude are best represented as floats
latitude_longitude_float_pair = (coordinate.latitude, coordinate.longitude)
coordinate = Coordinate()
# Good: Plural name clearly implies a list of coordinate objects
coordinates = [
coordinate,
]
# Somewhat good: Although the suffix "dict" makes it clear that this is a dictionary, readers also need context on the dictionary's schema
coordinate_dict = {
"current": coordinate
}
# Good: Suffix makes it clear that this is a set (unique set of coordinates)
coordinate_set = {
coordinate
}
# Good: Suffix makes it clear that this is a string representing a coordinate
coordinate_str = "0.0, 0.0"
Sometimes it is fine to assume the name is implicitly clear if there is enough context surrounding the names (this is mostly applicable for well-known conventions or is typically the case when the readers have domain knowledge).
from memoryblock import MemoryBlock
# Memory block represents a simple block of memory starting at some address of some size
memory_block = MemoryBlock()
memory_block.name # Assume this attribute is a string representing the name of the memory block
memory_block.address # Assume this attribute is an integer representing the start address
memory_block.size # Assume this attribute is an integer representing the size in bytes
Filesystem
Python developers will often deal with the filesystem in some form; some applications that rely on the filesystem include general automation, file/directory manipulation, and command line tool.
Fundamental concepts
It is important to understand the fundamentals of filesystem such as:
Here are some guidelines:
Programs should assume the current directory can be anywhere (i.e. do not assume current directory will be in the same directory as the entry point file)
Programs should generally work regardless if given absolute paths or a relative paths (it depends on how the paths are provided)
Programs should be OS agnostic and work regardless if dealing with POSIX paths or Windows paths
Current Directory
From a program’s perspective, the current directory can be anywhere in the filesystem; it is up to the caller of the program.
./program.py # Current directory is in the same directory (parent directory)
./directory/program.py # Current directory is 1 directory up from the parent directory
/home/root/directory/program.py # Current directory could be anywhere in the filesystem
It is good practice to ensure your program is agnostic to the location of the current directory. Some good ways to make your program agnostic of the current directory is to:
Derive the parent directory of the program file, then assemble paths relative to the location of the program file itself
Let the caller determine the location of a directory/file by taking path(s) as commend line option(s). This avoids having your program make assumptions of the host’s filesystem layout
"""
Example, derive the parent directory of this file, then assemble paths relative to the parent directory
This approach is great if this program is packaged with other files that live alongside it (e.g. configuration files)
"""
import os
from pathlib import Path
# Ways to get the parent directory path of this program file
self_dirpath = os.path.dirname(__file__) # Using traditional os to get a string object
self_dirpath = Path(__file__).parent # Using modern pathlib to get a Path object
# Ways to assemble a path relative to the parent directory of this file
target_filepath = os.path.join(self_dirpath, "directory", "test.txt") # Assume self_dirpath is a string object
target_filepath = self_dirpath.joinpath("directory", "test.txt") # Assume self_dirpath is a Path object
"""
Example, let the caller of this program provide the path to use
This example uses pathlib Path objects instead of os string objects
"""
from argparse import ArgumentParser
from pathlib import Path
import sys
def get_args():
argument_parser = ArgumentParser()
argument_parser.add_argument("-d", "--target-dirpath", type=Path, help="A directory path to reference")
return argument_parser.parse_args()
def main():
args = get_args()
target_filepath = args.target_dirpath.joinpath("directory", "test.txt")
if __name__ == "__main__":
sys.exit(main())
Changing the Current Directory
Your program (whether it’s Python or a shell script) should avoid changing the caller’s current directory if possible. In Python, changing the current directory could potentially affect code further in the code stack.
If the current directory needs to be changed for any reason, ensure it gets reverted by the end of the code block that needs the current directory to be at the specific location.
import os
prev_current_directory = os.getcwd()
try:
os.chdir("/tmp")
# Do stuff here
finally:
os.chdir(prev_current_directory)
Naming convention
Similar to the general naming convention in the earlier section, filesystem paths should have its own naming convention. The name of the variable should represent a filesystem node file vs. directory
and should be explicit about the representation name vs. path vs. name with no extension
.
Here are some naming guidelines:
Variable representing a directory path should be suffixed with
dirpath
ordirname
Variable representing a file path should be suffixed with
filepath
orfilename
Variable representing any path should be suffixed with
nodepath
ornodename
Example
from datetime import datetime
import os
from pathlib import Path
# Good: It is clear that this variable represents a directory containing N log files
log_dirpath = Path("workspace/logs")
# Bad: "logs" can easily be misinterpreted as a list of log objects
logs = Path("workspace/logs")
timestamp_str = datetime.now().strftime("%m-%d-%Y_%H-%M-%S")
# Good: It is clear that this variable represents a file path relative to the current directory
log_filepath = Path("log_{}.txt".format(timestamp_str))
# Good: It is clear that this variable represents a file name and does not represent an addressable path by itself
log_filename = "log_{}.txt".format(timestamp_str)
# Good: The suffix nodepath implicitly denotes that this function takes file or directory paths
def copy(src_nodepath, dest_nodepath):
"""
Copy a source path to the destination path
If the given paths are directory paths, perform a recursive copy
"""
raise NotImplementedError
# Bad: The name "file" can easily be misinterpreted as a file object.
# The function "os.listdir" actually returns a list of node names (names of both directories and files)
for file in os.listdir(log_dirpath):
print(file)
Other
Apart from precondition checks at the beginning of a function/method, each function/method should only have one return statement and that should be at the end of the function. This is to prevent unintentional side-effects where a developer adds some code thinking it will be run, but the function exits out before it gets to that piece of code.
File layout
Defining a convention for file layouts is important for consistency. All Python files should have a defined purpose, and Python files should not be created simply for the sake of dividing a program into different modules.
For example, if a Python file contains a wide assortment of unrelated functions (excluding a generalized utility module), then that should be an indication that the file is responsible for too many different things.
Entry point
The purpose of an entry point file is to act as the starting point of a program; in other words, this is the file that the user invokes to run a program. By convention, the entry point file should contain the function main
and should have a consistent layout.
First, the command line interface (if any) should be the first thing the reader sees. The command line arguments should be defined in a get_args()
function at the top of the file.
Second, the main()
function should follow the command line interface definition and contain the application logic such as:
Get command line arguments
Do some application logic (invoke helper functions if any)
Return exit code
Lastly, define helper functions below main()
. The reason for having helper functions at the bottom is because readers should not care about the implementation details; the main()
function should sufficiently describe what the program does.
Example
from argparse import ArgumentParser
import sys
def get_args():
argument_parser = ArgumentParser()
return argument_parser.parse_args()
def main():
args = get_args()
return 0
def helper_function():
raise NotImplementedError
if __name__ == "__main__":
sys.exit(main()) # <== Return exit code using main() return value as the exit code
Command line interface
Python programs should be configurable through a command line interface. Command line arguments should follow the POSIX utility convention.
Single dash for short option (single character):
-o
,-i
,-I
Double dash for long options (dashes to separate words):
--output
,--ignore-all-warnings
In addition to having a command line interface, Python programs should always return an Exit Code at the end of execution where zero (0) represents success and nonzero represents an error.
Module layout
Python modules should be categorized into multiple types:
Class definition and constants
Functions and constants
The motivation for categorizing modules into different types is to ensure that each module has a specific responsibility. Constants may freely reside in any module type. It's technically possible to mix class definitions and functions all into a single module, but it's better to categorize the types of modules to avoid creating a monolithic module.
Class definition and constants
A module containing a class should ideally contain exactly one class definition, and the file name should reflect that class name. This makes it easier for readers to find the class definition.
It's also fine to have multiple private classes in a class module, but remember that readers should not care about the implementation details, so try to hide private classes at the bottom of the file.
Example
# Contents of http_client.py
class HTTPClient:
raise NotImplementedError
# Alternative contents of http_client.py
HTTP_POST_METHOD = "post"
class HTTPClient:
raise NotImplementedError
class _HTTPTransport:
raise NotImplementedError
Example usage
from http_client import HTTPClient
http_client = HTTPClient()
Functions and constants
Modules containing functions and constants should ideally be associated to some responsibility. Sometimes it is fine to create a generic helper.py
module that contains generic helper functions, but try to avoid that.
# Contents of http_helper.py
from enum import Enum
class HTTPStatusCode(Enum):
SUCCESS = (200, 299)
CLIENT_ERROR = (400, 499)
SERVER_ERROR = (500, 599)
def is_success(status_code):
raise NotImplementedError
def is_failure(status_code):
raise NotImplementedError
Object-Oriented Programming (OOP) Coding Standards
This section will discuss conventions related to object oriented programming which will go over topic related to class design and layout.
Data model fundamentals
In Python, almost everything is an object. Built-in types like string, integer, list, etc. are all objects. Each of these objects have their own set of methods. For example, two lists can be added together using the +
operator list + [0, 1, 2]
. You cannot, however, add a list and an integer. list + 0
will result in a type error.
my_str = "I'm a string object"
my_int = 0 # <== integer object
my_list = [] # <== list object
my_list + my_int # <== TypeError exception
Developers should be aware of Python Data Models which can be used to model custom classes based on built-in types by overriding special methods (a.k.a double underscore methods or dunder methods).
Example
from typing import *
class Table:
def __init__(self, columns: List[str], rows: List[List[str]]):
self._columns = columns
self._rows = rows
def __len__(self):
return len(self._rows)
def __str__(self):
lines = [
str(self._columns),
]
lines += [str(row) for row in self._rows]
return ("\n".join(lines))
def __iter__(self):
for row in self._rows:
yield row
def __getitem__(self, key):
return self._rows[key]
table = Table(
columns=["c0", "c1"],
rows=[
["r00", "r01"],
["r10", "r11"],
]
)
len(table) # Calls __len__ to get length of table
print(table) # Calls __str__ to get a string representation
for row in table: # Calls __iter__ to get a generator
pass
table[0] # Calls __getitem__ to retrieve first (0) element
Class responsibilities
Ideally, a class should have a focused responsibility. A class can be an entity that has some functional responsibility, a simple data container, or both. With a focused purpose, it's methods and attributes should also be relevant to a class's responsibility. For example, let's say we have an HTTPClient
class which, implied by the name, functions as an HTTP client. The HTTPClient
should not have methods to parse and process application-specific response payloads from an HTTP server. Rather, the HTTPClient
should pass response payloads to the caller where the caller should have context on how to use the response payload.
Example
from typing import *
from response import Response
class HTTPClient:
def get(self, url: str, headers: dict = None) -> Response:
raise NotImplementedError
def get_device_ids_from_response(self, url: str) -> List[int]:
"""
Bad: What is a "device ID"? HTTP client should not have any application-specific logic
As the class name implies, this class should only be responsible for functioning as an HTTP client and
therefore should only have methods and attributes relevant to an HTTP client
"""
raise NotImplementedError
"""
Good:
- HTTPClient is only responsible for performing HTTP requests. It does not care about the response payloads
- DeviceInfoReader gives meaning to response payloads; It uses HTTPClient to handle HTTP transactions
with the cloud
"""
from typing import *
from cloud_endpoint import DEVICE_INFO_URL
from response import Response
class HTTPClient:
def get(url: str, headers: dict = None) -> Response:
raise NotImplementedError
class DeviceInfoReader:
"""
Responsible for getting device information from the cloud
"""
def __init__(self, http_client: HTTPClient):
self._http_client = http_client
def get_device_ids(self) -> List[int]:
response = self._http_client.get(url=DEVICE_INFO_URL)
# Assume response body is a dictionary where "device_ids" is a list of int
return response.body["device_ids"]
Public / private / accessors / mutators design
A class design should be simple to read and use. Similar to a restaurant menu, you generally want to avoid overloading customers with unnecessary internal details like who prepares the food or where does the supply come from. And even if the menu exposed internal details, it's not like customers can use that information in any meaningful way. Rather, you want to make the menu straightforward; the customer wants to purchase food and drinks, and the menu should clearly highlight possible options to the customer.
Class design is very similar to the restaurant menu analogy above. A class should make it explicitly clear on what are the methods, accessors, and mutators that are at the user's disposal. A general rule is to put the important stuff at the top of the file and internal details at the bottom of the file. Readers tend to read from top to bottom.
To make classes aesthetically simple, classes should follow this order:
Special methods
Public methods
Private methods
Accessors
Mutators
Notice that private methods should ideally be thrown at the bottom since it's internal details. Instead, it's placed underneath public methods for consistency; methods should be packed together to avoid mixing accessors and mutators in between methods.
Example
"""
Note: The headers (e.g. # Public methods) are optional
"""
from response import Response
class HTTPClient:
def __init__(self):
self._authentication = {}
def __str__(self):
return "HTTP client"
#
# Public methods
#
def get(url: str, header: dict = None) -> Response:
raise NotImplementedError
def post(url: str, header: dict = None) -> Response:
raise NotImplementedError
#
# Private methods
#
def _format_header(header: dict) -> str
raise NotImplementedError
#
# Accessors
#
@property
def authentication(self): -> dict
return self._authentication
#
# Mutators
#
@authentication.setter
def authentication(self, new_authentication: dict):
self._authentication = new_authentication
http_client = HTTPClient()
# Good: User calls public method
response = http_client.get(url="url")
# Bad: User calls private method; not intended by author of HTTPClient
http_client._format_header(header={})
# Good: User uses accessor and mutator
http_client.authentication = {}
print(http_client.authentication)
# Bad: User accesses a private attribute; not intended by author of HTTPClient
http_client._authentication = {}
Note that a leading single underscore (_
) can be prefixed to identifiers to denote private. Users should not use double underscores (__
) since that mechanism is reserved for Name Mangling.
Data flow best practices
A common problem in other languages like C is that global variables can easily be abused. Imagine that you are a C developer and are given a C library that relies heavily on internal global variables.
/*
* Contents of http_client.c
*
* The problem is that in C, it may be tempting to rely on global variables
*/
#include <stdint.h>
#include "http_client.h"
static module_s module;
static response_s last_response;
static uint32_t last_status_code;
void http_client__initialize(void)
{
// Not implemented
}
void http_client__get(const char* url)
{
// Not implemented
}
void http_client__get_response(response_s* response)
{
if (NULL != response) {
*response = last_response;
}
}
uint32_t http_client__get_status_code(void)
{
return last_status_code;
}
When the API relies too much on global variables and internal states, the application code will begin to look very confusing to other readers.
#include "http_client.h"
int main(int argc, char** argv)
{
// Bad: HTTP client is not configurable
http_client__initialize();
// Bad: No return type; user passes URL in, but nothing comes out
http_client__get("url");
// Bad: Library now has a response based on a previous function call
// There is hardly an association between "http_client__get_response" and "http_client__get"
response_s response = {};
http_client__get_response(&response);
return -1;
}
The problem in the C code above also exists in class design as well. In class design, attributes act similarly to global variables, that is, all methods can freely access and modify attributes, and therefore, this makes methods have internal states in some form.
"""
The problem is the same as the C library example above
Attributes act like global variables; in this case, those attributes unnecessarily retain internal state
"""
import requests
class HTTPClient:
def __init__(self):
self._last_response = None
self._last_status_code = None
def get(self, url: str):
self._last_response = requests.get(url)
self._last_status_code = self._last_response.status_code
def get_response(self): -> Response
return self._last_response
def get_last_status_code(self) -> int:
return self._last_status_code
http_client = HTTPClient()
# Bad: Methods "get()" and "get_response()" have a loose association
# Readers will get confused as to where the response came from
# In other words, what changed "get_response()" behavior?
response = http_client.get_response() # None
http_client.get(url="url")
# Application logic here
response = http_client.get_response() # Response object
A solution to the problem of obscure internal state changes is to ensure that the API has proper data flow. Functions and methods should have inputs and outputs defined, and APIs should avoid retaining unnecessary states. A general rule is that if an API has multiple functions/methods that take zero inputs and zero outputs, then that is a good sign that the data flow will be very obscure to the user.
Example
# Bad: There's a lot of magic happening internally which makes the application code confusing
stateful_class = StatefulClass(result_filepath="results.txt")
stateful_class.do_this()
stateful_class.do_that()
stateful_class.handle_results()
stateful_class.write_results_to_file()
stateful_class.handle_error()
# Good: The solution is to define inputs and outputs for functions and methods if applicable
less_stateful_class = LessStatefulClass()
output_a = less_stateful_class.do_this(input_a)
output_b = less_stateful_class.do_that(input_b)
formatted_results = less_stateful_class.handle_results(results=[output_a, output_b])
error = less_stateful_class.write_results_to_file(filepath="results.txt", results=formatted_results)
less_stateful_class.handle_error(error)
Package / Module Structuring
Like classes, package and module design follow a similar paradigm.
Namespace
First off for package and module naming, try to avoid redundant names.
Example
/can
/__init__.py
/can_pcan.py
/can_value_can.py
/can_vector.py
import can
# Bad: Redundant names in namespace result in repetitive reference to "can"
pcan = can.can_pcan.CAN_PCAN()
In the above example, the package name can
already makes it explicitly clear that modules in the package are related to CAN, so the module name does not need to be prefixed with that name; the same applies to identifiers within the modules.
import can
# Good: Package namespace already associates module "pcan" to the context of CAN
pcan = can.pcan.PCAN()
Public / private imports
Packages and modules should also follow the same public and private conventions as classes. Users should certainly be discouraged from importing private modules and identifiers. Like classes, the same leading single underscore (_
) can be used to denote private.
Example
# Contents of module.py
PUBLIC_CONSTANT = 0
ANOTHER_PUBLIC_CONSTANT = 0
_PRIVATE_CONSTANT = -1
class PublicClass:
raise NotImplementedError
class _PrivateClass:
raise NotImplementedError
To control the public / private scope of packages, use the __init__.py
file that resides in the root of the package directory. When a user imports a package, the __init__.py
file actually gets executed on import; developers can take advantage of this behavior by placing import statements in the __init__.py
file.
Example
# Contents of package/__init__.py
# This file gets executed when a user imports this package
from .module import PublicClass, PUBLIC_CONSTANT, ANOTHER_PUBLIC_CONSTANT
from package import PublicClass, PUBLIC_CONSTANT
public_class = PublicClass()
OOP Abstract Base Class (ABC) Coding Standards
One of the challenges of creating large program is that developers will likely encounter the scenario where many variations of something needs to be supported. For example, CAN nodes fundamentally works the same. Functionally, a CAN node can initialize
, uninitialize
, send message
, and receive message
. However, the implementation of CAN varies widely across different hardware. A PCAN USB-CAN device works differently than Vector USB-CAN device despite having the same exact behavior from a high level perspective.
To support both devices in code, you certainly want to avoid creating multiple implementations that have different APIs each with slight deviations.
Example
"""
Bad: Both PCAN and VectorCAN have similar functional behavior, and the APIs are similar,
but the API variation will force users to write duplicate application code each with slight variations
"""
from typing import *
from pcan_message import PCANMessage
class PCAN:
def __init__(self, kbaud: int):
raise NotImplementedError
def send_message_to_channel(self, channel: int, pcan_message: PCANMessage):
raise NotImplementedError
def receive_message_to_channel(self, channel:int) -> PCANMessage:
raise NotImplementedError
class VectorCAN:
def __init__(self, port: str, baud: int):
raise NotImplementedError
def send(self, bus: int, id: int, data: bytearray):
raise NotImplementedError
def receive(self, bus: int) -> Tuple[int, bytearray]:
raise NotImplementedError
from usbcan import PCAN, VectorCAN
def send_can_message(device, bus: int, id: int, data: bytearray):
"""
Bad: User has to write duplicate code with slight variations that practically do the same thing
User will also have to write more "elif" conditions to support more device types.
"""
if isinstance(device, PCAN):
pcan_message = PCANMessage(id=id, data=data)
device.send_message_to_channel(channel=bus, pcan_message)
elif isinstance(device, VectorCAN):
device.send(bus=bus, id=id, data=data)
else:
raise TypeError(device)
Abstract Base Classes (ABC) are designed to solve the problem of variations. ABC allows developers to define a common interface which acts as a contract that classes must conform to. The USB standard is a perfect example. Think about how there are many USB devices out there: keyboard, mouse, flash drive, and may other types. There isn't a specific connector for keyboard vs. flash drive; you can just plug any USB connector into a USB port and expect it to work. That seamless abstraction is all thanks to the USB standard that all USB devices must conform to. For instance, All USB devices shall support 5 V input voltage, shall communicate using the USB communication protocol, and shall identify itself to the host on connection.
ABC allows developers to define a standard interface by defining common method signatures.
Example
# Contents of can_interface_abc.py
from abc import ABC, abstractmethod
from typing import *
from errors import CANInterfaceError
from can_message import CANMessage
class CANInterfaceABC(ABC):
#
# Public methods
#
@abstractmethod
def initialize(self, kbaud: float) -> bool:
"""
This method shall initialize a hardware with the given kbaud
Return true if successful
"""
raise NotImplementedError
@abstractmethod
def uninitialize(self):
"""
This method shall uninitialize a hardware; do nothing if never initialized
"""
raise NotImplementedError
@abstractmethod
def send(self, can_message: CANMessage) -> bool:
"""
This method shall send a given CAN message and return true if successful
:raise: CANInterfaceError
"""
raise NotImplementedError
def send_multiple(self, can_messages: List[CANMessage]) -> bool:
"""
Abstract base classes can also have some implementation too that depend on other abstract methods
"""
ret = False
for can_message in can_messages:
ret = self.send(can_message)
if not ret:
break
@abstractmethod
def receive(self, timeout: float) -> Union[CANMessage, None]:
"""
This method shall receive a CAN message from a receive buffer (if any)
This method shall also block for up to the given timeout in seconds
Return a CAN message object if a received message exists else return None
:raise: CANInterfaceError
"""
raise NotImplementedError
#
# Private method
#
@abstractmethod
def _block_timeout(self, timeout: float):
"""
Private methods can be abstract methods too
"""
raise NotImplementedError
In the example above, developers should suffix the class name with ABC
to make it explicitly clear that the class is an abstract base class. For each method, developers should clearly document the expected behavior (in a docstring) and type hint the parameters and return value.
The ABC class should document exactly how implementation classes should behave. In other words, other developers who are responsible for writing implementation classes should not have to find the original developer to seek information about the parameters, expected behavior, and the return value of all abstract methods.
# Contents of pcan.py
import time
from typing import *
from can_interface_abc import CANInterfaceABC
class PCAN(CANInterfaceABC):
#
# Public methods
#
def initialize(self, kbaud: float) -> bool:
# Implementation here
return True
def uninitialize(self):
# Implementation here
pass
def send(self, can_message: CANMessage) -> bool:
# Implementation here
return True
def receive(self, timeout: float) -> Union[CANMessage, None]:
# Implementation here
return None
#
# Private methods
#
def _block_timeout(self, timeout: float):
time.sleep(timeout)
"""
Assume PCAN and VectorCAN both conform to CANInterfaceABC
"""
from can import CANInterfaceABC, CANMessage, PCAN, VectorCAN
can_device = get_can_device() # Assume this function exists and returns a CANInterface object
assert isinstance(can_device, CANInterfaceABC) # This statement is just an example; returns True
# Good: Application code can be reused across different CAN device types
can_device.initialize(kbaud=500)
can_message = CANMessage()
can_device.send(can_message)
can_message = can_device.receive(timeout=1.0)
can_device.uninitialize()
Conclusion
Python is an excellent programming language that is both easy to learn and flexible for a wide range of applications. And with it's object oriented nature, it allows developers to write code ranging from very basic scripts to large, complex yet elegant programs. Sibros's coding standards were derived from PEP 8 and extended based on past experiences. Programming is more than just putting instructions together in sequential order, but is rather an art that should focus on the readers. More time is spent reading code than writing, so it is very important to ensure that readers can easily understand the intentions of the writers.
Related content
SIBROS TECHNOLOGIES, INC. CONFIDENTIAL