Python Function Design Check List

Introduction

It is very important to name your variable, function and class clearly because it will help not only your teammate but also yourself in the future. In this article, I will help you to learn how you should write your code for companies.

1. Use meaningful name for your function

  • Use verb: get_item_list, calc_tax, is_member etc
  • Use noun: current_date, as_dict etc
  • Use role: json_parser , file_uploader etc

Use more specific verb to improve your code.

Bad example

def get_latest_news():
      . . .  


def get_tax_including(price: int): 
      . . .


def get_sum_price(items: List[int]):
        . . .

Best Practice

from typing import List 


def fetch_latest_news():
      . . .  


def calc_tax_including(price: int): 
      . . .


def aggregate_sum_price(items: List[int]):
        . . .

2. Function name should always imply what it will return.

It is essential to have a function name that people can instinctively understand what it will return.

Let's think about how we can do it!

Bad example

def is_valid(name: str): 
        if name.endwith(".txt"):
              return name[:-4] + ".md"
        return name

Problem with this is that is_valid implies its returning value is boolean. However, it actually returns modified string or original string. This is really confusing, and will contribute to bug as you progress.

So how can we fix this?

Best Practice

def modify_if_valid_name(name: str) -> str: 
        if name.endwith(".txt"):
              return name[:-4]+ ".md"
        return name

or

def is_valid(name: str) -> bool: 
        return not name.endwith(".txt"):

These looks much better to understand what function does.

3: Summarize into a function with no side effects.

It is very important to be aware of "side effects" in programming. A side effect is a change in some state as a result of the execution of a program. If a function is not affected by external variables or state, it should always produce the same output when given the same input. This makes the function easy to test and unaffected by state.

What are the side-effects of a function and what are the caveats?

Bad Example

def is_valid(article):
    if article.title in INVALID_TITLES:
        return False

    # is_valid function is rewriting 
    # the value of article.valid

    article.valid = True

    # Saving data externally by calling .save()

    article.save()
    return True

def update_article(article, title, body):
    article.title = title
    article.body = body
    if not is_valid(article):
        return
    article.save()
    return article

In this case, just calling the is_valid function will change the .valid value of the article. Side effects from various functions can cause data to be modified in ways that developers do not expect. Unexpected data changes can be a source of bugs and difficult troubleshooting.

Best Practice

In this case, the is_valid function should not cause any side effects. The function name is_valid_title should be used to check whether the title is valid or not.

from pydantic import BaseModel

class Article:
        title: str 
        body: str 
        valid: bool 


def is_valid_title(title: str) -> bool:
    return title not in INVALID_TITLES:


def update_article(article:Article, title: str, body: str) -> Article:
    if not is_valid_title(title):
        return

    article.title = title
    article.body = body
    article.valid = True
    article.save()

    return article

This way, is_valid_title has no side effects and can be easily reused from other processes. It also makes it clear that "calling update_article has side-effects.

4: Functionalize in a meaningful cohesive manner.

Are you thoughtlessly grouping functions together? When separating functions, do not separate them by processing cohesion.

Bad Example

def main() -> None:
    with open(...) as f:
        reader = csv.reader(f)
        for row in reader:
            i = row[1]
            if i < 100:
                print(row[0])

This function is simply grouped together in the main() function as the "main processing". If this is separated by a "processing grouping", it becomes as follows.

def print_row(row) -> None:
    i = row[1]
    if i < 100:
        print(row[0])


def main() -> None:
    with open(...) as f:
        reader = csv.reader(f)
        for row in reader:
            print_row(row)

Functionalization makes us feel like we have made an improvement, but this separation is problematic.

Best Practice

Separate functions and processes by the meaning and reusability of the process. Before getting too enthusiastic about "separating into functions," it is important to think about what each process can mean. The function is a way of expressing what it means to read a CSV and compare it to 100. In this case, let's assume that we have a specification that says, "If the price is less than 100 dollars, the item is subject to matching.

import csv


def read_items(path: str)  -> tuple:
 """ CSV data of the product list is read and returned in a tuple generator
    Each product is returned as a "product name, price" tuple
 """
    with open(path) as f:
        reader = csv.reader(f)
        for row in reader:
            name = row[0]
            price = int(row[1])
            yield name, price


def is_addon_price(price: float) -> bool:
   """ Returns True if the price is an "item eligible for matching purchase".
    """
    return price < 100


def main() -> None:
    items = read_items()
    for name, price in items:
        if is_addon_price(price):
            print(name)

By grouping them together by meaning rather than process, it became clear what each process was for.

5: Do not make lists or dictionaries the default argument

Default arguments in Python are a useful feature, but there is a trap in using them. Have you ever written a program like the following example?

Bad Example

def foo(messages=[]):
    messages.append("Hello world")
    return messages

Do not write values=[] if you want the argument values to be an empty list by default.

Best Practice

An updatable (mutable) value must not be specified as the default argument. Remember that lists, dictionaries, and sets should not be default arguments. Set the default argument to None and specify an empty list or dictionary if None is specified in the function.

def foo(messages: List[str] =None):
    if messages is None:
        messages = []
    messages.append("Hello world")
    return messages

This will always return ["Hello world"] no matter how many times foo() is called.

6: Accepts int or str without a collection as an argument

What values should I expect for function arguments? Thinking about function arguments is very important because it determines the input specification of the function.

What is the problem with the following function?

Bad Example

def calc_after_tax_price(item:dict , tax_rate:float=0.1) -> float :
    return item['price'] * (1 + tax_rate)

This calc_after_tax_price expects item (a dictionary representing products) as an argument. This means that a dictionary with a 'price' key must be prepared every time a user simply wants to calculate sales tax. This reduces the reusability of the function.

Best Practice

It is best to accept function arguments that are not collections, such as numbers (int), floating-point numbers (float), or strings (str).

def calc_after_tax_price(price:float , tax_rate:float=0.1) -> float:
    return price * (1 + tax_rate)

By making the function accept numeric values instead of a dictionary, the calc_after_tax_price function can also be used to simply calculate the amount of after tax price.

7: Index numbers should not have meaning

There are very rare instances in Python where a list or tuple index number would make for a better program. What happens if you give meaning to the index number?

Bad Example


def item_available(item_id: int) -> bool:
        # db query here 
        return is_available

def validate_sales(row):
  """
    row is a tuple representing sales
    1st element: sales price
    2nd element: product ID
    3rd element: user ID
    Element 4: quantity
    Element 5: date and time of sale
    """
    # check if item is available
    if not item_exists(row[1]):
        raise ...

    # check number of items available
    if row[3] < 1:
        raise ...

For example, in a dictionary, row['item_id'], the process itself would represent the meaning. However, in this example, the row index number has meaning in the processing, making the program difficult to read. You have to remember that row[1] is the item ID when you are reading the program.

Also, if you process by index number, the process will break if a new value is entered in between. For example, if the row specification has been changed so that the second element now contains the "dealer ID," the process for specifying subsequent elements must be rewritten. In this case, row[3] must be changed to row[4].

Best Practice

Don't manage data by tuples, but by dictionaries or classes. Replacing row tuples with a class called Sale makes the validate_sales function much easier to read.

from pydantic import BaseModel
from datetime import datetime 


class Sale(BaseModel):
    sale_id: int
    item_id: int
    user_id: int
    amount: int
    sold_at: datetime


def validate_sales(sale):
    """ Raise an error if sales sale is invalid data.
    """
    if not item_exists(sale.item_id):
        raise ...

    if sale.amount < 1:
        raise ...

8: Do not abuse variable length arguments in function arguments

Variable length arguments, *args and **kwargs, are a useful feature of Python, but if they are used to thoughtlessly, they can easily introduce bugs into a program.

What are the problems? Let's think about it while looking at the program.

Bad Example

class User:
    def __init__(self, **kwargs):
        self.mail = kwargs['mail']
        self.password = kwargs.get('password')

This User can accept an email= argument that is not expected by the class, as shown below. If you program it mistakenly as email=, you will not get an error.

>>> user = User(name="adam", email="adam@example.com")

Here user.mail is set to None. There is a problem with unexpected data being created but no error is generated, which can lead to errors elsewhere in the program and problems with needed data not being saved.

Best Practice

Do not carelessly use *args and **kwargs, but specify them as individual arguments.

class User:
    def __init__(self, password: str, mail: str=None):
        self.mail = mail
        self.password = password

or

from pydantic import BaseModel

class User(BaseModel):
        mail: str 
        password: str

In this case, specifying an argument that does not exist will result in an error.

9: Write "why" in your comments

What do you write in your program comments? There are some points to keep in mind when writing comments.

Bad Example

def fetch_users_by_ids(user_ids: List[int]) -> List[UserQuery]:
    # fetch user from database filtered by user ids. 
    # user_id must be int 

   # users receives UserQuery

    # Loop through the user_ids one by one and process
    # SQL is executed on the backend when user_id loops
    for user_id in user_ids:
        ...
    return users

The problem is that this program has a lot of useless comments. If you read the program, you will understand the process. If the code is difficult to understand without comments, consider whether it could be rewritten in simple code before supplementing the explanation with comments.

Best Practice

Write "why" in comments. If you write a function specification, write it in a docstring, not in a comment.

from typing import List 


def fetch_users_by_ids(user_ids: List[int]) -> List[UserQuery]:
    """ ~~Process to fetch user

    Do <fetch_users_by_ids> for multiple users.
    ~~ for ~~, we need to change the user's data.
    """
    # To reduce the number of SQL executions, do not separate this loop into            separate functions
    for user in users:
        ...
    return user

If a program is difficult to understand just by reading the code, write a comment explaining the meaning of the process and why it is written the way it is.

The comment can be thought of as an explanation of "why it is not processed this way. It is effective to annotate "why this processing is done this way" for a process that is not natural to the reader of the program.

10: Do not write processing in the controller

Are you writing too much in your main() function or web framework controller (Django's View)?

Bad Example

Here is an example of a process written in a View function in the web framework Django.

@login_required
def product_list_view(request, store_id:int):
    shop = get_object_or_404(Store, id=store_id)

    if not request.user.memberships.filter(role=Membership.ROLE_OWNER, store=store).exists():
        return HttpResponseForbidden()

    products = product.objects.filter(store=store,
                                published_at__isnull=False)
    if "search" in request.GET:
        search_text = request.GET["search"]
        if len(search_text) < 2:
            return TemplateResnponse(request, "products/products_list.html",
                                     {"products": products, "error": "text is too short"})
        products = items.filter(name__contains=search_text)

    prices = []

    for product in products:
        price = int(product.price * 1.1)
        prices.append(f"$: {price}")

    products = zip(products, prices)
    return TemplateResponse(request, "products/products_list.html", {"products": products})

This program writes too much processing to the item_list_view function.

Best Practice

The controller should only input and output values and control the entire process. If you implement detailed processing in the controller, too many programs will be written in the controller. Not only will this make the overall process less clear, it will also make unit testing more difficult. Most of the processing in the above product_list_view View function should be implemented in separate functions or components.

The product_list_view function after separating each process is as follows.

``` @login_required def product_list_view(request, store_id:int): store = get_object_or_404(Store, id=store_id) validate_membership_permission(request.user, shop, Membership.ROLE_OWNER)

 products = product.objects.filter(store=store).published()
 form = ProductSearchForm(request.GET)
 if form.is_valid():
     products = form.filter_products(products)
 return TemplateResponse(request, "products/products_list.html",
                         {"products": products, "form": form})