Python dataclass

[CG]Maxime

Introduction

One of the new features in Python 3.7 is the @dataclass decorator, which simplifies the creation of data classes by auto-generating special methods such as __init__() and __repr__().

A data class is a class whose main purpose is to store data rather than to provide behavior. This kind of class, also known as a data structure, is very common. For example, a class used to store the coordinates of a point is simply a class with three fields (x, y, z).

However, we often need to add a constructor, a representation method, comparison functions, etc. Writing these methods by hand is tedious and repetitive, and this is precisely the kind of work that should be handled transparently by the language.

As a matter of fact, some languages, such as Kotlin, already offer an easy way to create data classes. In Java, this can be done using the Lombok library and its @Data annotation.

Example

Here's an example of how @dataclass is used:
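
A minimal sketch, assuming a Point class with three float coordinates. The field names follow the (x, y, z) example from the introduction; the 0.0 default for z is an assumption chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Point:
    x: float
    y: float
    z: float = 0.0  # default value, so z may be omitted


p = Point(1.5, 2.5)
print(p)                           # Point(x=1.5, y=2.5, z=0.0)
print(p == Point(1.5, 2.5, 0.0))   # True
```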

By default, this auto-generates the methods needed to instantiate, compare for equality, and print instances of the data class.

In other words, this is equivalent to:
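
For a Point class with float fields x, y and z (z defaulting to 0.0, an assumption made for illustration), the generated methods behave roughly like this hand-written class. It is a sketch of the default behavior, not the exact code the decorator produces.

```python
class Point:
    def __init__(self, x: float, y: float, z: float = 0.0):
        self.x = x
        self.y = y
        self.z = z

    def __repr__(self):
        name = self.__class__.__qualname__
        return f"{name}(x={self.x!r}, y={self.y!r}, z={self.z!r})"

    def __eq__(self, other):
        # Only instances of the exact same class are compared, field by field.
        if other.__class__ is self.__class__:
            return (self.x, self.y, self.z) == (other.x, other.y, other.z)
        return NotImplemented
```

Because the class defines __eq__ without __hash__, its instances are unhashable, which also matches the default behavior of @dataclass.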

Note that this particular example could also be written using namedtuple; the syntax is shorter, but harder to read:

from collections import namedtuple

Point = namedtuple('Point', ['x', 'y', 'z'], defaults=(0.0,))

dataclass Parameters

The @dataclass decorator accepts keyword-only parameters that control which methods are generated:

@dataclasses.dataclass(*, init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=False)
  • init: if True, generates the __init__ method.
  • repr: if True, generates the __repr__ method.
  • eq: if True, generates the __eq__ method, which compares instances as if they were tuples of their fields.
  • order: if True, generates the __lt__, __le__, __gt__, and __ge__ methods. This requires eq to be True.
  • unsafe_hash: if False (the default), the __hash__ method is generated according to the values of eq and frozen. If True, the __hash__ method is generated even when the instances are mutable, which is why the option is called unsafe.
  • frozen: if True, then the instances will be immutable (read-only).
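
As an illustration of these parameters, here is a small sketch using a hypothetical Version class (the class and its fields are not from the original article). It combines order=True and frozen=True.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(order=True, frozen=True)
class Version:
    major: int
    minor: int = 0


v1 = Version(1, 2)
v2 = Version(1, 10)

# order=True: instances compare like the tuples (major, minor).
print(v1 < v2)                      # True

# frozen=True together with eq=True (the default) makes instances hashable.
print(len({v1, Version(1, 2)}))     # 1

# frozen=True: assigning to a field raises FrozenInstanceError.
try:
    v1.minor = 3
except FrozenInstanceError:
    print("Version instances are read-only")
```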

See the documentation for more information.

Field-specific configuration

The dataclasses module provides a field function for field-specific configuration.

It controls the default value of a field, whether the field is displayed by the __repr__ method, ignored by the comparison functions, included in the __hash__ method, etc.

def field(*, default=MISSING, default_factory=MISSING, repr=True,
          hash=None, init=True, compare=True, metadata=None)
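
A short sketch of field in use, with a hypothetical Inventory class (the class and field names are assumptions made for illustration):

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Inventory:
    name: str
    # default_factory builds a fresh list for each instance; a plain
    # mutable default such as `= []` is rejected by @dataclass.
    items: List[str] = field(default_factory=list)
    # Hidden from __repr__ and ignored by __eq__.
    cache_key: str = field(default="", repr=False, compare=False)


a = Inventory("tools")
b = Inventory("tools", cache_key="abc")

print(a == b)   # True: cache_key is ignored by the comparison
print(b)        # Inventory(name='tools', items=[])

a.items.append("hammer")
print(b.items)  # []: each instance has its own list
```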

See the documentation for more information.

Post-init processing

The generated __init__() code calls a method named __post_init__(), if one is defined on the class. This is useful for initializing a field based on the values of other fields. Note that if no __init__ method is generated, __post_init__ is not called automatically.
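
A minimal sketch of post-init processing, assuming a hypothetical Rectangle class whose area is derived from its other fields:

```python
from dataclasses import dataclass, field


@dataclass
class Rectangle:
    width: float
    height: float
    # init=False: not a parameter of __init__; set in __post_init__ instead.
    area: float = field(init=False)

    def __post_init__(self):
        # Called by the generated __init__ after width and height are assigned.
        self.area = self.width * self.height


r = Rectangle(2.0, 3.0)
print(r)  # Rectangle(width=2.0, height=3.0, area=6.0)
```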

Other Dataclasses Functions

The dataclasses module also provides several useful functions:

  • fields: returns a tuple of Field objects for a data class or one of its instances. A Field object contains the configuration of a field.
  • asdict: converts a data class instance to a dict of its fields.
  • astuple: converts a data class instance to a tuple of its fields.
  • make_dataclass: creates a new data class dynamically.
  • replace: creates a copy of the given data class instance with some fields changed.
  • is_dataclass: tells whether the given object is a data class or an instance of one.
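
These helpers can be sketched as follows, assuming a small Point data class (the Pixel class created at the end is a hypothetical name used only for illustration):

```python
from dataclasses import (
    asdict, astuple, dataclass, fields, is_dataclass, make_dataclass, replace,
)


@dataclass
class Point:
    x: float
    y: float
    z: float = 0.0


p = Point(1.0, 2.0)

print([f.name for f in fields(p)])  # ['x', 'y', 'z']
print(asdict(p))                    # {'x': 1.0, 'y': 2.0, 'z': 0.0}
print(astuple(p))                   # (1.0, 2.0, 0.0)

# replace returns a new instance; p itself is unchanged.
q = replace(p, z=5.0)
print(q.z, p.z)                     # 5.0 0.0

# is_dataclass accepts both the class and its instances.
print(is_dataclass(Point), is_dataclass(p), is_dataclass(42))  # True True False

# make_dataclass builds a class at runtime from (name, type) pairs.
Pixel = make_dataclass("Pixel", [("x", int), ("y", int)])
print(Pixel(3, 4))                  # Pixel(x=3, y=4)
```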
