Pydantic is a data validation and settings management library for Python. It is a good fit when the codebase has data schemas (called "models" in Pydantic terms), such as a POJO-like class in Python. If the goal is only to check the type annotations on functions statically, mypy is enough; unlike Pydantic, mypy does not validate data at runtime.

python -m pip install "pydantic[email]" - This installs Pydantic together with the optional email validation dependency, which is needed for the EmailStr type.

python -m pip install pydantic-settings - This installs the sibling library used for config management.

What's the difference between Pydantic and Python dataclasses?

Pydantic’s primary way of defining data schemas is through models. A Pydantic model is an object, similar to a Python dataclass, that defines and stores data about an entity with annotated fields (annotated fields are the ones with type hints such as name: str). Unlike dataclasses, Pydantic’s focus is centered around automatic data parsing, validation, and serialization.
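To make the difference concrete, here is a minimal sketch. PointDC and PointModel are hypothetical names made up for this comparison; they are not from any of the linked material.

```python
from dataclasses import dataclass

from pydantic import BaseModel, ValidationError


@dataclass
class PointDC:
    x: int
    y: int


class PointModel(BaseModel):
    x: int
    y: int


# A dataclass stores whatever it is given; the type hints are not enforced.
dc_point = PointDC(x="1", y="not a number")

# A Pydantic model parses compatible input: the string "1" becomes the int 1.
model_point = PointModel(x="1", y=2)

# Input that cannot be converted is rejected with a ValidationError.
try:
    PointModel(x="1", y="not a number")
    rejected = False
except ValidationError:
    rejected = True

print(dc_point, model_point, rejected)
```

The dataclass happily keeps the strings, while the model either converts the value or refuses to build the object.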

See this repo for demo code and setup for using pydantic + mypy + pre-commit. Also this.

How to use Pydantic’s BaseModel to validate and serialize your data

Below is an example of how to create a Pydantic model:

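from datetime import date
from uuid import UUID, uuid4
from pydantic import BaseModel, Field

class Employee(BaseModel):
    # default_factory calls uuid4 for every new instance. A plain
    # "= uuid4()" default is evaluated once, when the class is defined,
    # so every employee would share the same ID.
    employee_id: UUID = Field(default_factory=uuid4)
    name: str
    date_of_birth: date
    salary: float
    elected_benefits: bool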

Pydantic validates the fields when an Employee object is instantiated. If the values you pass in are valid, or can be coerced to the annotated types, it creates a valid Employee object; otherwise it raises a ValidationError.
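A small sketch of both outcomes. The model is repeated so the snippet runs on its own, and it uses default_factory=uuid4 so each instance gets its own ID; the invalid values ("not a date", "high") are made up for illustration.

```python
from datetime import date
from uuid import UUID, uuid4

from pydantic import BaseModel, Field, ValidationError


class Employee(BaseModel):
    employee_id: UUID = Field(default_factory=uuid4)
    name: str
    date_of_birth: date
    salary: float
    elected_benefits: bool


# Valid input: the ISO date string and the integer salary are coerced
# to date and float respectively.
employee = Employee(
    name="Chris DeTuma",
    date_of_birth="1998-04-02",
    salary=123000,
    elected_benefits=True,
)
print(type(employee.date_of_birth).__name__, type(employee.salary).__name__)

# Invalid input: every failing field is reported in a single ValidationError.
try:
    Employee(
        name="Chris DeTuma",
        date_of_birth="not a date",
        salary="high",
        elected_benefits=True,
    )
except ValidationError as exc:
    bad_fields = sorted(err["loc"][0] for err in exc.errors())
    print(bad_fields)
```

Note that Pydantic does not stop at the first bad field; exc.errors() lists one entry per problem.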

Note that employee_id, name, and the other fields are instance attributes, not class attributes, even though they are declared at class level.

If you wanted to define a true class-level variable (i.e., something not meant to be part of the model’s data), you’d do this:

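from typing import ClassVar
from pydantic import BaseModel

class MyModel(BaseModel):
    name: str
    # ClassVar marks this as a class attribute, so Pydantic does not treat it as a field.
    description: ClassVar[str] = "This is a model"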
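A quick way to check the distinction, as a sketch: the ClassVar attribute lives on the class and stays out of the model's fields and serialized data. The value "example" is just a placeholder.

```python
from typing import ClassVar

from pydantic import BaseModel


class MyModel(BaseModel):
    name: str
    description: ClassVar[str] = "This is a model"


instance = MyModel(name="example")

# Only "name" is registered as a model field.
field_names = set(MyModel.model_fields)

# The class variable is readable from the class itself...
class_level_value = MyModel.description

# ...and it is not part of the instance's data.
dumped = instance.model_dump()

print(field_names, class_level_value, dumped)
```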

You can also instantiate an Employee object from a dictionary using .model_validate():

new_employee_dict = {
    "name": "Chris DeTuma",
    "date_of_birth": "1998-04-02",
    "salary": 123_000.00,
    "elected_benefits": True,
}
another_chris = Employee.model_validate(new_employee_dict)

You can do the same thing with a JSON string using .model_validate_json() - see this
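A minimal sketch of JSON parsing, using a trimmed-down Employee (no employee_id) so the snippet stays short. The JSON payload is made up to mirror the dictionary above.

```python
from datetime import date

from pydantic import BaseModel


class Employee(BaseModel):
    name: str
    date_of_birth: date
    salary: float
    elected_benefits: bool


employee_json = """
{
    "name": "Chris DeTuma",
    "date_of_birth": "1998-04-02",
    "salary": 123000.0,
    "elected_benefits": true
}
"""

# Parses the JSON text and validates it against the model in one step.
employee = Employee.model_validate_json(employee_json)
print(employee)
```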

This is one of the reasons why FastAPI relies on Pydantic to create REST APIs.

You can also serialize Pydantic models as dictionaries and JSON - see this
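A sketch of serialization with .model_dump() and .model_dump_json(), again with a trimmed-down Employee that exists only for this example.

```python
import json
from datetime import date

from pydantic import BaseModel


class Employee(BaseModel):
    name: str
    date_of_birth: date
    salary: float
    elected_benefits: bool


employee = Employee(
    name="Chris DeTuma",
    date_of_birth=date(1998, 4, 2),
    salary=123000.0,
    elected_benefits=True,
)

# model_dump() returns a plain dict; Python objects such as date are kept as-is.
as_dict = employee.model_dump()

# model_dump_json() returns a JSON string; the date becomes an ISO string.
as_json = employee.model_dump_json()
decoded = json.loads(as_json)

# The JSON output can be fed straight back into the model.
round_tripped = Employee.model_validate_json(as_json)

print(as_dict)
print(as_json)
```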

Create a JSON schema from your Employee model - see this
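A sketch of schema generation with .model_json_schema(), using the same trimmed-down Employee as an assumption to keep the output small.

```python
import json
from datetime import date

from pydantic import BaseModel


class Employee(BaseModel):
    name: str
    date_of_birth: date
    salary: float
    elected_benefits: bool


# Returns the JSON schema as a regular Python dict.
schema = Employee.model_json_schema()

# Pretty-print it for inspection.
print(json.dumps(schema, indent=2))
```

The result describes each field's JSON type and lists the fields without defaults under "required", which is what tools that consume JSON Schema need.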

More customized data validation

The Field class allows you to customize and add metadata to your model’s fields.

from datetime import date
from uuid import UUID, uuid4
from pydantic import BaseModel, EmailStr, Field

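class Employee(BaseModel):
    # Generated for each new instance; cannot be reassigned afterwards.
    employee_id: UUID = Field(default_factory=uuid4, frozen=True)
    # Must be a non-empty string; cannot be reassigned afterwards.
    name: str = Field(min_length=1, frozen=True)
    # Must be a valid email address that ends in @example.com.
    email: EmailStr = Field(pattern=r".+@example\.com$")
    # Populated from "birth_date"; left out of the repr; cannot be reassigned.
    date_of_birth: date = Field(alias="birth_date", repr=False, frozen=True)
    # Populated from "compensation"; must be greater than 0; left out of the repr.
    salary: float = Field(alias="compensation", gt=0, repr=False)
    elected_benefits: bool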

Read this to learn more about what frozen, alias, and the other Field arguments mean.
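As a rough sketch of how these arguments behave at runtime, the snippet below repeats the model without the email field (so it runs without the optional email dependency) and exercises alias, repr, frozen, and gt. The replacement name and the negative salary are made-up test values.

```python
from datetime import date
from uuid import UUID, uuid4

from pydantic import BaseModel, Field, ValidationError


class Employee(BaseModel):
    employee_id: UUID = Field(default_factory=uuid4, frozen=True)
    name: str = Field(min_length=1, frozen=True)
    date_of_birth: date = Field(alias="birth_date", repr=False, frozen=True)
    salary: float = Field(alias="compensation", gt=0, repr=False)
    elected_benefits: bool


# Aliased fields are populated through their alias, not the attribute name.
employee = Employee(
    name="Chris DeTuma",
    birth_date="1998-04-02",
    compensation=123000.0,
    elected_benefits=True,
)

# repr=False keeps the field out of the printed representation.
hidden_from_repr = "salary" not in repr(employee)

# frozen=True rejects reassignment after the object is created.
try:
    employee.name = "Someone Else"
    frozen_enforced = False
except ValidationError:
    frozen_enforced = True

# gt=0 rejects values that are not strictly positive.
try:
    Employee(
        name="Chris DeTuma",
        birth_date="1998-04-02",
        compensation=-10,
        elected_benefits=True,
    )
    gt_enforced = False
except ValidationError:
    gt_enforced = True

print(employee, hidden_from_repr, frozen_enforced, gt_enforced)
```

The value is still read through the attribute name (employee.salary); the alias only affects the keys used when the data comes in.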