Trace Your Process: Ensuring Repeatability in Python
Repeatability (or reproducibility) is a core pillar in scientific research, software engineering, and data analytics. When you write Python code, you typically want to ensure that others can run it—on different machines, with different operating systems—and arrive at the same results. Without a strategy for repeatability, you risk spending countless hours debugging subtle differences in dependencies, environment variables, library versions, and random seeds. This blog post will take you on a journey: from the very basics of reproducing results with Python’s foundational tools to advanced techniques for professionals. Whether you’re a newcomer or a seasoned developer, by the end of this guide, you’ll be equipped with the knowledge to trace your process effectively and consistently reproduce it whenever necessary.
Table of Contents
- Understanding the Importance of Repeatability
- Foundational Elements: Virtual Environments and Dependencies
- Configuring Randomness for Reproducibility
- Code Organization and Structure
- Logging and Debugging: Tracing Execution Steps
- Testing for Consistency
- Version Control Strategies
- Advanced Environment Management
- Data and Model Versioning
- Case Study: Reproducible Machine Learning Pipeline
- Conclusion
Understanding the Importance of Repeatability
Imagine you’ve built a clever machine learning model. It performs exceedingly well, consistently beating benchmarks. Then your colleague tries to run your project and gets a different (and worse) result. Or, one day you try to re-run the same code on a new computer and suddenly your script refuses to run, or worse, silently returns incorrect answers. This scenario can happen if you haven’t nailed down a process for repeatability.
Some reasons why ensuring repeatability is critical:
- Collaboration: Other team members can confidently run your code.
- Quality Control: You can isolate changes in performance or functionality to specific factors like data updates or code modifications, rather than unknown environment quirks.
- Reliability in Deployment: Production environments can trust that the same behavior developed locally will be replicated on servers.
- Compliance and Transparency: In fields like finance or healthcare, reproducible code is not just a best practice—it’s often a regulatory requirement.
Repeatability is about controlling as many variables as possible: library versions, configuration files, random seed generation, data transformations, OS dependencies, and more. By the end of this guide, you’ll have the tools to address each of these variables systematically.
Foundational Elements: Virtual Environments and Dependencies
Why Virtual Environments?
A virtual environment in Python isolates your development project’s libraries, ensuring you can install and upgrade dependencies without affecting the global Python interpreter. By using virtual environments:
- Prevent Conflicts: Different projects can have different (and sometimes incompatible) versions of the same library.
- Portability: Your project can be packaged with its own environment specification for easy setup on new machines.
- Stability: You can freeze versions of dependencies to avoid unexpected upgrades that might break functionality.
Options for Creating Virtual Environments
There are several ways to create virtual environments in Python. Let’s look at three common tools:
| Tool | Description | Installation or Usage |
|---|---|---|
| venv | Standard library module (Python>=3.3) for creating environments. | In Python 3, run: python -m venv venv_name |
| virtualenv | Pre-dates venv; often used for older workflows. | pip install virtualenv, then virtualenv venv_name |
| Anaconda/Conda | A Python distribution focused on data science, includes environment management. | conda create --name env_name python=3.8 |
venv is the simplest and most lightweight option if you just need a clean environment, while Conda excels when you have complex numeric or scientific dependencies.
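For example, a minimal venv workflow on Linux or macOS looks like this (the directory name `.venv` is just a common convention, not required):

```shell
# Create an isolated environment in a directory named .venv
python3 -m venv .venv

# Activate it (on Windows use: .venv\Scripts\activate)
. .venv/bin/activate

# The interpreter now resolves to the environment's own copy
python -c "import sys; print(sys.prefix)"
```

After activation, `pip install` affects only this environment, leaving the global interpreter untouched.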
Using Requirements Files for Dependencies
Once you’re inside your virtual environment, you can install dependencies with pip or conda. To ensure repeatability, it’s best to record the exact versions of all libraries. In a typical pure-pip workflow, you use a requirements.txt or requirements.in file:
- Install dependencies locally:

  ```bash
  pip install numpy==1.21.0 scikit-learn==1.0
  ```

- Generate a requirements file:

  ```bash
  pip freeze > requirements.txt
  ```

- Share or commit the file: anyone can then replicate your setup by running:

  ```bash
  pip install -r requirements.txt
  ```
When you run pip freeze, it captures the exact versions of all installed packages. This becomes paramount for ensuring your code runs the same way everywhere.
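The resulting requirements.txt is a plain list of pinned packages. Its exact contents depend on what pip freeze finds in your environment, but it will look something like this (versions shown are illustrative, including transitive dependencies pulled in by scikit-learn):

```text
joblib==1.0.1
numpy==1.21.0
scikit-learn==1.0
scipy==1.7.0
threadpoolctl==2.2.0
```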
Configuring Randomness for Reproducibility
Randomness can be the subtle factor that breaks repeatability. Whether you’re randomizing data splits, performing Monte Carlo simulations, or training machine learning models, different seeds can lead to different outcomes.
Random Seeds in Python’s random Module
Python’s built-in random module seeds the pseudorandom number generator (PRNG). If you set the same seed and run the same sequence of random function calls, you will get the same results:
```python
import random

def generate_lottery_numbers():
    random.seed(42)  # Setting a fixed seed
    return [random.randint(1, 49) for _ in range(6)]

numbers_run1 = generate_lottery_numbers()
numbers_run2 = generate_lottery_numbers()

print(numbers_run1)  # Output: [41, 47, 43, 1, 14, 17]
print(numbers_run2)  # Output: [41, 47, 43, 1, 14, 17]
```

Because we fixed the seed to 42, both calls return the same list. If you rely on `random` for major steps in your application (like data shuffling or random sampling), setting a global seed, at least for debugging or demonstration, enhances reproducibility.
NumPy Random Seeds
NumPy has its own random number generator that doesn’t share state with Python’s random. So, if you use both modules, you should seed them separately:
```python
import numpy as np

def random_matrix():
    np.random.seed(123)
    return np.random.rand(3, 3)

matrix_run1 = random_matrix()
matrix_run2 = random_matrix()

print(matrix_run1)
print(matrix_run2)
```

Each call yields the same matrix in matrix_run1 and matrix_run2 because we set the seed to 123 inside the function.
PyTorch, TensorFlow, and Beyond
Machine learning frameworks like PyTorch and TensorFlow introduce their own complexity. For example, PyTorch uses torch.manual_seed(seed_value), and TensorFlow uses tf.random.set_seed(seed_value). Additionally, if you use GPU-based operations, you might get nondeterministic results unless you specify particular flags or environment variables. Always consult the framework’s documentation on reproducibility when consistency across runs is required.
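A common pattern is a single helper that seeds every generator the project touches in one place. Here is a minimal sketch; the function name `set_global_seed` is our own, and the framework-specific lines are shown as comments since they only apply if those libraries are installed:

```python
import random

import numpy as np

def set_global_seed(seed: int) -> None:
    """Seed all pseudorandom generators used by the project."""
    random.seed(seed)
    np.random.seed(seed)
    # If applicable in your project:
    # torch.manual_seed(seed)       # PyTorch
    # tf.random.set_seed(seed)      # TensorFlow

set_global_seed(42)
first = (random.random(), float(np.random.rand()))
set_global_seed(42)
second = (random.random(), float(np.random.rand()))
print(first == second)  # True: re-seeding reproduces the same draws
```

Calling this helper once at the top of every entry-point script keeps the seed in a single, greppable place instead of scattered across modules.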
Code Organization and Structure
Even if your environment is meticulously isolated and your random seeds are set, if your codebase is messy or badly structured, reproducibility can still suffer. Spaghetti code is prone to hidden state, side effects, and confusion.
Modular Design
One application of modular design is splitting your Python code into logical sections: data loading, data processing/cleaning, modeling or analysis, and result reporting. Maintain separate modules—each with clear inputs and outputs—that can be tested independently. Here is a simplified directory structure:
```text
my_project/
  data/
    raw/
    processed/
  src/
    __init__.py
    data_utils.py
    model.py
  notebooks/
    exploration.ipynb
  requirements.txt
  run_experiment.py
```

- data_utils.py: responsible for reading and writing data.
- model.py: includes model definition, training functions, or inference code.
- run_experiment.py: orchestrates the pipeline, potentially with command-line arguments for different scenarios.
Documenting Your Code
Document your functions and modules. Even short docstrings can clarify what a function expects (parameters) and what it returns (return values). For instance:
```python
import pandas as pd

def load_data(filepath: str) -> pd.DataFrame:
    """
    Loads data from the specified CSV file.

    :param filepath: Path to the CSV file.
    :return: A pandas DataFrame containing the data.
    """
    return pd.read_csv(filepath)
```

Later, when future you or a colleague tries to replicate your workflow, they’ll have an immediate reference to the data flow. Docstrings serve as an internal guide that pairs well with your repeatability strategy.
Logging and Debugging: Tracing Execution Steps
One of the best ways to make code traceable is to use logs. Logs record the series of events that occur when your program runs. They help you identify the cause of anomalies in your workflow, confirm your code hits the expected steps in the correct sequence, and capture context such as environment details and variable states.
Built-in Logging
Python’s built-in logging module is straightforward yet powerful. With a few lines of code, you can record timestamps, severity levels, messages, and more:
```python
import logging

logging.basicConfig(level=logging.INFO,
                    format='%(asctime)s %(levelname)s %(message)s')

def train_model():
    logging.info("Starting model training...")
    # training steps here
    logging.info("Model training complete.")

train_model()
```

Logs can be written to a file instead of standard output:

```python
logging.basicConfig(
    filename='training.log',
    level=logging.INFO,
    format='%(asctime)s %(levelname)s %(message)s'
)
```

For more advanced use cases, you may configure multiple loggers with different handlers (e.g., console and file, each with different formatting).
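As a sketch of that multi-handler setup (the handler levels, file name, and format strings here are just one reasonable choice, not a prescribed configuration):

```python
import logging

logger = logging.getLogger("experiment")
logger.setLevel(logging.DEBUG)

# Console handler: concise output, INFO and above
console = logging.StreamHandler()
console.setLevel(logging.INFO)
console.setFormatter(logging.Formatter("%(levelname)s %(message)s"))

# File handler: verbose output from DEBUG up, with timestamps
file_handler = logging.FileHandler("experiment.log")
file_handler.setLevel(logging.DEBUG)
file_handler.setFormatter(
    logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
)

logger.addHandler(console)
logger.addHandler(file_handler)

logger.debug("Only written to experiment.log")
logger.info("Written to both console and experiment.log")
```

The file keeps the full trace for later forensics, while the console stays readable during a run.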
Debugging with pdb and Breakpoints
For interactive debugging, Python’s built-in pdb debugger helps you step through code line by line. In Python 3.7+, you can also insert breakpoint() anywhere in your code. When the interpreter hits that line, you’ll enter an interactive debug session:
```python
def complicated_function(x):
    breakpoint()  # Execution will pause here
    # Inspect variables, run commands, step through
    return x ** 2

result = complicated_function(10)
```

In a debugging session, commands like `n` (next line), `s` (step into function calls), `c` (continue execution), and `l` (list code) make it easier to identify the exact source of an error or unexpected behavior, streamlining your reproducibility efforts.
Testing for Consistency
Why Testing Is Crucial for Repeatability
Automated tests verify that your code produces expected results consistently. As your project grows—either in complexity or by the number of collaborators—automated tests become a safety net for ensuring that changes do not break existing behavior. By gating changes with tests, you effectively enforce ongoing repeatability.
Using pytest and coverage
Pytest is a popular testing framework known for its simplicity. Creating tests is as simple as creating a test_something.py file. For example:
```python
def add(a, b):
    return a + b

def test_add_integers():
    assert add(2, 3) == 5

def test_add_strings():
    assert add("Hello", "World") == "HelloWorld"
```

Run it by typing:

```bash
pytest
```

When you run pytest, it will automatically detect test files and test functions following the `test_*.py` naming convention. You can also measure code coverage with:

```bash
pip install coverage
coverage run -m pytest
coverage report -m
```

This gives you an overview of which parts of the code are tested, letting you identify untested paths that could hide non-reproducible behaviors.
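Tests can also guard repeatability directly, by asserting that a seeded computation always yields the same result. A minimal sketch, using a local `random.Random` generator so the test never touches global state (the function and file names here are our own):

```python
# test_sampling.py
import random

def sample_indices(n: int, k: int, seed: int) -> list:
    """Draw k sample indices from range(n), reproducibly."""
    rng = random.Random(seed)  # local generator: no global state touched
    return rng.sample(range(n), k)

def test_sampling_is_reproducible():
    # The same seed must always produce the same sample
    assert sample_indices(100, 5, seed=42) == sample_indices(100, 5, seed=42)
```

If a refactor accidentally introduces an unseeded code path, a test like this fails immediately instead of surfacing weeks later as an unexplained metric shift.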
Version Control Strategies
Version control systems—like Git—are integral to reproducibility. They store historical versions of your code, allowing you to pinpoint when and how things changed. Some best practices:
- Commit small and often, with clear messages.
- Use tags or releases to mark known working states.
- If your environment is stable, commit your `requirements.txt` or environment files concurrently.
- If you’re storing large datasets, look into data versioning tools (see the next section).
Branches can be used to isolate experimental changes from the mainline code. Once you verify that your experiment is reproducible, it can be merged. This workflow ensures that the main branch is always at a consistent state.
Advanced Environment Management
Beyond basic venv or pip, other solutions exist for more specialized needs. They can help you specify pinned dependencies, run different Python versions in parallel, or encompass system libraries.
Conda Environments
Conda is both a package manager and environment manager. It’s well suited for numerical and scientific ecosystems where you might depend on compiled libraries:
```bash
# Create a new Conda environment
conda create --name my_conda_env python=3.9

# Activate
conda activate my_conda_env

# Install packages
conda install numpy pandas scikit-learn
```

When you’re satisfied:

```bash
conda env export > environment.yml
```

Anyone else (or your future self) can replicate the same environment by running:

```bash
conda env create -f environment.yml
```

Pipenv and Poetry
Pipenv integrates pip and virtualenv into a unified workflow, automatically creating and managing a virtual environment:

```bash
pip install --user pipenv
pipenv install requests==2.25.1
```

It generates `Pipfile` and `Pipfile.lock`, which you commit to version control. Poetry is another popular tool for dependency management and packaging:

```bash
pip install poetry
poetry init  # Creates a pyproject.toml
poetry add requests==2.25.1
```

With Poetry, pinned dependencies are tracked in `poetry.lock`, ensuring consistency across machines.
Docker for Highly Reproducible Setups
When you need near total reproducibility—including the operating system—Docker is a superb choice. Docker containers package everything: from the OS to system libraries, environment variables, and your Python environment. A minimal Dockerfile might look like this:
```dockerfile
FROM python:3.9-slim

WORKDIR /app

COPY requirements.txt /app/
RUN pip install --no-cache-dir -r requirements.txt

COPY . /app

CMD ["python", "run_experiment.py"]
```

- `FROM python:3.9-slim` starts with a Debian-based image containing Python 3.9.
- `WORKDIR /app` sets `/app` as the working directory.
- `COPY requirements.txt /app/` copies your `requirements.txt`.
- `RUN pip install --no-cache-dir -r requirements.txt` installs dependencies.
- `COPY . /app` copies the rest of your project.
- The last line runs your script by default.
Now you can build and run a container:
```bash
docker build -t my_project .
docker run --rm -it my_project
```

This approach ensures the entire system is captured, which is extremely powerful if you’re distributing or deploying your project. The environment inside the container will be identical for every user or server, providing maximal repeatability.
Data and Model Versioning
Applications that rely on large datasets or machine learning models can benefit from specialized tools that handle versioning of big files. Storing huge data directly in Git is impractical.
- DVC (Data Version Control): extends Git-like functionality to large data files. You can track dataset changes over time similarly to code.
- MLflow: manages and version-controls ML models, hyperparameters, and metrics.
- Weights & Biases / Neptune / Comet: cloud-based experiment tracking solutions, also providing versioning insights for your experiments.
By coupling your code’s version control (Git) with data or model versioning solutions, you guarantee that you can reproduce not just the code but also the exact dataset or trained model used in any particular experiment.
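Even without a dedicated tool, you can make data drift detectable by recording a content hash of each dataset alongside the Git commit that used it. A lightweight sketch of that idea (the file written here is a throwaway example, and `file_sha256` is our own helper name):

```python
import hashlib
from pathlib import Path

def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Example: write a small file and fingerprint it
Path("dataset.csv").write_text("id,value\n1,3.14\n")
print(file_sha256("dataset.csv"))
```

If the hash recorded in your experiment log differs from the hash of the file on disk, you know the data changed even though the code did not.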
Case Study: Reproducible Machine Learning Pipeline
Let’s walk through a typical scenario:
- Data Preparation: You have a CSV in `data/raw/`. You run a Python script that cleans and splits the data (using seeds in both `random` and `numpy.random`), then saves it into `data/processed/`.
- Model Training: Another script loads `data/processed/`, sets random seeds for NumPy and PyTorch, trains a neural network, and logs the process.
- Saving and Versioning the Model: The trained model is stored in `models/`. You record the version in Git tags and optionally track it in a data versioning tool.
- Deploying: You have a Dockerfile that starts from a Python 3.9 base image, installs dependencies pinned in `requirements.txt`, and copies your scripts and data.
Example Directory Layout
```text
my_ml_project/
  data/
    raw/
      dataset.csv
    processed/
      train.csv
      test.csv
  models/
    model_v1.pt
  src/
    data_clean.py
    train_model.py
  Dockerfile
  requirements.txt
  environment.yml
```

Reproducibility Steps
- Setup: `conda env create -f environment.yml` or `pip install -r requirements.txt`
- Clean & Split Data: `python src/data_clean.py --seed 42`
- Train Model: `python src/train_model.py --seed 42 --save_path models/model_v1.pt`
- Check Logs: Ensure `logs/training.log` mentions the same environment details and seed.
- Docker (optional): `docker build -t my_ml_project . && docker run my_ml_project`
Every piece here ensures that when you or a teammate re-run these steps, the results will match—assuming the same environment and seeds.
Conclusion
Achieving repeatability in Python need not be intimidating. With a strategic approach, you can control the variables that typically cause inconsistencies: package versions, virtual environments, data management, and random seeds. In summary:
- Use virtual environments (venv, conda, pipenv, poetry) to isolate your Python dependencies.
- Declare exact library versions in a requirements file or lock file.
- Seed your random number generators to ensure reproducible pseudorandom processes.
- Follow modular design principles and maintain good documentation.
- Leverage logging to trace execution and identify potential deviations from expected behavior.
- Implement automated tests (e.g., with pytest) to systematically confirm that the code’s behavior remains consistent over time.
- Embrace version control (Git) and, where needed, data versioning (DVC, MLflow) so that you can revert to any project state.
- For maximum repeatability—especially in production—use Docker to containerize your entire environment.
When you carefully trace your process and lock down your environment, your code transforms into a robust, shareable, and maintainable entity. Everyone on your team benefits, including future you. Above all, the peace of mind you gain—from knowing that your results can be reliably reproduced—often proves invaluable. Happy coding, and may your results remain consistent!