Testing Guidelines for Data Developers - Integration Testing

Tags: tutorial, python, dev, data
Date: Apr 25, 2022

The work of data engineers and data scientists involves writing a lot of code. As this code grows, the cost of maintaining it grows rapidly. Over decades, the software industry has developed patterns and best practices to overcome or mitigate these costs. Because the Data Developer role is very new, we must find ways to adopt these patterns in our daily development work, with the adaptations our use case requires.
In this series I want to give a brief, fundamental introduction to some of the principles that can help us write better code. In this chapter we will focus on the basic guidelines for testing our data code.

The importance of tests

Maybe you need a reminder of why we need to write tests in the first place.
Make changes with high confidence: even if writing tests seems to slow us down at first, it speeds us up in the long term. Maintaining a good test suite has a compounding effect on our code quality and development speed. It allows us to detect bugs during development, fix issues faster, and refactor safely. Good tests are also a form of documentation for the expected behavior of our code.
Moreover, writing tests has an amazing side effect: it promotes better design. Having to think about how to test your code naturally pushes the code base to be modular and reusable, and makes it easier to apply software patterns. We will see examples of this later on.
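To make the design argument concrete, here is a minimal sketch in plain Python (standard library only, no pandas). The names `read_orders`, `add_order_total`, and `run`, as well as the `quantity` and `unit_price` columns, are hypothetical and invented for illustration. The idea is to keep I/O at the edges and put the transformation in a pure function, which can then be tested with in-memory rows and no files at all.

```python
import csv
from typing import Dict, Iterable, List


# Hypothetical pipeline step that computes order totals.
# Reading the CSV (I/O) is kept apart from the transformation,
# so the transformation can be tested with plain in-memory data.

def read_orders(path: str) -> List[Dict[str, str]]:
    """I/O boundary: load raw rows from a CSV file."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def add_order_total(rows: Iterable[Dict[str, str]]) -> List[Dict[str, object]]:
    """Pure transformation: no files, no network, easy to test."""
    result = []
    for row in rows:
        quantity = int(row["quantity"])
        unit_price = float(row["unit_price"])
        # Copy the row and append the computed column instead of mutating the input.
        result.append({**row, "total": quantity * unit_price})
    return result


def run(path: str) -> List[Dict[str, object]]:
    """Thin orchestration layer that wires I/O and logic together."""
    return add_order_total(read_orders(path))


def test_add_order_total_multiplies_quantity_by_unit_price():
    # The test name and its input double as documentation of the expected behavior.
    rows = [{"order_id": "A1", "quantity": "3", "unit_price": "2.5"}]
    assert add_order_total(rows) == [
        {"order_id": "A1", "quantity": "3", "unit_price": "2.5", "total": 7.5}
    ]
```

With pytest, functions whose names start with `test_` are collected automatically, so this test runs with a plain `pytest` call. The `run` function is deliberately thin: it has little logic of its own and is better exercised by an integration test than by a unit test.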

Unit Testing

Simple function asserts

Data Transformations

Use fixtures!

Pandera schemas

Integration Testing

How do we define integration testing?

Local Integration Tests Strategies

Remote Integration Tests Strategies

Machine Learning Testing