Introduction
Interacting with external resources and integrating software systems is one of the most challenging tasks for a software developer. It is so big in fact that the purpose of a great percentage of all software best practices and testing strategies is to deal with this “integration hell”. Reduce “coupling” and increase “cohesion”.
One of my colleagues, Eduardo Ferro, has a great and concise article about this topic: https://www.eferro.net/2023/04/thin-infrastructure-wrappers.html
On the context of hexagonal architecture, this concept of infrastructure wrappers is translated into “ports and adapters”. With more complexity and implications, but supporting more use cases.
In the context of data application development, I have applied repeatedly and with success a similar pattern to deal with SQL integration and data storage that can be consider: the
DataAccessLayer
class.Problem Statement
Suppose you are developing an application that performs some kind of data transformations or data extraction using a SQL Database. Your production database might be an analytical data warehouse like Redshift or Snowflake. How can you handle the connection to that database in a way that is modular, independent of the database technology, and easily testable?
Part 1: Use existing tools - SQLAlchemy
The first step is to not reinvent the wheel. Let’s leverage one of the biggest and most used python packages: .
sqlalchemy
sqlalchemy • Updated May 15, 2024
This provides us with a very cool abstraction: the db engine. This leaves all the complexity of the database specific configuration and SQL flavour to SQLAlchemy, while the developer only needs to configure the correct connection URI