When it comes to managing data pipelines, testing is the unsung hero that prevents you from inheriting headaches caused by bad data downstream. While DBT offers built-in tests like unique, not_null, and relationships, real-world scenarios often demand more sophisticated testing strategies to handle complex data validation. This blog dives into custom DBT tests that can make your life easier by catching nuanced data issues early in the pipeline.
Why Go Beyond Basic DBT Tests?
The built-in tests in DBT are great for the essentials: ensuring no duplicates in a column (unique), or that a column isn’t unexpectedly empty (not_null). But what if you need to validate that:
- A column of percentages always falls between 0 and 100?
- A specific set of conditions holds true across multiple fields?
- A date column contains only valid dates in the future or past?
That’s where custom tests come into play. They allow you to tailor validation logic that matches the quirks and nuances of your data.
Getting Started: Building a Custom Test
Let’s say we have a table where we store discount codes, and we need to ensure that the discount percentage falls within a valid range (0% to 50%). Instead of hard-coding this validation into the model or dashboard, we can create a reusable custom test that will apply this logic.
Step 1: Create a Custom Test Macro
DBT tests are built using macros (reusable SQL snippets). To create a custom test, define a macro in a .sql file inside the macros/ directory.
-- macros/range_check.sql
{% test range_check(model, column_name, min_value, max_value) %}
select *
from {{ model }}
where {{ column_name }} < {{ min_value }}
or {{ column_name }} > {{ max_value }}
{% endtest %}
Here, the macro accepts:
model: The table being tested.column_name: The column where the test will apply.min_valueandmax_value: The valid range for the column.
Step 2: Apply the Test in Your YAML File
Next, apply the custom test to any column in your project. You can use the test inside the model’s .yml file like this:
version: 2
models:
- name: discount_codes
columns:
- name: discount_percentage
tests:
- range_check:
min_value: 0
max_value: 50
When you run dbt test, this validation will check that all discount percentages in the discount_codes table fall between 0 and 50.
Taking It a Step Further: Conditional Custom Tests
Now, let’s add a bit more complexity. Imagine you need to validate that discounts should only be applied to active promotions. A promotion is “active” if it has a valid start and end date, and the current date falls between these dates.
We can enhance the custom test to include conditional logic.
Step 1: Define the Conditional Test Macro
We will create a new macro that ensures discounts are valid only if they’re applied within the active promotion period.
-- macros/valid_discount_for_active_promo.sql
{% test valid_discount_for_active_promo(model, discount_column, start_date_column, end_date_column) %}
select *
from {{ model }}
where {{ discount_column }} is not null
and (current_date not between {{ start_date_column }} and {{ end_date_column }})
{% endtest %}
This macro will check:
- If the
discount_columnis not null, ensuring discounts are valid. - If the current date is between the start and end dates for the promotion.
Step 2: Apply the Conditional Test
Now, apply this test to your promotion table, using the corresponding columns for start_date and end_date:
version: 2
models:
- name: promotions
columns:
- name: discount_percentage
tests:
- valid_discount_for_active_promo:
start_date_column: start_date
end_date_column: end_date
This will ensure that a promotion with a discount is only considered valid if the promotion is currently active.
Why It Matters
Data pipelines, by nature, are fragile. And bad data can slip through the cracks without robust validation. With custom DBT tests, you can catch issues before they propagate through to analytics and reporting layers. This not only improves data quality but also instills confidence in stakeholders using your data.
By moving beyond the basics and writing flexible, reusable tests, you ensure your data systems are resilient, adaptive, and ready to handle even the most nuanced validation scenarios.
Wrapping Up
DBT’s built-in tests are just the beginning. With custom tests, you can address business-specific validation needs, enforce complex rules, and maintain the health of your data pipelines. Whether it’s ensuring that discount percentages remain reasonable or validating that promotions are active, DBT’s flexible testing framework provides you with all the tools you need to safeguard your data.
By leveraging custom tests, you’re making your data pipelines smarter, more adaptable, and—most importantly—more reliable.