Slowly Changing Dimensions (SCDs) are an essential concept in data warehousing and business intelligence. They refer to how data that changes slowly over time is managed within a data warehouse to ensure historical accuracy and consistency. Let’s dive into the different types of SCDs and illustrate each with an example.
Types of Slowly Changing Dimensions
There are several types of SCDs, each handling changes in data differently:
- Type 0 SCD (Fixed Dimension)
- Type 1 SCD (Overwrite)
- Type 2 SCD (Add New Row)
- Type 3 SCD (Add New Column)
- Type 4 SCD (Historical Table)
- Type 6 SCD (Hybrid Method)
Type 0 SCD: Fixed Dimension
Definition: Type 0 SCD does not track changes at all. Once a record is inserted, it remains fixed and unchanged.
Use Case: Suitable for data that should remain constant and not change over time.
Example: Historical events or product launch dates that do not change.
Type 1 SCD: Overwrite
Definition: Changes overwrite the existing data. Historical data is not preserved.
Use Case: Suitable for correcting errors or updating information where historical accuracy is not important.
Example: Imagine a company has a customer table, and they use Type 1 SCD to manage changes. Here’s the initial state of the table:
| CustomerID | Name | City | State | StartDate |
|---|---|---|---|---|
| 1 | Shraddha Sheth | Pune | MH | 2020-01-01 |
| 2 | Jane Doe | Surat | GJ | 2019-03-12 |
If Shraddha Sheth moves from MH to GJ, the table is updated as follows:
| CustomerID | Name | City | State | StartDate |
|---|---|---|---|---|
| 1 | Shraddha Sheth | Ahmedabad | GJ | 2020-01-01 |
| 2 | Jane Doe | Surat | GJ | 2019-03-12 |
In this update, Shraddha Sheth’s city and state are overwritten with the new values, and the original information is lost.
Type 2 SCD: Add New Row
Definition: A new row is added for each change, preserving the historical data. Each row represents a version of the data with start and end dates.
Use Case: Ideal for tracking historical changes for analysis and reporting.
Example Scenario: Using the same customer table, initially:
| CustomerID | Name | City | State | StartDate | EndDate | CurrentFlag |
|---|---|---|---|---|---|---|
| 1 | Shraddha Sheth | Pune | MH | 2020-01-01 | NULL | Y |
| 2 | Jane Doe | Surat | GJ | 2019-03-12 | NULL | Y |
If Shraddha Sheth moves from MH to GJ, the table is updated to:
| CustomerID | Name | City | State | StartDate | EndDate | CurrentFlag |
|---|---|---|---|---|---|---|
| 1 | Shraddha Sheth | Pune | MH | 2020-01-01 | 2022-05-15 | N |
| 3 | Shraddha Sheth | Ahmedabad | GJ | 2022-05-16 | NULL | Y |
| 2 | Jane Doe | Surat | GJ | 2019-03-12 | NULL | Y |
A new row is added for Shraddha Sheth’s new address, marking the previous record as inactive with an end date and setting the new record as active with the current flag.
Type 3 SCD: Add New Column
Definition: Changes are tracked by adding new columns to store both current and previous values.
Use Case: Suitable for maintaining limited history where only the previous state is important.
Example:
Initially, the customer table looks like this:
| CustomerID | Name | CurrentCity | CurrentState | PreviousCity | PreviousState | LastUpdate |
|---|---|---|---|---|---|---|
| 1 | Shraddha Sheth | Pune | MH | NULL | NULL | 2020-01-01 |
| 2 | Jane Doe | Surat | GJ | NULL | NULL | 2019-03-12 |
After Shraddha Sheth moves to GJ, the table is updated as follows:
| CustomerID | Name | CurrentCity | CurrentState | PreviousCity | PreviousState | LastUpdate |
|---|---|---|---|---|---|---|
| 1 | Shraddha Sheth | Ahmedabad | GJ | Pune | MH | 2022-05-16 |
| 2 | Jane Doe | Surat | GJ | NULL | NULL | 2019-03-12 |
The “CurrentCity” and “CurrentState” fields are updated to the new values, while “PreviousCity” and “PreviousState” retain the old values. The “LastUpdate” field shows when the change occurred.
Type 4 SCD: Historical Table
Definition: A separate historical table is maintained to track changes. The current table holds the latest data, while the historical table holds all changes.
Use Case: Useful when the volume of changes is high and detailed historical tracking is needed without cluttering the current table.
Example: Customer table holds the latest information, while a customer history table records all changes. Let’s consider an example of a customer relationship management (CRM) system for a retail company. The company wants to track changes in customers’ addresses over time while maintaining a clean and efficient current customer table.
In the initial setup, there will be two different tables managed to accommodate the requirements. (1) Customers table, (2) Customer_History Table
Customers:
| CustomerID | Name | CurrentAddress | CurrentCity | CurrentState | CurrentZip | StartDate | LastUpdated |
|---|---|---|---|---|---|---|---|
| 1 | Shraddha Sheth | lorem ipsum | Pune | MH | 412014 | 2020-01-01 | 2022-05-16 |
| 2 | Jane Doe | lorem ipsum | Surat | GJ | 395017 | 2019-03-12 | 2022-05-16 |
Customers_History:
| HistoryID | CustomerID | Name | Address | City | State | Zip | StartDate | EndDate |
|---|---|---|---|---|---|---|---|---|
| 1 | 1 | Shraddha Sheh | lorem ipsum | Pune | MH | 10001 | 2020-01-01 | 2022-05-15 |
| 2 | 1 | Shraddha Sheth | lorem ipsum | Ahmedabad | GJ | 380006 | 2022-05-16 | 2023-03-20 |
| 3 | 2 | Jane Doe | lorem ipsum | Surat | GJ | 395017 | 2019-03-12 | NULL |
Type 6 SCD: Hybrid (Types 1, 2, and 3 combined)
Definition: Combines Type 1, Type 2, and Type 3 methods. It uses new rows to preserve history (Type 2), new columns to keep some history (Type 3), and updates the existing data for corrections (Type 1).
Use Case: Suitable for complex scenarios where multiple types of change tracking are required.
Example: Managing a customer table where name changes overwrite the existing name (Type 1), address changes add a new row (Type 2), and a new column tracks the previous loyalty status (Type 3).