The SCD’s – Slowly Changing Dimensions

Slowly Changing Dimensions (SCDs) are an essential concept in data warehousing and business intelligence. They refer to how data that changes slowly over time is managed within a data warehouse to ensure historical accuracy and consistency. Let’s dive into the different types of SCDs and illustrate each with an example.

Types of Slowly Changing Dimensions

There are several types of SCDs, each handling changes in data differently:

  1. Type 0 SCD (Fixed Dimension)
  2. Type 1 SCD (Overwrite)
  3. Type 2 SCD (Add New Row)
  4. Type 3 SCD (Add New Column)
  5. Type 4 SCD (Historical Table)
  6. Type 6 SCD (Hybrid Method)

Type 0 SCD: Fixed Dimension

Definition: Type 0 SCD does not track changes at all. Once a record is inserted, it remains fixed and unchanged.

Use Case: Suitable for data that should remain constant and not change over time.

Example: Historical events or product launch dates that do not change.

Type 1 SCD: Overwrite

Definition: Changes overwrite the existing data. Historical data is not preserved.

Use Case: Suitable for correcting errors or updating information where historical accuracy is not important.

Example: Imagine a company has a customer table, and they use Type 1 SCD to manage changes. Here’s the initial state of the table:

CustomerIDNameCityStateStartDate
1Shraddha ShethPuneMH2020-01-01
2Jane DoeSuratGJ2019-03-12
Initial State of Table

If Shraddha Sheth moves from MH to GJ, the table is updated as follows:

CustomerIDNameCityStateStartDate
1Shraddha ShethAhmedabadGJ2020-01-01
2Jane DoeSurat GJ2019-03-12
Updated Table

In this update, Shraddha Sheth’s city and state are overwritten with the new values, and the original information is lost.

Type 2 SCD: Add New Row

Definition: A new row is added for each change, preserving the historical data. Each row represents a version of the data with start and end dates.

Use Case: Ideal for tracking historical changes for analysis and reporting.

Example Scenario: Using the same customer table, initially:

CustomerIDNameCityStateStartDateEndDateCurrentFlag
1Shraddha ShethPuneMH2020-01-01NULLY
2Jane DoeSuratGJ2019-03-12NULLY
Initial Table State

If Shraddha Sheth moves from MH to GJ, the table is updated to:

CustomerIDNameCityStateStartDateEndDateCurrentFlag
1Shraddha ShethPuneMH2020-01-012022-05-15N
3
Shraddha Sheth
AhmedabadGJ2022-05-16NULLY
2Jane DoeSuratGJ2019-03-12NULLY
Updated Table

A new row is added for Shraddha Sheth’s new address, marking the previous record as inactive with an end date and setting the new record as active with the current flag.

Type 3 SCD: Add New Column

Definition: Changes are tracked by adding new columns to store both current and previous values.

Use Case: Suitable for maintaining limited history where only the previous state is important.

Example:

Initially, the customer table looks like this:

CustomerIDNameCurrentCityCurrentStatePreviousCityPreviousStateLastUpdate
1Shraddha ShethPuneMHNULLNULL2020-01-01
2Jane DoeSuratGJNULLNULL2019-03-12
Initial Table State

After Shraddha Sheth moves to GJ, the table is updated as follows:

CustomerIDNameCurrentCityCurrentStatePreviousCityPreviousStateLastUpdate
1Shraddha ShethAhmedabadGJPuneMH2022-05-16
2Jane DoeSuratGJNULLNULL2019-03-12
Updated Table

The “CurrentCity” and “CurrentState” fields are updated to the new values, while “PreviousCity” and “PreviousState” retain the old values. The “LastUpdate” field shows when the change occurred.

Type 4 SCD: Historical Table

Definition: A separate historical table is maintained to track changes. The current table holds the latest data, while the historical table holds all changes.

Use Case: Useful when the volume of changes is high and detailed historical tracking is needed without cluttering the current table.

Example: Customer table holds the latest information, while a customer history table records all changes. Let’s consider an example of a customer relationship management (CRM) system for a retail company. The company wants to track changes in customers’ addresses over time while maintaining a clean and efficient current customer table.

In the initial setup, there will be two different tables managed to accommodate the requirements. (1) Customers table, (2) Customer_History Table

Customers:

CustomerIDNameCurrentAddressCurrentCityCurrentStateCurrentZipStartDateLastUpdated
1Shraddha Shethlorem ipsumPuneMH4120142020-01-012022-05-16
2Jane Doelorem ipsumSuratGJ3950172019-03-122022-05-16
Customers Table

Customers_History:

HistoryIDCustomerIDNameAddressCityStateZipStartDateEndDate
11Shraddha Shehlorem ipsumPuneMH100012020-01-012022-05-15
21Shraddha Shethlorem ipsumAhmedabadGJ3800062022-05-162023-03-20
32Jane Doelorem ipsumSuratGJ3950172019-03-12NULL
Customer History Table

Type 6 SCD: Hybrid (Types 1, 2, and 3 combined)

Definition: Combines Type 1, Type 2, and Type 3 methods. It uses new rows to preserve history (Type 2), new columns to keep some history (Type 3), and updates the existing data for corrections (Type 1).

Use Case: Suitable for complex scenarios where multiple types of change tracking are required.

Example: Managing a customer table where name changes overwrite the existing name (Type 1), address changes add a new row (Type 2), and a new column tracks the previous loyalty status (Type 3).

Leave a comment