Track
Dimensional Changes Over Time - SCD
Ø Operational systems tend to contain data
about the current state of the business.
Ø A data warehouse is expected to hold
data for five to 10 years.
Ø Users may need to query data as of any
particular date (for example, at which office(s) did Mary Smith work between
January/1999 and December/1999?).
Ø Data warehouse dimensional data often
must show how the dimensional data changes over time. It is not static.
Ø The term Slowly Changing Dimension (SCD)
refers to the tracking of changes to dimensional data over time.
Slowly
Changing Dimensions
There are in general three ways to solve this
type of problem, and they are categorized as follows:
Ø SCD
Type 1:
The
new record replaces the original record. No trace of the old record exists.
Ø SCD Type 2:
A
new record is added into the dimension table.
Ø SCD Type 3:
The
original record is modified to reflect the change.
SCD-Type
1 Changes
Ø Usually, the changes relate to
correction of errors in the source system
Ø Sometimes the change in the source
system has no significance
Ø The old value in the source system needs
to be discarded
Ø The change in the source system need not
be preserved in the DWH
Applying
Type 1 changes
Ø Overwrite the attribute value in the
dimension table row with the new value
Ø The old value of the attribute is not
preserved
Ø No other changes are made in the
dimension table row
Ø The key of this dimension table or any
other key values are not affected
Ø Easiest to implement
SCD Type -2 Changes
Ø Let’s look at the martial
status of Miky Schreiber
Ø One the DWH’s requirements is
to track claims by martial status (in addition to other attributes)
Ø All changes before
11/11/2005 will be under Martial Status = Single, and all changes after that
date will be under Martial Status = Married
Ø We need to aggregate the
orders before and after the marriage separately
Ø Let’s add one more change:
Ø Miky is living in Vine st.,
but on 9/30/2009 he moves to Sunset Blvd.
SCD Type
2 Changes, continued….
General
Principles for Type 2 changes:
Ø They usually relate to true
changes in source systems
Ø There is a need to preserve
history in the DWH
Ø This type of change
partitions the history in the DWH
Ø Every change for the same
attributes must be preserved
SCD Applying
Type 2 changes
Ø Add a new dimension table
row with the new value of the changed attribute
Ø An effective date will be
included in the dimension table
Ø There are no changes to the
original row in the dimension table
Ø The key of the original row
is not affected
Ø The new row is inserted with
a new surrogate key
SCD- Type
3 Changes
General
Principles:
Ø They usually relate to
“soft” or tentative changes in the source systems
Ø There is a need to keep
track of history with old and new values of the changes attribute
Ø They are used to compare
performances across the transition
Ø They provide the ability to
track forward and backward
SCD Applying Type 3 changes
Ø No
new dimension row is needed
Ø The
existing queries will seamlessly switch to the current value
Ø Any
queries that need to use the old value must be revised accordingly
Ø The
technique works best for one soft change at a time
Ø If
there is a succession of changes, more sophisticated techniques must be advised
Ø The
number of changes to be preserved is identified in advance and stored in fields
rather than rows.
Great Article
ReplyDeleteData Mining Projects
Python Training in Chennai
Project Centers in Chennai
Python Training in Chennai