Dimension Table
Ø Dimension
tables contain textual information that represents the attributes of the
business
Ø Contain
relatively static data
Ø Dimension
tables are joined to a fact able through foreign key reference
Dimension Table Examples
Ø Retail
– store name, zip code, product name, product category, day of week
Ø Telecommunications -- call origin, call
destination
Ø Banking –customer name, account number,
branch, account officer
Ø Insurance –policy type, insured party
FACT TABLE
Ø Contain
numerical metrics of the business
Ø Can hold large volumes of data
Ø Can grow quickly
Fact Table Examples
Ø Retail
– number of units sold, sales amount
Ø Telecommunications –length of call in minutes,
average number of calls
Ø Banking –average monthly balance
Ø Insurance –claims amount
Types of Schemas
1.
Star Schema
2.
Snow Flake Schema
3.
Galaxy Schema
4.
Fact Constellation Schema
1-STAR SCHEMA
A star schema is the one in
which a central fact table is surrounded by denormalized dimensional tables. A
star schema can be simple or complex. A simple star schema consists of one fact
table where as a complex star schema have more than one fact table
2- SNOWFLAKE SCHEMA
A snow flake schema is an
enhancement of star schema by adding additional dimensions. Snow flake schema
are useful when there are low cardinality attributes in the dimensions
3- GALAXY SCHEMA
Galaxy schema contains many
fact tables with some common dimensions (conformed dimensions). This schema is
a combination of many data marts.
4- FACT CONSTELLATION SCHEMA
The dimensions in this
schema are segregated into independent dimensions based on the levels of
hierarchy. For example, if geography has five levels of hierarchy like
teritary, region, country, state and city; constellation schema would
have five dimensions instead of one.
CONFIRMED DIMENSIONS
An Dimension table which is
shared across Data Mart’s or more than 1 Fact table
Example
Calendar/Date/Time –Dimension
Customer Dimension
SURROGATE KEYS
Ø It
has no meaning, other than stating uniqueness for each record stored in the
dimension tables.
Ø Will
be used in all dimension tables.
Ø It is a just an Sequence No.
Ø Advantage of Surrogate keys include:
-- Control over data
-- Reduced fact table size
Ø Avoid
using the OLTP keys as data warehouse keys.
SLOWLY CHANGING DIMENSIONS
SCD captures the changes
which takes place over the period of time.
SCD Type 1 : Type 1 dimension keeps only the current
values. Doesn’t maintain history
2. SCD Type 2:
Type 2 dimension maintain the full history in the target. For each
update it inserts a new record in the target tables.
3. SCD Type 3 : Type 3 dimension maintains current and
previous information (Partial History)
Great Article
ReplyDeleteData Mining Projects
Python Training in Chennai
Project Centers in Chennai
Python Training in Chennai