Thursday, November 9, 2017

ER Modeling vs Dimensional Modeling


Star Schema & Snowflakes Schema

qStar Schema
vThere will be a centrally located fact table which is surrounded by one or many dimension table. This  design looks like a star so its called as Star Schema
vDimensions in Star Schema are De-normalized

vBenefit of star schema is Query performance will be very high as there will not be joins and only disadvantage is it requires more disk space

qSnowflakes Schema
vThere will be a centrally located fact table which is surrounded by one or many dimension table and dimension table splitted into more than one table
vDimensions in Snowflakes Schema are Normalized(more joins)
vBenefit of snowflakes schema is overcome disk space issues 


Types of Facts

qTypes of Facts

vAdditive: All the KPIs are group by
  Eg: Sales Fact Quantity like sum(),avg() etc

vSemi Additive: Some KPIs are group by
  Eg: Account balance. A sum() function on balance does not give a useful result but max() or min() balance might be useful. Consider price rate or currency rate. Sum is meaningless on rate; however, average function might be useful.

vNon_Additive: Non-additive measures are those which can not be used inside any numeric aggregation function like sum(), Avg()
  Eg: Ratio , percentage


vFact_less_fact: Does not contains any measures only KPIs

KPI: Key performance Indicator

Types of Dimensions




qTypes of Dimensions-I
vConfirmed: Shared by two or more fact tables
        Eg: The date dimension table connected to the sales facts is identical to the date dimension connected to the inventory facts.

vJunk: A junk dimension is a collection of random transactional codes it gives convenient place to store  Junk Attributes
  Eg: instead of using two different key Combine gender and marital dimension and create one junk dimension and use that key in the fact table .

vDe-generated : Derived from Fact table. Does not have it own dimension
  Eg: Transational Code in a fact table(Invoice)

vRole_playing: Dimension often used for multiple purpose and can be joined to more than one foreign key in the fact table.
  Eg: fact table may include foreign keys for both ship date and delivery date

vInferred: Solution to not yet ready dimension.
  Eg: Generate a surrogate Key as null for other attributes 

vStatic : Not extracted from the original data source, but are created within the context of the data warehouse. A static dimension can be loaded manually
  Eg: status codes or it can be generated by a procedure, such as a date or time dimension.

vRapidly Changing Dimension(RCD):Changes frequently. lot of changes happens in a attributes of a dimension table

vSlowly Changing Dimension(SCD):Changes slowly over a period of time
Types:
SCD1: To every insert in source it will insert into target and to every update in the source it will update in the target. Contains only current records. override old value
SCD2: To every insert in source it will insert into target and to every update in the source it will insert in the target with different Surrogate key. It contains historical record.(It can be implemented using flagging ,versioning or date range methods).it add a new row.
SCD3: It contains partially historical records(like last 2 or 3 history).it adds a new row

What's Fact & Dimension tables?

qWhat is Dimension Table?
vDimension contains descriptive attributes(data) about a subject.
vThe table which describe the dimension called Dimension table.
vA dimension table has a primary key column that uniquely identifies each dimension record (row). 
vE,g: ‘Company’ is a subject Employee, Department and date is Object(dimension tables) .ID,Name,Dept Nm, joining Dt , Release Dt are attributes(columns).

qWhat is Fact Table?
vFact is something that is measurable. It contains only numeric value which can be aggregated.
vA fact table contains information about KPIs and measure.
vKPI stands for Key Performance Indicator which is nothing but Dimension tables Primary Key
vMeasures are data elements that can be summed, averaged, or mathematically manipulated
 Example:


Above customer ,store ,product and time tables are dimension tables which contain attributes
Purchase is the fact table which contains all PKs of dimension tables and price and tax column which are measures.



What is ETL?

qWhat is ETL?
vStands for Extract Transform Load. It Extracts data from multiple data sources ,transform( filter, aggregate, etc) according to the business rules and then load it into target(ODS/DWH).

vETL Tools: Informatica Power Center ,Data stage, Oracle Warehouse Builder, SQL Server Integration Services, Data Migrator(IBI)


What is ODS(Operational Data Store) & Staging area ?

q What is ODS?
vStands for Operational data store
vContains current data for a limited period of time.(DWH stores periodic data).
vShould be refreshed daily to get current data(daily or hourly)
vUse to monitor the business.
vSits between Staging and DWH

q What is Staging Area?
vParking area used to dump  and integrate data from heterogeneous source systems

vUse to cleanse data, apply multiple business rules, filter it and then push it to your Data Warehouse /ODS

ER Modeling vs Dimensional Modeling