As a thumb rule, “Every Foreign key in a fact table should reference a dimension row”, let’s look at the following example to understand the use of default row in DW
Source File to populate the “Employee Dimension”
Row id | Emp First Name | Emp Last Name | EMP No | EMP Dep |
1 | John | Beck | 2001 | A |
2 | Mike | Morry | 2002 | A |
3 | Kevin | Peter | 2003 | B |
4 | Steve | Morry | 2004 | B |
5 | Jim | Chen | 2005 | C |
As you know from the (DW Loading Techniques) the dimension table gets loaded first with Naturel and surrogate keys.
Here is how the table populated
Employee Dimension | ||||
Emp_Surrogate_Key | EMP No | Emp First Name | Emp Last Name | EMP Dep |
0 | 0 | NULL | NULL | NULL |
100001 | 2001 | John | Beck | A |
100002 | 2002 | Mike | Morry | A |
100003 | 2003 | Kevin | Peter | B |
100004 | 2004 | Steve | Morry | B |
100005 | 2005 | Jim | Chen | C |
Source File to populate the “Employee Fact”
Row id | EMP No | EMP Sal |
1 | 2001 | 1000 |
2 | 2002 | 2000 |
3 | 2003 | 1000 |
4 | 2004 | 2000 |
5 | 2005 | 3000 |
6 | 2006 | 4000 |
The fact table surrogate keys gets populated by doing the lookup against the natural key in the look-up process, as you can see that the “Employee Fact” source has Emp no 2006 which is not available in the dimension table. So during the surrogate key look-up process the fact table loaded with “0” for that employee
Why does it Matter:
Well, as you know the fact and dimension joined by surrogate key,
Select emp_dim* from
Employee Dimension as emp_dim,
Employee Fact as emp_fact
Where
emp_dim. Emp_surrogate_key=emp_fact. Emp_surrogate_key
Here is the result without the default row
Emp_Surrogate_Key | Emp First Name | Emp Last Name | EMP No | EMP Dep | EMP Sal |
100001 | John | Beck | 2001 | A | 1000 |
100002 | Mike | Morry | 2002 | A | 2000 |
100003 | Kevin | Peter | 2003 | B | 1000 |
100004 | Steve | Morry | 2004 | B | 2000 |
100005 | Jim | Chen | 2005 | C | 3000 |
As you can see that the “Emp no 2006” salary information’s is not get displayed in the result
Here is the result with the default row
Emp_Surrogate_Key | Emp First Name | Emp Last Name | EMP No | EMP Dep | EMP Sal |
100001 | John | Beck | 2001 | A | 1000 |
100002 | Mike | Morry | 2002 | A | 2000 |
100003 | Kevin | Peter | 2003 | B | 1000 |
100004 | Steve | Morry | 2004 | B | 2000 |
100005 | Jim | Chen | 2005 | C | 3000 |
0 | NULL | NULL | 2006 | Null | 4000 |
0 comments: