FACTS
- Conformed dimensions are the dimensions which can be used across multiple Data Marts in combination with multiple facts tables accordingly.
- • A conformed dimension is a dimension that has exactly the same meaning and content when being referred from different fact tables.
- A conformed dimension can refer to multiple tables in multiple data marts within the same organization.
35. what is aggregate table and aggregate fact table … any examples of both?
- Aggregate table contains summarized data.
- The materialized view are aggregated tables. For ex in sales we have only date transaction. if we want to create a report like sales by product per year. in such cases we aggregate the date vales into week_agg, month_agg, quarter_agg,year_agg. to retrieve date from this tables we use @aggrtegate function.
- Aggregate table contains the [measure] values
- aggregated/grouped/summed up to some level of hierarchy.
contains aggregated data which can be calculated by using different aggregated functions like count,avg,min,max.e.t.c.Aggregated tables are most widely used tables in OLAP database.Aggregate functions are functions where the values of table or column are grouped together and form a single value.
Following are some aggregate functions:
- 1.Average
- 2.Count
- 3.MAX
- 4.MIN
- 5.Median
- 6.SUM
Using the above aggregate functions the data will be inserted in aggregate table.The aggregate tables are used for performance optimization and data is coming fast using aggregate table.Aggregations applied on database level improves the performance of the query as the query not hits directly on table it will hit on aggregate table and fetches data.
Real Example:
If table contains the data of year 2016 and 2017 ,User wants a actual count of records in the table monthly,Quarterly and yearly.Then We need to make aggregate table which contains count of records monthly,For Quarterly we need to create other table and push the quarterly count in that table.And we need to use that tables in the reports so that report performance will improve drastically
36.What is a fact & a fact table?
- Facts represent quantitative data. For example – net amount due is a fact.
- A fact table contains numerical data and foreign keys from related dimensional tables.
- Fact Table contains the measurements or metrics or facts of business process.
- If your business process is “Sales” , then a measurement of this business process such as “monthly sales number” is captured in the Fact table.
- Fact table also contains the foreign keys for the dimension tables.
- 1.Facts2.Foreign key of dimension tables.
- The fact table contains the foreign keys,time dimensions,product dimension,customer dimension,measurement values.Following are some examples of common facts :No of unit sold,Margin,Sales revenue and the dimension tables are customer,time and product e.t.c. which is used to analyse data
There are following 3 types of fact tables :
1.Additive :
Measures that can be added across any dimension
2.Non-additive:
Measures that can not be added across any dimension
3.Semi-additive:
Measures that can be added across some dimensions.
37.what is incremental loading?
- Incremental loading means loading the ongoing changes in the OLTP.
38.Why fact table is in normal form?
- Basically the fact table consists of the Index keys of the dimension/look up tables and the measures.
- so when ever we have the keys in a table .that itself implies that the table is in the normal form.
39.What is data sparsity and how it effect on aggregation?
Data sparsity is term used for how much data we have for a particular dimension/entity of the model.
It affects aggregation depending on how deep the combination of members of the sparse dimension make up. If the combination is a lot and those combination do not have any factual data then creating space to store those aggregations will be a waste as a result, the database will become huge.
40.Give us a non-computer example of preemptive and non-preemptive scheduling?
Consider any system where people use some kind of resources and compete for them. The non-computer examples for preemptive scheduling the traffic on the single lane road if there is emergency or there is an ambulance on the road the other vehicles give path to the vehicles that are in need. The example for preemptive scheduling is people standing in queue for tickets.
41.what is the difference between start schema and Fsldm?
A star schema has one Fact tables and many dimensional tables to store the related. FSLDM (Financial services logical data model) is a model proposed and designed by NCR to cater the needs of financial sectors, specially the Banking domain. It has many Fact tables and dimension tbls. The major fact tbls include Party (Customers of the bank), Product (Services offered by the bank), Event (An event occurred ex: An ATM transaction), Agreement (A deal between the party and the bank for a product)etc… The FSLDM can be customized to cater to bank’s specific needs.
FSLDM -> 3rd normal form Star Schema –> Denormalized for query performance
42.What are the key columns in Fact and dimension tables?
Foreign keys of dimension tables are primary keys of entity tables. Foreign keys of fact tables are the primary keys of the dimension tables.
43. Explain Granularity.
- Granularity in table represents the level of information stored in the table.
- In BI granularity is very important concept to check the table data.
- The granularity is high and low .
- High granularity data contains the data with high information or you can say it as transaction level data is high granularity data.
- Low granularity means data has low level information only.
- Fact table always have low granularity mean we need very low level data in fact table.
- Following 2 points are important in defining granularity :
- 1.Determining the dimensions that are to be included
- 2.Determining location to place hierarchy of each dimension of information
Real life Example :
Date Dimension Granularity level :
Year,month,quarter,period,week,day
In Diagram i shown the snowflake schema where sales table is a fact table and all are dimensions.Store table is further normalized in to different tables name city,state and region.
44.What is level of Granularity of a fact table?
- A fact table is usually designed at a low level of Granularity.
- This means that we need to find the lowest level of information that can store in a fact table.e.g.Employee performance is a very high level of granularity. Employee_performance_daily, employee_performance_weekly can be considered lower levels of granularity.
- The granularity is the lowest level of information stored in the fact table.
- The depth of data level is known as granularity.
- In date dimension, the level could be year, month, quarter, period, week, day of granularity.
- Determine which dimensions will be included.
- Determine where along the hierarchy of each dimension the information will be kept.
45.What are Semi-additive and faceless facts and in which scenario will you use such kinds of fact tables?
- Snapshot facts are semi-additive, while we maintain aggregated facts we go for semi-additive.EX: Average daily balance
- A fact table without numeric fact columns is called factless fact table.Ex: Promotion Facts
- While maintain the promotion values of the transaction (ex: product samples) because this table doesn’t contain any measures.
- Non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact table.
- dimensional model fact tables are dependent on the dimension tables.
- This means that fact table contains foreign keys to dimension tables.
- This is the reason dimension tables are loaded first and then the fact table
48.What is a fact-less-fact?
- A fact table that does not contain any measure is called a fact-less fact.
- This table will only contain keys from different dimension tables.
- This is often used to resolve a many-to-many cardinality issue
Explanatory Note:
Consider a school, where a single student may be taught by many teachers and a single teacher may have many students. To model this situation in dimensional model, one might introduce a fact-less-fact table joining teacher and student keys. Such a fact table will then be able to answer queries like,
- Who are the students taught by a specific teacher.
- Which teacher teaches maximum students.
- Which student has highest number of teachers.etc. etc.
Factless fact table is a fact table that contains no fact measure in it. It has only the dimension keys in it.
At times, certain situations may arise in the business where you need to have factless fact table. For example, suppose you are maintaining an employee attendance record system, you can have a factless fact table having three keys.
Employee_ID |
Department_ID |
Time_ID |
You can see that the above table does not contain any measure. Now if you want to answer below question, you can do easily using the above single factless fact table rather than having two separate fact tables:
“How many employees of a particular department were present on a particular day?”
So, factless fact table offers flexibility to the design.
49.What is a coverage fact?
A fact-less-fact table can only answer ‘optimistic’ queries (positive query) but can not answer a negative query.
Again consider the illustration in the above example. A fact-less fact containing the keys of tutors and students can not answer a query like below,
- Which teacher did not teach any student?
- Which student was not taught by any teacher?
Why not? Because fact-less fact table only stores the positive scenarios (like student being taught by a tutor) but if there is a student who is not being taught by a teacher, then that student’s key does not appear in this table, thereby reducing the coverage of the table.
Coverage fact table attempts to answer this – often by adding an extra flag column. Flag = 0 indicates a negative condition and flag = 1 indicates a positive condition. To understand this better, let’s consider a class where there are 100 students and 5 teachers. So coverage fact table will ideally store 100 X 5 = 500 records (all combinations) and if a certain teacher is not teaching a certain student, the corresponding flag for that record will be 0.
50.What are incident and snapshot facts
- A fact table stores some kind of measurements.
- Usually these measurements are stored (or captured) against a specific time and these measurements vary with respect to time.
- Now it might so happen that the business might not able to capture all of its measures always for every point in time.
- Then those unavailable measurements can be kept empty (Null) or can be filled up with the last available measurements.
- The first case is the example of incident fact and the second one is the example of snapshot fact.
51.Incremental Loading for Fact Tables
METHOD OF LOADING
- Generally speaking, incremental loading for Fact tables is relatively easier as,
- unlike dimension tables, here you do not need to perform any look-up on your target table to find out if the source record already exists in the target or not.
- All you need to do is to select incremental records from source (as shown below for the case of “sales” table) and load them as it is to target (you may need to perform lookup to dimensional tables to assign respective surrogate keys ).
Sales Table
ID CustomerID ProductDescription Qty Revenue Sales Date 1 1 White sheet (A4) 100 4.00 22-Mar-2012 2 1 James Clip (Box) 1 2.50 22-Mar-2012 3 2 Whiteboard Marker 1 2.00 22-Mar-2012 4 3 Letter Envelop 200 75.00 23-Mar-2012 5 1 Paper Clip 12 4.00 23-Mar-2012
SELECT t.* FROM Sales t WHERE t.sales_date > (select nvl( max(b.loaded_until), to_date('01-01-1900', 'MM-DD-YYYY') ) from batch b where b.status = 'Success');
where “batch” is a separate table maintained at target system having minimal structure and data like below
Batch_ID Loaded_Until Status 1 22-Mar-2012 Success 2 23-Mar-2012 Success
However, things may get pretty complicated if your fact is a special type of fact called “snapshot fact”. Let’s understand them below.
Loading Incident Fact
Incident fact is the normal fact that we encounter mostly (and that we have seen above in our sales table example). Records in these types of facts are only loaded if there are transactions coming from the source. For example, if at all there is one sale that happens in the source system, then only a new sales record will come. They are dependent on some real “incident” to happen in the source hence the name incident fact.
Loading Snapshot Fact
As opposed to incident fact, snapshot facts are loaded even if there is no real business incident in the source. Let me show you what I mean by using the above example of customer and sales tables in OLTP. Let’s say I want to build a fact that would show me total revenue of sales from each customer for each day. In effect, I want to see the below data in my fact table.
Sales fact table (This is what I want to see in my target fact table)
Date Customer Revenue 22-Mar-2012 John 6.50 22-Mar-2012 Ryan 2.00 23-Mar-2012 John 10.50 23-Mar-2012 Ryan 2.00 23-Mar-2012 Bakers' 75.00
even if no sales was made to Ryan on 23-Mar, we still show him here with the old data. Similarly for John, even if goods totaling to $4.00 was sold to him on 23-Mar, his record shows the cumulative total of $10.50.
Now obviously the next logical question is how to load this fact using incremental loading? Because incremental loading only brings in incremental data – that is on 23rd March, we will only have Bakers’ and John’s records and that too with that day’s sales figures. We won’t have Ryan record in the incremental set.
Why not a full load
You can obviously opt-in for full load mechanism as that would solve this problem but that would take the toll on your loading performance.
The solution?
One way to resolve this issue is: creating 2 incremental channels of loading for the fact. 1 channel will bring in incremental data from source and the other channel will bring in incremental data from the target fact itself. Let’s see how does it happen below. We will take the example for loading 23-Mar data.
Channel 1: Incremental data from source
Customer Revenue John 4.00 Bakers' 75.00
Channel 2: Incremental data from target fact table (last day’s record)
Customer Revenue John 6.50 Ryan 2.00
Next we can perform a FULL OUTER JOIN between the above two sets to come to below result
John 10.50
Ryan 2.00
Bakers' 75.00