B.E. VIII Semester Examination, June 2020
Grading System (GS)
Data Mining and Warehousing (Elective – IV)
i) Attempt any five questions.
ii) All questions carry equal marks.
1. Briefly compare the following concepts. You may use an example to explain your points. Snowflake, fact constellation, starnet query model.
In data warehousing, a snowflake schema is a variant of the star schema in which some dimension tables are normalized, i.e. split into additional related tables. For example, instead of storing supplier details inside the item dimension table, the item table keeps only a supplier key that points to a separate supplier table. The resulting diagram branches out like a snowflake, which gives the schema its name.
Advantages of snowflake schema:
Normalizing the dimension tables removes redundant values, which saves storage and makes the dimension tables easier to maintain. The trade-off is that queries need extra joins, so they can run slower than on an equivalent star schema.
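In a snowflake schema, a dimension such as item is normalized into further tables (item -> supplier). The sketch below uses Python's built-in sqlite3 module to show the extra join this causes. The table names (`sales`, `item`, `supplier`), columns and rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The item dimension is snowflaked: supplier details live in their own table
CREATE TABLE supplier (supplier_key INTEGER PRIMARY KEY, supplier_type TEXT);
CREATE TABLE item (
    item_key INTEGER PRIMARY KEY,
    item_name TEXT,
    supplier_key INTEGER REFERENCES supplier(supplier_key)
);
-- The fact table references only the item dimension
CREATE TABLE sales (item_key INTEGER REFERENCES item(item_key), units_sold INTEGER);

INSERT INTO supplier VALUES (1, 'wholesale'), (2, 'retail');
INSERT INTO item VALUES (10, 'pen', 1), (11, 'notebook', 1), (12, 'bag', 2);
INSERT INTO sales VALUES (10, 5), (11, 3), (12, 2), (10, 4);
""")

# Units sold per supplier type needs two joins (sales -> item -> supplier),
# whereas a star schema would keep supplier_type inside the item table.
rows = conn.execute("""
    SELECT s.supplier_type, SUM(f.units_sold)
    FROM sales AS f
    JOIN item AS i ON f.item_key = i.item_key
    JOIN supplier AS s ON i.supplier_key = s.supplier_key
    GROUP BY s.supplier_type
    ORDER BY s.supplier_type
""").fetchall()
print(rows)  # [('retail', 2), ('wholesale', 12)]
```

Storing `supplier_type` once per supplier rather than once per item is the storage saving; the second `JOIN` is the query-time cost.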
What is fact constellation?
A fact constellation joins several star-like fact tables (i.e. several star schemas) through dimension tables that they share, which is why it is called a fact constellation. It is also known as a galaxy schema.
The main advantage of a fact constellation is that it lets one warehouse answer queries that span several related business processes.
For example, suppose a warehouse has fact tables for different purposes, such as:
- a sales fact table (units sold, revenue)
- a shipping fact table (units shipped, shipping cost)
Both fact tables can share the same time, item and location dimension tables. These common dimensions let us fetch useful information across the two processes, e.g. comparing units sold with units shipped for each item and quarter.
Disadvantage of fact constellation:
Its main disadvantage is that it is quite complex to design and maintain, so it is not preferred when a single star schema is enough. However, it is quite common in technology companies that deal with many types of data, which is why it is worth mentioning here.
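A minimal sketch of a fact constellation, again using Python's sqlite3 module. It assumes two hypothetical fact tables (`sales_fact`, `shipping_fact`) that share the `time_dim` and `item_dim` dimension tables; all names and rows are illustrative. Each fact table is aggregated on its own and the results are then lined up through the shared time dimension, so rows from one fact table are not double-counted by the other.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Shared dimension tables
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, quarter TEXT);
CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT);
-- Two fact tables that both reference the shared dimensions
CREATE TABLE sales_fact (time_key INTEGER, item_key INTEGER, units_sold INTEGER);
CREATE TABLE shipping_fact (time_key INTEGER, item_key INTEGER, units_shipped INTEGER);

INSERT INTO time_dim VALUES (1, 'Q1'), (2, 'Q2');
INSERT INTO item_dim VALUES (10, 'pen'), (11, 'bag');
INSERT INTO sales_fact VALUES (1, 10, 50), (1, 11, 20), (2, 10, 30);
INSERT INTO shipping_fact VALUES (1, 10, 45), (1, 11, 20), (2, 10, 25);
""")

# Aggregate each fact table separately, then join them through time_dim
rows = conn.execute("""
    SELECT t.quarter, s.sold, sh.shipped
    FROM time_dim AS t
    JOIN (SELECT time_key, SUM(units_sold) AS sold
          FROM sales_fact GROUP BY time_key) AS s ON s.time_key = t.time_key
    JOIN (SELECT time_key, SUM(units_shipped) AS shipped
          FROM shipping_fact GROUP BY time_key) AS sh ON sh.time_key = t.time_key
    ORDER BY t.quarter
""").fetchall()
print(rows)  # [('Q1', 70, 65), ('Q2', 30, 25)]
```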
Starnet Query Model:
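In the starnet query model, radial lines emanate from a central point; each radial line represents one dimension, and the levels of that dimension's concept hierarchy are marked along the line. Each of these abstraction levels is called a footprint, and the footprints are the granularities at which OLAP operations such as roll-up and drill-down can be applied.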
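The starnet model can be sketched as a plain data structure: one list of levels (footprints) per dimension, ordered from finest to coarsest, so rolling up or drilling down means stepping one position along that dimension's line. The dimensions, level names and the helpers `roll_up` / `drill_down` below are illustrative assumptions, not a fixed standard.

```python
# Hypothetical starnet: each key is one radial line (dimension); its list holds
# the footprints (abstraction levels) of that dimension, ordered finest to coarsest.
STARNET = {
    "time": ["day", "month", "quarter", "year"],
    "location": ["street", "city", "state", "country"],
    "item": ["item_name", "brand", "category"],
}


def roll_up(dimension: str, level: str) -> str:
    """Move one footprint towards the coarser end of a dimension's line."""
    levels = STARNET[dimension]
    i = levels.index(level)
    if i == len(levels) - 1:
        raise ValueError(f"{level!r} is already the coarsest level of {dimension!r}")
    return levels[i + 1]


def drill_down(dimension: str, level: str) -> str:
    """Move one footprint towards the finer end of a dimension's line."""
    levels = STARNET[dimension]
    i = levels.index(level)
    if i == 0:
        raise ValueError(f"{level!r} is already the finest level of {dimension!r}")
    return levels[i - 1]


print(roll_up("time", "month"))           # quarter
print(drill_down("location", "country"))  # state
```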
2. Discuss the architecture of Data warehouse.
Refer to the following links for in-depth content on data warehouse architecture:
- Link 1 describes the following:
  - Data Warehouse Architecture: Basic
  - Data Warehouse Architecture: With Staging Area
  - Data Warehouse Architecture: With Staging Area and Data Marts
- Link 2 elaborates two approaches to data warehouse construction:
  - Top-down approach
  - Bottom-up approach
- Link 3 elaborates several other data warehouse architectures, viz.:
  - Single-tier architecture
  - Two-tier architecture
  - Three-tier architecture, etc.
3. Why are data cleansing and data transformation considered vital tasks in the integration process?
Refer to the following link for a detailed explanation of the need for, and benefits of, data cleansing and data transformation. It covers everything from reduced storage requirements to better insight generation and easier reading and visualisation.
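As a small illustration of what cleansing and transformation do during integration, the sketch below takes hypothetical records from two source systems, standardizes their formatting, converts salary text into numbers and then removes the duplicates that only differed in formatting. The field names, sample values and cleaning rules are assumptions chosen for the example.

```python
# Hypothetical records arriving from two source systems
RAW = [
    {"name": " Alice ", "city": "bhopal", "salary": "50000"},
    {"name": "alice", "city": "Bhopal", "salary": "50000"},  # same person, other source
    {"name": "Bob", "city": "INDORE", "salary": ""},          # missing value
    {"name": "Carol", "city": "Indore", "salary": "62,000"},  # different number format
]


def clean(record: dict) -> dict:
    """Cleansing + transformation of a single record."""
    # Cleansing: trim stray whitespace and standardize capitalisation
    name = record["name"].strip().title()
    city = record["city"].strip().title()
    # Transformation: convert salary text to an int, keeping None for missing values
    salary_text = record["salary"].replace(",", "").strip()
    salary = int(salary_text) if salary_text else None
    return {"name": name, "city": city, "salary": salary}


def integrate(records: list) -> list:
    """Clean every record, then drop duplicates that only differed in formatting."""
    seen = set()
    result = []
    for rec in map(clean, records):
        key = (rec["name"], rec["city"])
        if key not in seen:
            seen.add(key)
            result.append(rec)
    return result


for row in integrate(RAW):
    print(row)
```

Without the cleaning step, " Alice " from Bhopal and "alice" from bhopal would be stored as two different people, which is exactly the kind of error that distorts later analysis.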
4. Define association rule mining and explain how the Apriori algorithm works with a suitable example.
Association Rule Mining:
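Association rule mining discovers rules of the form X → Y (for example {bread} → {milk}) in transaction data and keeps the rules whose support (how many transactions contain X and Y together) and confidence (support of X ∪ Y divided by support of X) reach user-chosen thresholds. Apriori finds the frequent itemsets level by level: frequent 1-itemsets are joined into candidate 2-itemsets, any candidate that has an infrequent subset is pruned (every subset of a frequent itemset must itself be frequent), the survivors are counted against the transactions, and the process repeats until no new frequent itemsets appear. A minimal Python sketch of that search follows, using a made-up basket data set; the function names `apriori` and `generate_rules` and the thresholds are illustrative assumptions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support_count} for every frequent itemset (level-wise search)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count individual items
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    all_frequent = dict(frequent)

    k = 2
    while frequent:
        prev = list(frequent)
        candidates = set()
        # Join step: combine frequent (k-1)-itemsets into k-itemsets
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                # Prune step (Apriori property): every (k-1)-subset must be frequent
                if len(union) == k and all(
                    frozenset(sub) in frequent for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)
        # Scan the transactions to count support of the surviving candidates
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        all_frequent.update(frequent)
        k += 1
    return all_frequent


def generate_rules(frequent, min_confidence):
    """Build rules X -> Y with confidence = support(X u Y) / support(X)."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                confidence = count / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((set(lhs), set(itemset - lhs), confidence))
    return rules


# Hypothetical market-basket data
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "butter", "milk"},
]
frequent_itemsets = apriori(transactions, min_support=3)
for lhs, rhs, conf in generate_rules(frequent_itemsets, min_confidence=0.7):
    print(f"{sorted(lhs)} -> {sorted(rhs)}  confidence={conf:.2f}")
```

With `min_support=3`, each single item appears in 4 transactions and each pair in 3, so all of them are frequent, but {bread, butter, milk} appears in only 2 transactions and is dropped. Every pair then yields two rules with confidence 3/4 = 0.75, for example {bread} → {milk}.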
Roll-up and Drill-down:
In data mining and data analysis there are two related operations: roll-up and drill-down.
In roll-up we aggregate the data by decreasing the level of detail, i.e. we move up the hierarchy, for example:
| Country | City | Population (in thousands) |
In the above example, if we look at the data at city level, it appears as shown above; if we want to see it at country level, we roll the data up the hierarchy from city to country. This is called roll-up in data mining, data analysis or simple analytics.
Drill-down is the reverse: we go from country level to city level, i.e. we move down the hierarchy and increase the level of detail available in our analysis.
The above example has only two levels of detail; in real-world problems, data sets can have hierarchies with many levels.
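Roll-up and drill-down along a city → country hierarchy can be sketched in plain Python. The country and city names and the population figures below are made-up placeholders, not real data, and the helper names are hypothetical.

```python
from collections import defaultdict

# Made-up city-level rows: (country, city, population in thousands)
CITY_LEVEL = [
    ("Country A", "City A1", 1800),
    ("Country A", "City A2", 2200),
    ("Country B", "City B1", 1000),
    ("Country B", "City B2", 500),
]


def roll_up_to_country(rows):
    """Roll-up: move up the hierarchy (city -> country) by summing populations."""
    totals = defaultdict(int)
    for country, _city, population in rows:
        totals[country] += population
    return dict(totals)


def drill_down_to_cities(rows, country):
    """Drill-down: move back down (country -> city) to see the detailed rows."""
    return {city: population for c, city, population in rows if c == country}


print(roll_up_to_country(CITY_LEVEL))                 # {'Country A': 4000, 'Country B': 1500}
print(drill_down_to_cities(CITY_LEVEL, "Country B"))  # {'City B1': 1000, 'City B2': 500}
```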
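ii) Drill-down operation of ROLAP and MOLAP. Which implementation technique do you prefer?
ROLAP: Relational Online Analytical Processing. Data is kept in relational tables, and operations such as drill-down are executed as SQL queries grouped at a finer level.
MOLAP: Multidimensional Online Analytical Processing. Data is typically pre-aggregated into multidimensional arrays (data cubes), so drill-down moves to lower-level cells that have already been computed.
There is no straightforward answer to which implementation technique is better, because each has its own pros and cons, for example: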
ROLAP vs MOLAP:
- Condition 1: Large data sets
  - ROLAP is the better technique here, because relational databases scale to very large data sets that would be impractical to pre-compute as a cube.
- Condition 2: Query speed
  - MOLAP is usually faster, because its aggregates are pre-computed and stored in the cube, whereas ROLAP has to run SQL aggregation queries at query time.
- Condition 3: Complex insights from multiple databases
  - As ROLAP fetches data directly from the data warehouse and is designed to handle complicated SQL queries, the winner here is again ROLAP.
In short, ROLAP is preferable for large, detailed data sets and complex ad hoc querying, while MOLAP is preferable when fast responses on pre-aggregated summaries matter most.
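The difference in how the two implementations drill down can be sketched as follows, assuming a tiny hypothetical sales table with a quarter → month hierarchy. The ROLAP-style function re-runs an SQL `GROUP BY` at the finer level (via Python's sqlite3), while the MOLAP-style function only reads cells from a cube that was aggregated in advance. This is a deliberate simplification; the function names are illustrative and real OLAP servers are far more elaborate.

```python
import sqlite3

# Hypothetical detail rows: (quarter, month, sales amount)
ROWS = [("Q1", "Jan", 10), ("Q1", "Feb", 20), ("Q1", "Mar", 5), ("Q2", "Apr", 7)]

# ROLAP-style: keep the detail rows in a relational table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (quarter TEXT, month TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", ROWS)


def rolap_drill_down(quarter):
    """Drill-down = issue a new SQL query grouped at the finer (month) level."""
    return conn.execute(
        "SELECT month, SUM(amount) FROM sales WHERE quarter = ? "
        "GROUP BY month ORDER BY month",
        (quarter,),
    ).fetchall()


# MOLAP-style: pre-aggregate every cell of a small cube up front.
def build_cube(rows):
    cube = {"quarter": {}, "month": {}}
    for quarter, month, amount in rows:
        cube["quarter"][quarter] = cube["quarter"].get(quarter, 0) + amount
        cells = cube["month"].setdefault(quarter, {})
        cells[month] = cells.get(month, 0) + amount
    return cube


CUBE = build_cube(ROWS)


def molap_drill_down(quarter):
    """Drill-down = read the already-computed month cells; no query is re-run."""
    return sorted(CUBE["month"].get(quarter, {}).items())


print(CUBE["quarter"])          # {'Q1': 35, 'Q2': 7}
print(rolap_drill_down("Q1"))   # [('Feb', 20), ('Jan', 10), ('Mar', 5)]
print(molap_drill_down("Q1"))   # [('Feb', 20), ('Jan', 10), ('Mar', 5)]
```

Both return the same answer; the trade-off is where the work happens. The cube costs time and storage up front (which grows quickly with data size), while the relational version pays at query time.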
6. Describe FP growth algorithm with example. OR Write short notes on data mining application.
Data Mining Application:
7. Discuss in detail the application of Data mining for Financial Data Analysis? Give suitable data flow diagram.
While data mining is a fairly new technique that has evolved manyfold with the
8. a) Explain the major classification of Clustering methods.
b) Explain outlier analysis with example.
What is outlier analysis?
Outlier analysis is the detection of abnormal observations in a data set, i.e. values that are extremely high or extremely low compared with the general observations.
Advantages of performing outlier analysis:
The benefit of outlier analysis is that we can remove extremely high or low values from the observations and so get correct results from whatever analysis we are doing. If insights based on incorrect data are used to form a business strategy or to draw conclusions from an experiment, they can lead to losses or wrong results.
Methods to find outliers:
- Visual inspection or plotting: plot the values on x-y axes; very high or very low values are then easy to spot visually.
- Averages: compute the mean, median, mode, etc.; values that lie very far from these can then be excluded straightforwardly.
- Historical patterns: look at the historical values or patterns of the variable and remove the ones lying beyond its upper and lower limits.
- Data sorting: sorting the data in ascending or descending order is another useful method, since extreme values end up at the two ends.
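Two of the methods above can be made concrete with common rules of thumb, sketched here in Python with the standard `statistics` module. The "averages" idea is expressed as a z-score rule (distance from the mean in standard deviations), and the "sorting" idea as the interquartile-range rule (Tukey's fences) computed from sorted data. Both rules, their thresholds (2.0 and 1.5) and the sample readings are illustrative assumptions, not the only valid choices.

```python
from statistics import mean, median, pstdev


def zscore_outliers(values, threshold=2.0):
    """Flag values lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]; assumes at least two values."""
    data = sorted(values)
    n = len(data)
    # Quartiles as medians of the lower and upper halves (one common convention)
    q1 = median(data[: n // 2])
    q3 = median(data[(n + 1) // 2 :])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


readings = [12, 14, 13, 15, 14, 13, 250]  # hypothetical observations
print(zscore_outliers(readings))  # [250]
print(iqr_outliers(readings))     # [250]
```

With only a handful of observations the z-score rule is weak, because a single extreme value also inflates the standard deviation it is measured against; the IQR rule, built from the sorted middle of the data, is less affected.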
Question paper source: https://www.rgpvonline.com