Data Mining and Warehousing Solved Question Paper – To Score Better – RGPV (IT-840)

IT-840 (GS)

B.E. VIII Semester Examination, June 2020
Grading System (GS)
Data Mining and Warehousing (Elective – IV)

Note:

i) Attempt any five questions.
ii) All questions carry equal marks.

1. Briefly compare the following concepts. You may use an example to explain your points. Snow flake, fact constellation, starnet query model.

Snowflake:

Snowflake overview:

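Snowflake was launched in July 2012. It is a cloud data warehousing company whose main office is located in the US. In data warehouse modelling the same term also names the snowflake schema: a variant of the star schema in which the dimension tables are normalized into further sub-dimension tables.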

Many big technology companies use this solution to connect many of their databases to one central warehouse, so that a user can easily extract information by querying data from many databases at once, which is a very painful task without a central warehouse.

Advantages of snowflake:

Snowflake is very fast, stores both structured and unstructured data, and allows fast and seamless querying of vast amounts of data, with the connected databases synced into Snowflake in near real time. Another benefit is that the data is secured against cyber threats and attacks.

Fact Constellation:

What is fact constellation?

A fact constellation joins several star-like fact tables or schemas through some dimension tables that they share, which is why it is called a fact constellation.

The main advantage of fact constellation is that it enables data querying from complex data systems.

For example, suppose you have multiple DBs used for different purposes, like:

  • DB-I for storing static information
  • DB-II for storing user activity information
  • DB-III for storing schema less data

In the above case we can keep a common unique identifier in all three DBs or schemas, which helps in fetching useful information out of these systems.
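The idea of several fact tables sharing a dimension table can be sketched as below. This is only an illustration: the table names (`item_dim`, `sales_fact`, `shipping_fact`), their columns and all values are hypothetical and are not taken from the question paper.

```python
# Shared dimension table (hypothetical): item_key -> descriptive attributes.
item_dim = {
    1: {"name": "Laptop", "category": "Electronics"},
    2: {"name": "Desk", "category": "Furniture"},
}

# Two fact tables (hypothetical) that both point at item_dim through item_key.
sales_fact = [
    {"item_key": 1, "units_sold": 5},
    {"item_key": 2, "units_sold": 3},
    {"item_key": 1, "units_sold": 2},
]
shipping_fact = [
    {"item_key": 1, "units_shipped": 6},
    {"item_key": 2, "units_shipped": 3},
]


def total_by_category(fact_rows, measure):
    """Sum one measure of a fact table per category of the shared dimension."""
    totals = {}
    for row in fact_rows:
        category = item_dim[row["item_key"]]["category"]
        totals[category] = totals.get(category, 0) + row[measure]
    return totals


sold = total_by_category(sales_fact, "units_sold")
shipped = total_by_category(shipping_fact, "units_shipped")

# Because both fact tables share item_dim, their results line up per category.
report = {
    category: {"sold": sold.get(category, 0), "shipped": shipped.get(category, 0)}
    for category in sorted(set(sold) | set(shipped))
}
print(report)
```

The shared key (`item_key` here) plays the role of the common unique identifier mentioned above: it is what lets one query combine figures from otherwise separate fact tables.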

Disadvantage of fact constellation:

The only disadvantage of this method is that it is quite complex and hence not much desired. But in technology companies that deal with many types of databases it is quite common, and hence it is worth mentioning here.

Starnet Query Model:

In a starnet query model, radial lines emanate from a central point; each radial line represents one dimension, and the levels of that dimension's hierarchy are marked along the line.

2. Discuss the architecture of Data warehouse.

Refer to the following links for in-depth content on data warehouse architecture:

Link1: Click me

  • Use the above link for a description of the following:
    • Data Warehouse Architecture: Basic
    • Data Warehouse Architecture: With Staging Area
    • Data Warehouse Architecture: With Staging Area and Data Marts

Link2: Click me

  • Elaborates two approaches for data warehouse construction:
    • Top down approach
    • Bottom-up approach

Link3: Click me

  • This third link elaborates several other data warehouse architectures viz.
    • Single-tier architecture
    • Two-tier architecture
    • Three-tier architecture etc.

3. Why is data cleansing and data transformation function considered to be a vital task in the integration process.

Refer to the following link for a detailed explanation of the need for and benefits of data cleansing and data transformation. It covers everything from lower storage requirements and better insight generation to easier reading and visualisation.

Click me

4. Define association rule mining and explain how apriori algorithm works with suitable example?

Association Rule Mining:

Apriori Algorithm:
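
Association rule mining looks for rules of the form X → Y between sets of items that occur together in transactions. Apriori finds the frequent itemsets level by level, relying on the property that every subset of a frequent itemset must itself be frequent. A minimal sketch of that idea follows; the shopping-basket transactions and the `min_support` value are made up for illustration and are not part of the question paper.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Count each candidate and keep only those meeting min_support.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support}

    items = {item for t in transactions for item in t}
    current = count({frozenset([item]) for item in items})  # frequent 1-itemsets
    frequent = dict(current)
    k = 2
    while current:
        previous = set(current)
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in previous for b in previous if len(a | b) == k}
        # Prune step: drop candidates that have an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in previous for s in combinations(c, k - 1))
        }
        current = count(candidates)
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical transactions, for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
]
frequent = apriori(baskets, min_support=3)

# Confidence of the rule bread -> milk = support(bread, milk) / support(bread).
confidence = frequent[frozenset({"bread", "milk"})] / frequent[frozenset({"bread"})]
print(len(frequent), confidence)
```

With these baskets every single item and every pair reaches the support threshold, while the three-item set appears in only two baskets and is discarded, so the search stops at the third level.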

5. Discuss
i) Roll-up

In data mining and data processing, roll-up and drill-down are two common operations.

In roll-up we aggregate the data by decreasing the level of detail, i.e. we move up the hierarchy, e.g.

Country | City      | Population (in thousands)
US      | New York  | 100
US      | Florida   | 120
US      | Las Vegas | 80

Roll-up example:

In the above example, if we look at the data at city level it appears as shown above. If we want to see the data at country level, we roll the data up the hierarchy from city to country (here, a single US row with a population of 300 thousand), and this is what is called roll-up in data mining, data analysis or simple analytics.
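
The roll-up from city to country can be sketched in a few lines. The rows are the ones from the table above; the function name `roll_up` is just an illustrative choice.

```python
from collections import defaultdict

# City-level rows from the example table: (country, city, population in thousands).
city_rows = [
    ("US", "New York", 100),
    ("US", "Florida", 120),
    ("US", "Las Vegas", 80),
]


def roll_up(rows):
    """Aggregate city-level rows to country level by summing the population."""
    totals = defaultdict(int)
    for country, _city, population in rows:
        totals[country] += population
    return dict(totals)


country_totals = roll_up(city_rows)
print(country_totals)  # {'US': 300}
```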

Drill down:

The reverse of this is drill-down, in which we go from country level to city level, i.e. we move down the hierarchy and increase the level of detail available in our analysis.

The above example is a very simple one with only two levels of detail; in real-world problems the data set can have hierarchies with many levels.
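
One way to picture both operations on a hierarchy is a single aggregation function whose depth decides how much detail is shown. The sketch below assumes the two-level hierarchy of the example (country, then city); the names `HIERARCHY` and `aggregate` are illustrative, and a longer hierarchy would only need more entries in the tuple.

```python
from collections import defaultdict

# Levels of the hierarchy, from coarsest to finest (as in the example).
HIERARCHY = ("country", "city")

rows = [
    {"country": "US", "city": "New York", "population": 100},
    {"country": "US", "city": "Florida", "population": 120},
    {"country": "US", "city": "Las Vegas", "population": 80},
]


def aggregate(rows, depth):
    """Sum population over the first `depth` levels of HIERARCHY."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[level] for level in HIERARCHY[:depth])
        totals[key] += row["population"]
    return dict(totals)


country_view = aggregate(rows, depth=1)  # rolled-up view: one row per country
city_view = aggregate(rows, depth=2)     # drill-down: one row per (country, city)
print(country_view)
print(city_view)
```

Drilling down is then just a matter of increasing `depth`, and rolling up of decreasing it.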


ii) Drill-down operation of ROLAP and MOLAP. Which implementation Technique do you prefer.

Brief:

Refer to the following link for the exact difference between ROLAP and MOLAP, where

ROLAP: Relational Online Analytical Processing

MOLAP: Multidimensional Online Analytical Processing

There is no straightforward answer to which implementation technique is superior, because each has its own benefits and drawbacks, e.g.

ROLAP vs MOLAP:

  • Condition 1: If the data set is large
    • ROLAP is the better technique here, because it keeps the data in relational tables and therefore scales to very large data sets, whereas a MOLAP cube grows quickly as dimensions are added.
  • Condition 2: Speed
    • For query response MOLAP is usually faster, because the aggregates are pre-computed and stored in the cube; ROLAP computes them with SQL at query time.
  • Condition 3: Complex insight generation from multiple databases
    • As ROLAP fetches data directly from the data warehouse and is designed to handle complicated SQL queries, ROLAP is the winner here.

So ROLAP is preferable for large data sets and complex querying, while MOLAP is preferable when fast roll-up and drill-down on pre-aggregated data matter most.
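
Where the aggregation work happens in each approach can be sketched as follows. This is a toy illustration, not a real OLAP engine: the ROLAP side uses an in-memory SQLite table and computes the total with SQL when asked, while the MOLAP side is imitated by a plain dictionary (`cube`) whose cells are filled once in advance. The table name `population` and both function names are assumptions made for the example.

```python
import sqlite3

rows = [("US", "New York", 100), ("US", "Florida", 120), ("US", "Las Vegas", 80)]

# ROLAP style: data stays in relational tables, aggregates are computed by SQL
# at query time.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE population (country TEXT, city TEXT, people INTEGER)")
conn.executemany("INSERT INTO population VALUES (?, ?, ?)", rows)


def rolap_total(country):
    cur = conn.execute(
        "SELECT SUM(people) FROM population WHERE country = ?", (country,)
    )
    return cur.fetchone()[0]


# MOLAP style: every cell, including the country-level total, is pre-computed
# once; a query is then only a lookup.
cube = {}
for country, city, people in rows:
    cube[(country, city)] = people
    cube[(country, "ALL")] = cube.get((country, "ALL"), 0) + people


def molap_total(country):
    return cube[(country, "ALL")]


print(rolap_total("US"), molap_total("US"))  # 300 300
```

Both give the same answer; they differ in whether the sum is computed on demand from the base rows or read from a structure that had to be built and stored beforehand.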

6. Describe FP growth algorithm with example. OR Write short notes on data mining application.

Data Mining Application:

Refer to the following link; product analytics, business analytics, sales analytics and data mining applications in various fields are explained well there.

7. Discuss in detail the application of Data mining for Financial Data Analysis? Give suitable data flow diagram.

While data mining is a very new technique which has evolved many folds with the

8. a) Explain the major classification of Clustering methods.


b) Explain outlier analysis with example.

Outlier Analysis:

What is outlier analysis?

Outlier analysis is the detection of abnormal observations in an existing data set, i.e. values that are extremely high or extremely low with respect to the general observations.

Advantages of performing outlier analysis?

The benefit of outlier analysis is that we remove the absurdly high or low values from the observations and hence get correct results for whatever analysis we are doing. If a business strategy or an experiment is based on insights from incorrect data, it can lead to losses or wrong results.

Methods to find outliers:

  • Visually or plotting: Always try to plot the values on x-y axes; this makes very high or low values easy to spot with ordinary visualisation.
  • Averages: Find the mean, median, mode etc.; values that lie very far away from these can be excluded straight away.
  • Historical patterns: Look at the historical values or patterns and remove the ones lying beyond their upper and lower limits.
  • Data sorting: Sorting the data in ascending or descending order is another useful method, because the extreme values end up at the two ends.
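
The "Averages" method above can be made concrete with a small sketch. It uses one possible rule, chosen here as an assumption rather than taken from the text: a value is flagged when its distance from the median exceeds `k` times the median absolute deviation (MAD). The sample readings and the threshold `k=3.0` are made up for illustration.

```python
from statistics import median


def find_outliers(values, k=3.0):
    """Flag values lying more than k * MAD away from the median."""
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        # All values are (almost) identical; nothing can be judged as far away.
        return []
    return [v for v in values if abs(v - center) > k * mad]


# Hypothetical observations with one absurdly high value.
readings = [10, 12, 11, 13, 12, 95, 11, 10]
outliers = find_outliers(readings)
cleaned = [v for v in readings if v not in outliers]
print(outliers)  # [95]
print(cleaned)
```

The median is used as the centre instead of the mean because a single extreme value pulls the mean towards itself, which can hide the very outlier being looked for.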

Question paper source: https://www.rgpvonline.com
