Semester : SEMESTER 8
Subject : Data Mining and Warehousing
Year : 2019
Term : OCTOBER
Branch : COMPUTER SCIENCE AND ENGINEERING
Scheme : 2015 Full Time
Course Code : CS 402
A H192009
Reg No.: Name:
APJ ABDUL KALAM TECHNOLOGICAL UNIVERSITY
EIGHTH SEMESTER B.TECH DEGREE EXAMINATION(S), OCTOBER 2019
Course Code: CS402
Course Name: DATA MINING AND WAREHOUSING
Max. Marks: 100 Duration: 3 Hours
PART A
Answer all questions, each carries 4 marks. Marks
1 How is a data warehouse different from a database? How are they similar? (4)
2 Compare the dimension tables of the star and snowflake schemas. (4)
3 Use the two methods below to normalize the following group of data: (4)
100,200,300,500,900
i) min-max normalization by setting min=0 and max=1
ii) Z-score normalization
4 Explain the attribute selection method in decision trees. (4)
5 Distinguish between hold out method and cross validation method. (4)
6 Explain prepruning and postpruning approaches in decision tree algorithm. (4)
7 Differentiate between support and confidence. (4)
8 How to compute the dissimilarity between objects described by binary variables? (4)
9 Differentiate between agglomerative and divisive hierarchical clustering methods. (4)
10 Explain web content mining. (4)
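
As an illustration of the two techniques named in Question 3, the following is a minimal sketch and not a model answer. It assumes the min-max target range is [0, 1] and that the z-score uses the population standard deviation; some textbooks use the sample standard deviation instead, which changes the z-scores. The function names min_max and z_score are hypothetical.

```python
from statistics import mean, pstdev

def min_max(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly so min(values) -> new_min and max(values) -> new_max."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]

def z_score(values):
    """Centre on the mean and divide by the population standard deviation (assumption)."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]

data = [100, 200, 300, 500, 900]
print(min_max(data))
print([round(z, 4) for z in z_score(data)])
```

With these assumptions the mean of the data is 400, so values below 400 receive negative z-scores and values above it receive positive ones.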
PART B
Answer any two full questions, each carries 9 marks.
11 The following data is given in increasing order for the attribute age:
13,15,16,16,19,20,20,21,22,22,25,25,25,25,30,33,33,35,35,35,36,40,45,46,52,70.
a) Use smoothing by bin boundaries to smooth these data, using a bin depth of 3. (3)
b) How might you determine outliers in the data? (3)
c) What other methods are there for data smoothing? (3)
12 Explain the following procedures for attribute subset selection:
a) Stepwise forward selection (3)
b) Stepwise backward elimination (3)
c) A combination of forward selection and backward elimination (3)
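
The technique named in Question 11(a) can be sketched as follows. This is an illustrative sketch under stated assumptions, not a model answer: the data are split into consecutive equal-depth bins of 3, each value is replaced by the closer of its bin's minimum and maximum, and a value exactly midway between the two boundaries goes to the lower one (the tie rule is an assumption). Because the list has 26 values, the final bin holds only 2. The function name smooth_by_bin_boundaries is hypothetical.

```python
def smooth_by_bin_boundaries(sorted_values, depth):
    """Equal-depth binning; each value is replaced by its nearest bin boundary."""
    smoothed = []
    for start in range(0, len(sorted_values), depth):
        bin_values = sorted_values[start:start + depth]
        lo, hi = bin_values[0], bin_values[-1]
        # Assumption: a value equidistant from both boundaries goes to the lower one.
        smoothed.extend(lo if v - lo <= hi - v else hi for v in bin_values)
    return smoothed

ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 36, 40, 45, 46, 52, 70]
result = smooth_by_bin_boundaries(ages, 3)
print(result)
```

For example, the first bin 13, 15, 16 becomes 13, 16, 16, because 15 is closer to the upper boundary 16 than to the lower boundary 13.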