Confuse Skull: Structured Data Classification MCQs

1. Identify the structured data from the following.

Data from MySQL DB and Excel

2. What kind of classification is our case study 'Churn Analysis'?

Binary

3. Which command is used to identify the unique values of a column?

unique()
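As a small illustration of the command above, here is a sketch using pandas. The DataFrame and its `plan` column are made-up values for demonstration only, not data from the course.

```python
import pandas as pd

# Hypothetical toy column; the values are invented for illustration.
df = pd.DataFrame({"plan": ["basic", "premium", "basic", "free", "premium"]})

# unique() returns the distinct values in order of first appearance.
distinct_plans = df["plan"].unique()
print(distinct_plans)

# nunique() returns only the number of distinct values.
n_plans = df["plan"].nunique()
print(n_plans)
```

`unique()` is called on a single column (a Series); to count how often each value occurs, `value_counts()` is the usual companion.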

4. Which preprocessing technique is used to make the data Gaussian with zero mean and unit variance?

Standardisation
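A minimal sketch of standardisation in plain Python may help here. The function name `standardise` and the sample heights are assumptions made for illustration. Note that this rescaling shifts the mean to zero and the variance to one; it does not by itself change the shape of the distribution.

```python
from statistics import mean, pstdev

def standardise(values):
    """Rescale values to zero mean and unit (population) variance."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    return [(v - mu) / sigma for v in values]

# Hypothetical feature values, invented for illustration.
heights = [150.0, 160.0, 170.0, 180.0, 190.0]
scaled = standardise(heights)

print(scaled)
print(mean(scaled), pstdev(scaled))
```

In scikit-learn the same idea is provided by `StandardScaler`, which is usually fitted on the training set only and then applied to the test set.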

5. Cross-validation technique is used to evaluate a classifier by dividing the data set into training set to train the classifier and testing set to test the same.

True
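To make the splitting idea concrete, here is a minimal pure-Python sketch of k-fold index splitting. The helper `k_fold_indices` is a hypothetical name written for this example and is not a library function; scikit-learn's `KFold` plays a similar role in practice.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first few folds.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

folds = list(k_fold_indices(10, 5))
for train, test in folds:
    print("train:", train, "test:", test)
```

Each sample lands in the test set exactly once, and the classifier is trained on the remaining samples for that round.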

6. True Negative is when the predicted instance and the actual instance are both positive.

False

7. True Positive is when the predicted instance and the actual instance are both not negative.

True

8. What are the advantages of Naive Bayes?

Requires less training data

9. High classification accuracy always indicates a good classifier.

False

10. Categorical variables have

no logical order

11. Cross-validation technique will provide accurate results when the training set and the testing set are from two different populations.

False

12. Choose the correct sequence for classifier building from the following:

Initialize -> Train -> Predict -> Evaluate
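The four steps can be sketched with scikit-learn as follows. This is only an illustration under assumptions: it uses scikit-learn's bundled copy of the iris data rather than the course's case study, and the choice of `DecisionTreeClassifier`, `max_depth=3`, and the 70/30 split are arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load scikit-learn's bundled iris data (no download needed).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# 1. Initialize the classifier.
clf = DecisionTreeClassifier(max_depth=3, random_state=42)
# 2. Train it on the training split.
clf.fit(X_train, y_train)
# 3. Predict labels for the held-out split.
y_pred = clf.predict(X_test)
# 4. Evaluate the predictions against the true labels.
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.2f}")
```

Most scikit-learn classifiers follow the same `fit` / `predict` pattern, so swapping in another model usually changes only the initialization line.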

13. Which of the given hyperparameter(s), when increased, may cause a random forest to overfit the data?

Depth of tree

14. To view the first 3 rows of the dataset, which of the following commands is used? Download the dataset from: https://gist.githubusercontent.com/curran/a08a1080b88344b0c8a7/raw/d546eaee765268bf2f487608c537c05e22e4b221/iris.csv to answer the question.

iris.head(3)

15. Pruning is a technique associated with

Decision tree

16. The commonly used package for machine learning in Python is

sklearn

17. A classifier that can compute using numeric as well as categorical values is

Decision Tree Classifier

18. Can we consider sentiment classification as a text classification problem?

Yes

19. Assume you are solving a classification problem with highly imbalanced classes. The majority class is observed 99% of the time in the training data. Which of the following is true when your model has 99% accuracy on the test data?

For imbalanced class problems, the accuracy metric is not a good idea.
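A short worked example shows why accuracy can mislead here. The class counts below are hypothetical, chosen to match the 99% figure in the question, and the "classifier" simply predicts the majority class every time.

```python
# Hypothetical test set: 990 majority-class (0) and 10 minority-class (1) samples.
actual = [0] * 990 + [1] * 10
# A "classifier" that ignores its input and always predicts the majority class.
predicted = [0] * len(actual)

correct = sum(a == p for a, p in zip(actual, predicted))
accuracy = correct / len(actual)

# Recall on the minority class: how many true positives were found.
true_positives = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
recall = true_positives / sum(a == 1 for a in actual)

print(f"accuracy={accuracy:.2f}, minority recall={recall:.2f}")
```

The model scores 99% accuracy while never detecting a single minority-class sample, which is why metrics such as recall, precision, or F1 are usually preferred for imbalanced problems.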

20. Email spam detection is an example of

supervised classification

21. A technique used to depict the performance in a tabular form that has 2 dimensions namely “actual” and “predicted” sets of data.

Confusion Matrix
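For a binary problem, the four cells of the confusion matrix can be counted directly. The sketch below is plain Python; the function name `confusion_counts` and the label lists are made up for illustration.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, TN, FP and FN for a binary classification task."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted positive, actually positive
        elif p != positive and a != positive:
            tn += 1  # predicted negative, actually negative
        elif p == positive and a != positive:
            fp += 1  # predicted positive, actually negative
        else:
            fn += 1  # predicted negative, actually positive
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}

# Hypothetical labels, invented for illustration.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

counts = confusion_counts(actual, predicted)
print(counts)
```

scikit-learn offers `sklearn.metrics.confusion_matrix` for the same purpose, returning the counts as a 2-D array with actual classes as rows and predicted classes as columns.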

22. What kind of classification is the given case study (IRIS dataset)? Download the dataset from: https://gist.githubusercontent.com/curran/a08a1080b88344b0c8a7/raw/d546eaee765268bf2f487608c537c05e22e4b221/iris.csv to answer the question.

Multi class classification

23. Ordinal variables have

clear logical order

24. Which command is used to select all numeric types in the dataset? Download the dataset from: https://gist.githubusercontent.com/curran/a08a1080b88344b0c8a7/raw/d546eaee765268bf2f487608c537c05e22e4b221/iris.csv to answer the question.

iris_num = iris_data.select_dtypes(include=[numpy.number])
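The command can be tried on a small stand-in frame. The three rows below are a hypothetical sample shaped like the iris data, not the downloaded file itself.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row sample shaped like the iris data.
iris_data = pd.DataFrame({
    "sepal_length": [5.1, 7.0, 6.3],
    "petal_length": [1.4, 4.7, 6.0],
    "species": ["setosa", "versicolor", "virginica"],
})

# Keep only the numeric columns.
iris_num = iris_data.select_dtypes(include=[np.number])
# The complement: everything that is not numeric.
iris_cat = iris_data.select_dtypes(exclude=[np.number])

print(list(iris_num.columns))
print(list(iris_cat.columns))
```

Using `exclude` instead of `include` is a convenient way to list the categorical attributes of a dataset.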

25. What is the number of categorical attributes in the original dataset? Download the dataset from: https://gist.githubusercontent.com/curran/a08a1080b88344b0c8a7/raw/d546eaee765268bf2f487608c537c05e22e4b221/iris.csv to answer the question.

3

26. Which classifier converges easily with less training data?

Naive Bayes Classifier

27. Ensemble learning is used when you build component classifiers that are more accurate and independent from each other.

True

28. Clustering is an example of

unsupervised classification

29. Model Tuning helps to increase the accuracy

True

30. Imputing is a strategy to handle

Missing Values
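One simple imputation strategy is to replace missing entries with the column mean. The sketch below uses pandas; the `ages` values are invented for illustration, and mean imputation is only one of several possible strategies.

```python
import pandas as pd

# Hypothetical column with two missing entries.
ages = pd.Series([20.0, None, 40.0, None, 30.0])

# mean() skips missing values by default, so this is the mean of 20, 40 and 30.
column_mean = ages.mean()

# Replace each missing entry with the column mean.
imputed = ages.fillna(column_mean)
print(imputed.tolist())
```

Median or most-frequent-value imputation follows the same pattern, and scikit-learn's `SimpleImputer` wraps these strategies for use in a pipeline.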

31. Classification where each data point is mapped to more than one class is called

Multi Label Classification

32. The fit(X, y) is used to

Train the Classifier

33. Supervised learning differs from unsupervised learning as supervised learning requires __________

Labeled data

34. Clustering is a supervised classification.

False

35. Select the correct option which directly achieves multi-class classification (without support of binary classifiers).

K Nearest Neighbor

36. The classification where each data is mapped to more than one class is called ___________

Multi Label Classification

37. Email spam data is an example of __________

Unstructured Data

38. The most widely used package for machine learning in Python is _________

sklearn

39. Pruning is a technique associated with __________

Decision tree

40. What does the command sentiment_analysis_data['label'].value_counts() return?

counts of unique values in the 'label' column
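A quick sketch of what `value_counts()` returns, using a made-up `label` column (the real `sentiment_analysis_data` is not reproduced here):

```python
import pandas as pd

# Hypothetical stand-in for the course dataset; labels are invented.
sentiment_analysis_data = pd.DataFrame(
    {"label": ["pos", "neg", "pos", "pos", "neg", "neutral"]}
)

# value_counts() returns a Series indexed by each distinct label,
# sorted from most to least frequent.
label_counts = sentiment_analysis_data["label"].value_counts()
print(label_counts)
```

This is also a quick way to check for class imbalance before training a classifier.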

41. Select the pre-processing technique(s) from the following.

All

42. Which of the given hyperparameters, when increased, may cause a random forest to overfit the data?

Depth of tree

43. Select the correct statement about Nonlinear classification.

Kernel tricks are used by Nonlinear classifiers to achieve maximum-margin hyperplanes.

44. Choose the correct sequence for classifier building from the following.

Initialize -> Train -> Predict -> Evaluate

45. What command should be given to tokenize a sentence into words?

from nltk.tokenize import word_tokenize; Word_tokens = word_tokenize(sentence)

46. Choose the correct sequence from the following.

Data Analysis -> PreProcessing -> Model Building -> Predict

47. The following are all classification techniques, except ___________

StratifiedShuffleSplit

48. The commonly used package for machine learning in Python is

sklearn

49. How many new columns does the following command return? Download the dataset from: https://gist.githubusercontent.com/curran/a08a1080b88344b0c8a7/raw/d546eaee765268bf2f487608c537c05e22e4b221/iris.csv to answer the question.

iris_series = pd.get_dummies(iris['Species'])

3

51. Naive Bayes Algorithm is useful for:

in-depth analysis

52. A process used to identify data points that are simply unusual

Anomaly Detection

53. Is there a class imbalance problem in the given data set?

No

54. Which of the following is not a technique to process missing values?

One hot encoding
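One-hot encoding turns a categorical column into indicator columns rather than filling in gaps, which is why it does not belong among missing-value techniques. A small sketch with pandas follows; the four-row `iris` frame is a hypothetical sample, not the downloaded dataset.

```python
import pandas as pd

# Hypothetical sample of a categorical column with three distinct values.
iris = pd.DataFrame(
    {"Species": ["setosa", "versicolor", "virginica", "setosa"]}
)

# get_dummies creates one indicator column per distinct category.
dummies = pd.get_dummies(iris["Species"])
print(dummies)
print(dummies.shape)
```

With three distinct species, the result has three new columns, one per category.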

55. Images and documents are examples of

Unstructured Data

56. Email spam detection is an example of

supervised classification

57. Choose the correct sequence for classifier building from the following:

Initialize -> Train -> Predict -> Evaluate

58. Identify the command used to view the dataset size. What is the value returned? Download the dataset from: https://gist.githubusercontent.com/curran/a08a1080b88344b0c8a7/raw/d546eaee765268bf2f487608c537c05e22e4b221/iris.csv to answer the question.

iris.shape, (150, 6)

59. Which type of cross-validation is used for an imbalanced dataset?

Stratified K fold
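For imbalanced data, a stratified variant of k-fold is commonly used so that every fold keeps roughly the same class proportions as the full dataset. The sketch below uses scikit-learn's `StratifiedKFold`; the 16-to-4 class split and the dummy feature column are assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical imbalanced labels: 16 majority (0) and 4 minority (1) samples.
y = np.array([0] * 16 + [1] * 4)
# Dummy single-feature matrix; only its length matters for splitting.
X = np.arange(len(y)).reshape(-1, 1)

skf = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)

# Count how many minority samples land in each test fold.
positives_per_fold = [int(y[test_idx].sum()) for _, test_idx in skf.split(X, y)]
print(positives_per_fold)
```

Plain k-fold gives no such guarantee, so some folds could end up with no minority-class samples at all.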

60. To view the first 3 rows of the dataset, which of the following commands is used? Download the dataset from: https://gist.githubusercontent.com/curran/a08a1080b88344b0c8a7/raw/d546eaee765268bf2f487608c537c05e22e4b221/iris.csv to answer the question.

iris.head(3)

61. Imagine you have just finished training a decision tree for spam classification and it is showing abnormally bad performance on both your training and test sets. Assume that your implementation has no bugs. What could be the reason for this problem?
