Unstructured Data Classification MCQs

1.  Identify the unstructured data from the following.

2.  What kind of classification is our case study 'Spam Detection'?

3.  Which preprocessing technique is used to remove the most commonly used words?

4.  The cross-validation technique evaluates a classifier by dividing the data set into a training set, used to train the classifier, and a testing set, used to test it.
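As an illustration of the splitting idea in this statement, here is a minimal plain-Python sketch of k-fold cross-validation, one common form of the technique, in which every fold takes a turn as the testing set. The helper name `k_fold_indices` and the sample sizes are assumptions made for this example and do not come from the quiz.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]                  # held-out fold
        train = indices[:start] + indices[start + size:]    # everything else
        yield train, test
        start += size


folds = list(k_fold_indices(n_samples=10, k=3))
for train, test in folds:
    print("train:", train, "test:", test)
```

Each sample lands in exactly one testing fold, so the classifier is always scored on data it was not trained on.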

5.  True Negative is when the predicted instance and the actual instance are both positive.

6. True Positive is when the predicted instance and the actual instance are both not negative.
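For reference when judging the two statements above: under the usual definitions, a true positive is an instance predicted positive that is actually positive, and a true negative is an instance predicted negative that is actually negative. The sketch below counts all four outcomes for a binary problem. The function name `confusion_counts` and the label lists are made up for illustration.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1      # predicted positive, actually positive
        elif p != positive and a != positive:
            tn += 1      # predicted negative, actually negative
        elif p == positive:
            fp += 1      # predicted positive, actually negative
        else:
            fn += 1      # predicted negative, actually positive
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}


actual = [1, 0, 1, 1, 0, 0, 1, 0]       # hypothetical ground-truth labels
predicted = [1, 0, 0, 1, 1, 0, 1, 0]    # hypothetical classifier output
counts = confusion_counts(actual, predicted)
print(counts)
```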

7. TF and IDF use matrix representations.

8. Which of the following commands is used to view the dataset size, and what value is returned?

   sentiment_analysis_data.shape, (7086, 2)

9. What command should be given to tokenize a sentence into words?

   from nltk.tokenize import word_tokenize; Word_tokens = word_tokenize(sentence)

10. TF-IDF is a feature extraction technique.
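To make the idea of TF-IDF as feature extraction concrete, the following sketch computes weights by hand using one common textbook variant: term frequency is the count divided by document length, and inverse document frequency is log(N / df). Libraries such as scikit-learn apply smoothed variants by default, so their numbers will differ. The three documents are invented for illustration.

```python
import math
from collections import Counter

# Hypothetical mini-corpus, already lowercased and free of punctuation.
docs = [
    "spam offer win money",
    "meeting agenda for monday",
    "win a free offer",
]
tokenized = [d.split() for d in docs]
n_docs = len(tokenized)

# Document frequency: in how many documents does each term occur?
df = Counter(term for tokens in tokenized for term in set(tokens))


def tf_idf(tokens):
    """Return {term: tf * idf} for one tokenized document."""
    counts = Counter(tokens)
    return {
        term: (count / len(tokens)) * math.log(n_docs / df[term])
        for term, count in counts.items()
    }


weights = [tf_idf(tokens) for tokens in tokenized]
for doc, w in zip(docs, weights):
    print(doc, "->", {t: round(v, 3) for t, v in w.items()})
```

Terms that occur in fewer documents, such as "spam" here, receive a higher weight than terms shared across documents, such as "offer".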

11. What is the purpose of lemmatization?

   To convert words into a proper base form

12. Which of the following is not a preprocessing method used for unstructured data classification?

13. Stemming and lemmatization give the same result.

14. In a Document Term Matrix (DTM) each row represents _______
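As a reminder of the layout, a Document Term Matrix conventionally has one row per document and one column per vocabulary term, with each cell holding the number of times that term occurs in that document. A minimal plain-Python sketch follows, with documents invented for illustration.

```python
# Hypothetical mini-corpus; each string is one document.
docs = ["good movie", "bad movie", "good good plot"]

# Column order: the sorted vocabulary across all documents.
vocabulary = sorted({word for doc in docs for word in doc.split()})

# One row per document, one column per term, cell = term count.
dtm = [[doc.split().count(term) for term in vocabulary] for doc in docs]

print(vocabulary)
for doc, row in zip(docs, dtm):
    print(row, "<-", doc)
```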

15. The fit(X, y) method is used to __________

   Train the classifier

16. Can we consider sentiment classification as a text classification problem?

17. High classification accuracy always indicates a good classifier.
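A quick way to test this statement is to look at an imbalanced data set. The sketch below uses invented numbers (5 spam messages out of 100) and a classifier that always predicts "not spam", then compares accuracy with recall on the spam class.

```python
# Hypothetical imbalanced data: 1 = spam, 0 = not spam.
actual = [1] * 5 + [0] * 95
predicted = [0] * 100          # a classifier that never predicts spam

accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)

true_positives = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
recall = true_positives / sum(a == 1 for a in actual)

print("accuracy:", accuracy)
print("recall on spam:", recall)
```

The accuracy looks strong even though not a single spam message is caught, which is why measures such as precision, recall, and F1 score are usually reported alongside it.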

18. A classifier that can compute using numeric as well as categorical values is __________

19. The following are performance evaluation measures, except __________

20. Which NLP technique uses lexical knowledge base to obtain the correct base form of the words?

21. An algorithm that counts how many times a word appears in a document is __________

22. What is the output of the sentence “Good words bring good feelings to the heart” after performing tokenization, lemmatization and stop word removal?

   'Good word bring good feeling heart'
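The three steps can be traced with a toy sketch. It does not use a real NLP library: whitespace splitting stands in for a tokenizer, and `LEMMAS` and `STOP_WORDS` are tiny hypothetical lookup tables that cover only this sentence, whereas a real pipeline would rely on something like NLTK's tokenizer, lemmatizer, and stop word list.

```python
# Hypothetical lookup tables, just large enough for this one sentence.
STOP_WORDS = {"to", "the"}
LEMMAS = {"words": "word", "feelings": "feeling"}


def preprocess(sentence):
    tokens = sentence.split()                                   # tokenization
    lemmas = [LEMMAS.get(t.lower(), t) for t in tokens]         # lemmatization
    kept = [t for t in lemmas if t.lower() not in STOP_WORDS]   # stop word removal
    return " ".join(kept)


result = preprocess("Good words bring good feelings to the heart")
print(result)
```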

23. Supervised learning differs from unsupervised learning as supervised learning requires __________

   Labeled data

24. Clustering is a supervised classification.

25. Select the correct option that directly achieves multi-class classification (without the support of binary classifiers).

   K Nearest Neighbor

26. The classification where each data instance is mapped to more than one class is called ___________.

   Multi Label Classification

27. Email spam data is an example of __________.

   Unstructured data

28. The most widely used package for machine learning in Python is _________.

29. Pruning is a technique associated with __________.

30. What does the command sentiment_analysis_data['label'].value_counts() return?

   counts of unique values in the 'label' column
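A small sketch of this call, assuming pandas is installed. The five-row DataFrame below is a hypothetical stand-in for the quiz's `sentiment_analysis_data`, with made-up text and labels. It also prints `shape`, which in pandas is an attribute (accessed without parentheses) that returns a `(rows, columns)` tuple.

```python
import pandas as pd

# Hypothetical stand-in for the real dataset.
sentiment_analysis_data = pd.DataFrame({
    "label": [1, 0, 1, 1, 0],
    "text": [
        "loved it",
        "terrible plot",
        "great acting",
        "would watch again",
        "boring",
    ],
})

print(sentiment_analysis_data.shape)          # (number of rows, number of columns)

label_counts = sentiment_analysis_data["label"].value_counts()
print(label_counts)                           # one count per unique label value
```

The result of `value_counts()` is a Series indexed by the unique label values and sorted by count in descending order by default.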

31. Select the pre-processing technique(s) from the following.

32. Which of the given hyperparameters, when increased, may cause a random forest to overfit the data?

   Depth of tree

33. Select the correct statement about Nonlinear classification.

   Kernel tricks are used by nonlinear classifiers to achieve maximum-margin hyperplanes.

34.  Choose the correct sequence for classifier building from the following.

   Initialize -> Train -> Predict -> Evaluate.
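The four steps can be sketched end to end in plain Python. `NearestNeighborClassifier` is a hypothetical toy class written for this example. It mirrors the `fit(X, y)` / `predict(X)` naming convention used by scikit-learn but is not that library's implementation, and the points and labels are invented. Because nearest-neighbor prediction simply returns the label of the closest stored example, it handles three classes directly.

```python
class NearestNeighborClassifier:
    """Hypothetical 1-nearest-neighbor classifier with a fit/predict interface."""

    def fit(self, X, y):
        # "Training" a nearest-neighbor model just stores the labeled examples.
        self.X_, self.y_ = list(X), list(y)
        return self

    def predict(self, X):
        return [self._nearest_label(x) for x in X]

    def _nearest_label(self, x):
        # Squared Euclidean distance to every stored training point.
        distances = [sum((a - b) ** 2 for a, b in zip(x, xi)) for xi in self.X_]
        return self.y_[distances.index(min(distances))]


# Invented three-class data.
X_train = [(0, 0), (0, 1), (5, 5), (6, 5), (10, 0), (11, 1)]
y_train = ["a", "a", "b", "b", "c", "c"]
X_test = [(1, 0), (5, 6), (10, 1)]
y_test = ["a", "b", "c"]

clf = NearestNeighborClassifier()        # 1. Initialize
clf.fit(X_train, y_train)                # 2. Train
y_pred = clf.predict(X_test)             # 3. Predict
accuracy = sum(p == t for p, t in zip(y_pred, y_test)) / len(y_test)  # 4. Evaluate
print(y_pred, accuracy)
```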

35.  What command should be given to tokenize a sentence into words?

   from nltk.tokenize import word_tokenize; Word_tokens = word_tokenize(sentence)

36.  Choose the correct sequence from the following.

   Data Analysis -> Preprocessing -> Model Building -> Predict.

37. The following are all classification techniques, except ___________

