Confuse Skull: Unstructured Data Classification MCQs

1. Identify the unstructured data from the following.

IMAGE

2. What kind of classification is our case study 'Spam Detection'?

BINARY

3. Which preprocessing technique is used to remove the most commonly used words?

STOPWORDS
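
Stop word removal can be sketched in a few lines of Python. The stop-word list below is a hypothetical, deliberately tiny one chosen for illustration; real projects usually load a much larger list from an NLP library.

```python
# Hypothetical, deliberately small stop-word list (an assumption for this sketch).
STOP_WORDS = {"a", "an", "the", "is", "to", "of", "and", "in"}


def remove_stop_words(tokens):
    """Return the tokens that are not stop words (case-insensitive match)."""
    return [tok for tok in tokens if tok.lower() not in STOP_WORDS]


tokens = "The offer is valid to the end of the month".split()
filtered = remove_stop_words(tokens)
print(filtered)
```

Only the content-bearing words survive, which is usually what a text classifier needs.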

4. The cross-validation technique is used to evaluate a classifier by dividing the data set into a training set to train the classifier and a testing set to test it.

TRUE
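
As a rough illustration of the split in question 4, the sketch below generates k-fold index splits in plain Python, so that each sample lands in the test set exactly once. The function name k_fold_indices is hypothetical, and the folds are contiguous and unshuffled for simplicity.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous, unshuffled folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(n_samples=6, k=3))
for train, test in folds:
    print("train:", train, "test:", test)
```

In practice the data is usually shuffled (and often stratified by label) before it is split.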

5. True Negative is when the predicted instance and the actual instance are both positive.

FALSE

6. True Positive is when the predicted instance and the actual instance are both not negative.

TRUE
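
The four outcomes behind questions 5 and 6 can be counted directly. This is a minimal sketch that assumes binary labels with 1 as the positive class; the label lists are made up for illustration.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            counts["TP"] += 1      # predicted positive, actually positive
        elif p != positive and a != positive:
            counts["TN"] += 1      # predicted negative, actually negative
        elif p == positive:
            counts["FP"] += 1      # predicted positive, actually negative
        else:
            counts["FN"] += 1      # predicted negative, actually positive
    return counts


actual = [1, 0, 1, 1, 0, 0]       # hypothetical ground-truth labels
predicted = [1, 0, 0, 1, 1, 0]    # hypothetical classifier output
counts = confusion_counts(actual, predicted)
print(counts)
```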

7. TF and IDF use matrix representations.

TRUE

8. Which of the following commands is used to view the dataset size, and what value is returned?

sentiment_analysis_data.shape, which returns (7086, 2)

9. What command should be given to tokenize a sentence into words?

from nltk.tokenize import word_tokenize
Word_tokens = word_tokenize(sentence)

10. TF-IDF is a feature extraction technique.

True
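
To make the feature-extraction idea concrete, here is a small sketch of one common TF-IDF variant: term frequency is the count divided by the document length, and IDF is log(N / df). Libraries typically add smoothing and normalization, so their numbers will differ. The function name tf_idf and the sample documents are assumptions, and every document is assumed to be non-empty.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized, non-empty document."""
    n_docs = len(documents)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


docs = [["free", "prize", "now"], ["meeting", "now"]]  # hypothetical corpus
scores = tf_idf(docs)
print(scores)
```

A term such as "now" that occurs in every document gets weight zero, while terms specific to one document get positive weights.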

11. What is the purpose of lemmatization?

To convert words into a proper base form

12. Which of the following is not a preprocessing method used for unstructured data classification?

confusion_matrix

13. Stemming and lemmatization give the same result.

False

14. In a Document Term Matrix (DTM) each row represents _______

TF VALUE (each row corresponds to one document and holds the term-frequency values of its terms)
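
A document-term matrix can be sketched as one row per document and one column per vocabulary term, with each cell holding a term count. The helper name document_term_matrix and the two sample documents are hypothetical.

```python
from collections import Counter


def document_term_matrix(documents):
    """Build (vocabulary, matrix) with one row per tokenized document."""
    vocabulary = sorted({term for doc in documents for term in doc})
    matrix = []
    for doc in documents:
        counts = Counter(doc)
        # Counter returns 0 for terms that are absent from this document.
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix


docs = [["good", "movie", "good"], ["bad", "movie"]]  # hypothetical documents
vocab, dtm = document_term_matrix(docs)
print(vocab)
print(dtm)
```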

15. The fit(X, y) method is used to __________

Train the classifier

16. Can we consider sentiment classification as a text classification problem?

YES

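17. High classification accuracy always indicates a good classifier.

FALSE (on imbalanced data, a classifier that always predicts the majority class can score high accuracy while missing every minority-class instance)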

18. A classifier that can compute using numeric as well as categorical values is __________

Naive Bayes (NB)

19. The following are performance evaluation measures, except __________

DecisionTree

20. Which NLP technique uses a lexical knowledge base to obtain the correct base form of words?

lemmatization

21. An algorithm that counts how many times a word appears in a document is __________

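TF, the term-frequency part of TF-IDF (TF-IDF additionally scales this count by the inverse document frequency)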

22. What is the output of the sentence “Good words bring good feelings to the heart” after performing tokenization, lemmatization and stop word removal?

'Good word bring good feeling heart'
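
The answer to question 22 can be reproduced with a toy pipeline. The stop-word set and lemma table below are hypothetical lookups that cover only this sentence; in practice an NLP library would supply both.

```python
# Hypothetical lookup tables that cover just this one sentence.
STOP_WORDS = {"to", "the"}
LEMMAS = {"words": "word", "feelings": "feeling"}


def preprocess(sentence):
    """Tokenize, lemmatize, and drop stop words, then rejoin the tokens."""
    tokens = sentence.split()                                   # tokenization
    lemmas = [LEMMAS.get(tok.lower(), tok) for tok in tokens]   # lemmatization
    kept = [tok for tok in lemmas if tok.lower() not in STOP_WORDS]
    return " ".join(kept)


result = preprocess("Good words bring good feelings to the heart")
print(result)
```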

23. Supervised learning differs from unsupervised learning as supervised learning requires __________

Labeled data

24. Clustering is a supervised classification technique.

False

25. Select the option that directly achieves multi-class classification (without the support of binary classifiers).

K Nearest Neighbor

26. The classification where each data point is mapped to more than one class is called ___________.

Multi Label Classification

27. Email spam data is an example of __________.

Unstructured data

28. The most widely used package for machine learning in Python is _________.

sklearn

29. Pruning is a technique associated with __________.

Decision tree (DT)

30. What does the command sentiment_analysis_data['label'].value_counts() return?

counts of unique values in the 'label' column
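
The counting that value_counts() performs can be sketched without pandas by using collections.Counter from the standard library. The label list here is made up for illustration and is not the case-study dataset.

```python
from collections import Counter

labels = [1, 0, 1, 1, 0, 1]  # hypothetical contents of a 'label' column

# most_common() lists (value, count) pairs from most to least frequent.
label_counts = Counter(labels).most_common()
print(label_counts)
```

Comparing the counts per label is a quick way to spot class imbalance before training.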

31. Select the pre-processing technique(s) from the following.

All of the above

32. Which of the given hyperparameters, when increased, may cause a random forest to overfit the data?

depth of tree

33. Select the correct statement about Nonlinear classification.

Kernel tricks are used by Nonlinear classifiers to achieve maximum-margin hyperplanes.

34. Choose the correct sequence for classifier building from the following.

Initialize -> Train -> Predict -> Evaluate.
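
The four steps can be sketched with a toy classifier. The class below is a hypothetical 1-nearest-neighbor implementation written for this example (it is not a library API), and the feature vectors and labels are invented.

```python
class NearestNeighborClassifier:
    """Toy 1-nearest-neighbor classifier with a fit/predict interface."""

    def fit(self, X, y):
        # "Training" a nearest-neighbor model just stores the labeled examples.
        self.X_ = [list(row) for row in X]
        self.y_ = list(y)
        return self

    def predict(self, X):
        return [self._predict_one(row) for row in X]

    def _predict_one(self, row):
        # Squared Euclidean distance to every stored training example.
        distances = [
            sum((a - b) ** 2 for a, b in zip(row, stored))
            for stored in self.X_
        ]
        return self.y_[distances.index(min(distances))]


X_train = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 0.8]]
y_train = ["ham", "ham", "spam", "spam"]
X_test = [[0.2, 0.1], [0.8, 0.9]]
y_test = ["ham", "spam"]

clf = NearestNeighborClassifier()         # 1. initialize
clf.fit(X_train, y_train)                 # 2. train
predictions = clf.predict(X_test)         # 3. predict
accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)  # 4. evaluate
print(predictions, accuracy)
```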

35. What command should be given to tokenize a sentence into words?

from nltk.tokenize import word_tokenize
Word_tokens = word_tokenize(sentence)

36. Choose the correct sequence from the following.

Data Analysis -> Preprocessing -> Model Building -> Predict.

37. The following are all classification techniques, except ___________

StratifiedShuffleSplit.
