
Machine Learning - Exploring the Model MCQs

Machine Learning - Exploring the Model

Welcome to the course on Machine Learning - Exploring the Model.

The objective of this course is to familiarize you with the steps involved in fitting a machine learning model to a data set.

You will learn all the concepts involved in building a Machine Learning model, from the hypothesis function that represents a model for a given data set to the evaluation of the hypothesis for the general case.

ML Model Representation

Suppose you are provided with a data set that has the area of houses in square feet and their respective prices.

How would you come up with a Machine Learning model that learns from this data and predicts the price of an arbitrary house given its area?

You will learn that in the following cards.

House Price Prediction

We have a data set consisting of houses with their area in square feet and their respective prices.

Assume that the prices depend on the area of the house.

Let us learn how to represent this idea in Machine Learning parlance.

ML Notations

The input / independent variables are denoted by 'x'.

The output / dependent variable(s) are denoted by 'y'.

In our problem, the area values in square feet are 'x' and the house prices are 'y'.

Here a change in one variable is dependent on a change in another variable. This technique is called Regression.
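In code, this data set can be written down directly. A minimal sketch, with area and price values invented purely for illustration:

```python
import numpy as np

# Hypothetical training data (values invented for illustration):
# x = area in square feet (input / independent variable)
# y = price (output / dependent variable)
x = np.array([650, 800, 1200, 1500, 2000], dtype=float)
y = np.array([70_000, 95_000, 130_000, 160_000, 210_000], dtype=float)
```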

Model Representation

The objective is: given a set of training data, the algorithm needs to come up with a way to map 'x' to 'y'.

This is denoted by h: X → Y.

h(x) is called the hypothesis that does the mapping.
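For linear regression with one variable, the hypothesis is commonly written as h(x) = θ0 + θ1·x. A minimal sketch of such a hypothesis (the parameter values below are placeholders, not fitted values):

```python
def h(x, theta0, theta1):
    # Hypothesis for univariate linear regression: a straight line
    # mapping an area x to a predicted price.
    return theta0 + theta1 * x

# Placeholder parameters, not learned from data:
print(h(1000, theta0=20_000, theta1=100))  # -> 120000.0
```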

Model Representation Explained

This video outlines the model representation process in Machine Learning.

Why Cost Function?

You learnt how to map the input and output variables through the hypothesis function in the previous example.

After defining the hypothesis function, its accuracy has to be determined to gauge the predictive power, i.e., how accurately the square-feet values predict the housing prices.
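One common way to quantify this accuracy for regression is the squared error cost, J(θ0, θ1) = (1/2m) Σ (h(xᵢ) − yᵢ)². A minimal sketch:

```python
import numpy as np

def cost(theta0, theta1, x, y):
    # Squared error cost: average (halved) of the squared differences
    # between predicted and actual values.
    m = len(x)
    predictions = theta0 + theta1 * x
    return np.sum((predictions - y) ** 2) / (2 * m)
```

The lower the cost, the better the hypothesis fits the training data; training searches for the θ values that minimize it.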

Model Selection

Model Selection is part of the hypothesis evaluation process, where the model is evaluated on a test set to check how well it generalizes to new data.

Train / Validation / Test Split

One way to break down our dataset into the three sets (sketched in code below) is:

Training set: 60%

Cross validation set: 20%

Test set: 20%
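A minimal sketch of such a split using scikit-learn (assuming it is available; the data here is randomly generated purely for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(100, 1)   # hypothetical feature matrix (e.g., areas)
y = np.random.rand(100)      # hypothetical targets (e.g., prices)

# 60% for training, then the remaining 40% split evenly
# into cross validation (20%) and test (20%) sets.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, train_size=0.6, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)
```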

Quick Fact

Use the training set to find the optimal parameters for the cost function.

Use the validation set to pick the model (e.g., the polynomial degree) with the least error, as in the sketch below.

Use the test set to estimate the generalization error.
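A minimal sketch of this workflow, reusing the X_train/X_val/X_test split from the previous snippet and trying a few polynomial degrees:

```python
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import PolynomialFeatures

best_degree, best_error = None, float("inf")
for degree in range(1, 6):
    poly = PolynomialFeatures(degree)
    model = LinearRegression().fit(poly.fit_transform(X_train), y_train)
    error = mean_squared_error(y_val, model.predict(poly.transform(X_val)))
    if error < best_error:
        best_degree, best_error = degree, error

# The test set is touched only once, to estimate the generalization
# error of the chosen degree.
poly = PolynomialFeatures(best_degree)
model = LinearRegression().fit(poly.fit_transform(X_train), y_train)
test_error = mean_squared_error(y_test, model.predict(poly.transform(X_test)))
```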

Fitting Visualized


You can see three different mapping functions for the same data.

Example 1 - Under-fitting with high bias

Example 2 - Proper fit

Example 3 - Over-fitting with high variance

Tips on Reducing Overfitting

Reduce the number of features:

 - **Manually select** which features to keep.

 - A **model selection** algorithm can be used.

Regularization

Instead of dropping features, the suggestion is to reduce the magnitude of the parameters.

Regularization works well when there are a lot of moderately useful features.
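A minimal sketch of a regularized squared error cost, where lam (λ) controls how strongly large parameters are penalized (by convention the intercept θ0 is not penalized):

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    # Squared error term plus an L2 penalty that shrinks the
    # parameter magnitudes; theta[0] (intercept) is left unpenalized.
    m = len(y)
    predictions = X @ theta
    error_term = np.sum((predictions - y) ** 2) / (2 * m)
    penalty = lam * np.sum(theta[1:] ** 2) / (2 * m)
    return error_term + penalty
```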


Bias vs Variance

Bias measures how far off the predictions are from the actual values.

Variance measures to what extent the predictions for a given point change between various realizations of the model.

Both these values are essential for analysis while selecting an optimal Machine Learning model.


Bias vs Variance continued

If there are bad predictions, we need to distinguish whether they are due to bias or variance.

High bias leads to under-fitting of the data and high variance leads to over-fitting.

The need is to find an optimal trade-off between these two.

Learning Curves Intro

Training an algorithm on a small number of data points will give almost zero error, because we can find a quadratic function that maps exactly those points correctly.

As the training set gets larger and more complex, the error for a quadratic function increases.

The error value levels off to a roughly constant value only after a certain number of training examples.
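A minimal sketch of computing a learning curve with scikit-learn (the synthetic data is invented purely for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import learning_curve

rng = np.random.default_rng(0)
X = rng.random((200, 1))
y = 3 * X.ravel() + rng.normal(scale=0.1, size=200)

sizes, train_scores, val_scores = learning_curve(
    LinearRegression(), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5),
    cv=5, scoring="neg_mean_squared_error")

train_error = -train_scores.mean(axis=1)  # error on the training folds
val_error = -val_scores.mean(axis=1)      # error on the validation folds
```

Plotting train_error and val_error against sizes shows whether the two curves converge (high bias) or keep a large gap (high variance), as described in the next two cards.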


High Bias

Low training set size: causes the training set error to be low and the cross validation set error to be high.

Large training set size: causes both the training set error and the cross validation set error to be high, with the two errors approximately equal.

So when a learning algorithm has high bias, getting more training data will not aid much in improving it.


High Variance

Low training set size: the training set error will be low and the cross validation set error will be high.

Large training set size: the training set error increases with training set size and the cross validation set error continues to decrease without leveling off. Also, the training set error stays below the cross validation set error, but the difference between them remains significant.

If a learning algorithm has high variance, getting more training data will help in improvement.
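The two cases above suggest a rough diagnostic based on the learning-curve endpoints. A hypothetical helper (the names and threshold logic are illustrative only, not a standard API):

```python
def diagnose(train_error, val_error, target_error):
    # Rough heuristic: compare the final training and validation
    # errors against a desired error level (e.g., human performance).
    if train_error > target_error:
        return "high bias: underfitting; more data alone will not help much"
    if val_error - train_error > target_error:
        return "high variance: overfitting; more data should help"
    return "errors look balanced"
```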

More tips

Getting more training data: fixes a high variance problem.

Trying a smaller set of input features: fixes high variance.

Adding new input features: fixes a high bias problem.

Adding new polynomial features: fixes a high bias problem.


Model Complexity Effects

  • Lower-order polynomials have very high bias and very low variance; this gives a poor fit.
  • Higher-order polynomials have low bias on the training data but very high variance; this gives an over-fit.
  • The objective is to build a model that fits the data well and also generalizes well.




1. The cost function in linear regression is also called the squared error function.

   Answer: True



2. How are the parameters updated during the Gradient Descent process?

   Answer: Simultaneously



3. For different parameters of the hypothesis function, we get the same hypothesis function.

   Answer: False



4. Problems that predict real-valued outputs are called?

   Answer: Regression



5. The objective function for linear regression is also known as the Cost Function.

   Answer: True



6. What is the process of subtracting the mean of each variable from its values called?

   Answer: Mean Normalization



7. Output variables are known as Feature Variables.

   Answer: False



8. The result of scaling is a variable in the range of [1, 10].

   Answer: False



9. What is the process of dividing each feature by its range called?

   Answer: Feature Scaling



10. What is the learning technique in which the right answer is given for each example in the data called?

   Answer: Supervised Learning



11. The ______ function is used as the mapping function for classification problems.

   Answer: Sigmoid



12. Overfit data has a high bias.

   Answer: False



13. Reducing the number of features can reduce overfitting.

   Answer: True



14. I have a scenario where my hypothesis fits my training set well but fails to generalize for the test set. What is this scenario called?

   Answer: Overfitting



15. The ______ is the line that separates y = 0 and y = 1 in a logistic function.

   Answer: Decision Boundary



16. In ______ problems, the error is determined by the proportion of values misclassified by the model.

   Answer: Classification



17. For an overfit data set, the cross validation error will be much bigger than the training error.

   Answer: True



18. In ______ problems, the error is calculated by finding the sum of squared distances between actual and predicted values.

   Answer: Regression



19. Where does the sigmoid function asymptote?

   Answer: At 0 and 1



20. When an ML model has high bias, getting more training data will help in improving the model.

   Answer: False



21. Problems where discrete-valued outputs are predicted are called?

   Answer: Classification Problems



22. What measures the extent to which the predictions change between various realizations of the model?

   Answer: Variance



23. What is the function that takes the input and maps it to the output variable called?

   Answer: Hypothesis Function


